curl / curl

A command line tool and library for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, GOPHER, GOPHERS, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, MQTT, POP3, POP3S, RTMP, RTMPS, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET, TFTP, WS and WSS. libcurl offers a myriad of powerful features
https://curl.se/

Curl command freezes and does not work as expected #12532

Closed winstonma closed 9 months ago

winstonma commented 9 months ago

I did this

I would like to fetch the AMD website with curl. I can download it with wget, but I couldn't figure out how to get curl working.

# The wget command works if a user agent is added
$ wget -U 'Mozilla/5.0' https://www.amd.com/en.html
Resolving www.amd.com (www.amd.com)... 104.89.160.108
Connecting to www.amd.com (www.amd.com)|104.89.160.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘en.html’

# The first curl command fails with an error
$ curl https://www.amd.com/en.html
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)

# The second curl command freezes forever
$ curl -A 'Mozilla/5.0' --http1.1 -v https://www.amd.com/en.html
*   Trying 23.46.196.82:443...
* Connected to www.amd.com (23.46.196.82) port 443
* ALPN: curl offers http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /home/winston/anaconda3/ssl/cacert.pem
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: C=US; ST=CALIFORNIA; L=Santa Clara; O=Advanced Micro Devices, Inc.; CN=amd.com
*  start date: Feb 21 00:00:00 2023 GMT
*  expire date: Feb 20 23:59:59 2024 GMT
*  subjectAltName: host "www.amd.com" matched cert's "www.amd.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=GeoTrust RSA CA 2018
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /en.html HTTP/1.1
> Host: www.amd.com
> User-Agent: Mozilla/5.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing

I expected the following

I expected the curl command to either fail with an error or succeed.

curl/libcurl version

curl 8.4.0 (x86_64-conda-linux-gnu) libcurl/8.4.0 OpenSSL/3.0.12 zlib/1.2.13 libssh2/1.10.0 nghttp2/1.57.0 Release-Date: 2023-10-11 Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz NTLM SPNEGO SSL threadsafe TLS-SRP UnixSockets

operating system

Linux notebook 6.6.6-zabbly+ #ubuntu22.04 SMP PREEMPT_DYNAMIC Mon Dec 11 17:02:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

icing commented 9 months ago

I tested this. The server sends a RST (reset) after the request is submitted. There is probably some user-agent or SSL sniffing in place at the site.

bagder commented 9 months ago

curl 8.5.0 on Debian seems to behave the same way.

winstonma commented 9 months ago

@icing Some additional information: I can grab the webpage with wget when I set a custom user agent (see the first command in my first post). But when I add the same user agent to the curl command, it still doesn't work. Also, wget without the custom user agent does not work either.

Not sure if that helps.

bagder commented 9 months ago

It looks like the server does not send any data to curl.

Curious, I tried wget 1.21.4 (also on Debian), only to find that it hangs for me in exactly the same way...

winstonma commented 9 months ago

Could you try the following command? It works for me with wget 1.21.2 on Ubuntu.

wget -U 'Mozilla/5.0' https://www.amd.com/en.html

I think the website needs a user agent to work. That's why I added the same user agent to the curl command that freezes.

I just wonder whether a timeout mechanism should be added to curl, or whether curl should exit when an RST is received?

jay commented 9 months ago

I can reproduce this on Ubuntu. The wget request works and the curl request doesn't. However, when I used wget -d to get the request headers and then sent those same request headers with curl, it worked.

---request begin---
GET /en.html HTTP/1.1
User-Agent: Mozilla/5.0
Accept: */*
Accept-Encoding: identity
Host: www.amd.com
Connection: Keep-Alive

---request end---
curl -v -A Mozilla/5.0 --http1.1 -H "Accept-Encoding: identity" -H "Connection: Keep-Alive" -O https://www.amd.com/en.html
> GET /en.html HTTP/1.1
> Host: www.amd.com
> User-Agent: Mozilla/5.0
> Accept: */*
> Accept-Encoding: identity
> Connection: Keep-Alive

If I remove either Accept-Encoding or Connection, the server does not reply. Possibly this is a CDN server that can return a cached version of a page tied to a particular combination of headers. If it does not recognize the combination of headers, it connects to the origin server with those headers to retrieve the page, adds it to its cache, and returns the response to the client. Sometimes the cached entry is tied to a user agent and sometimes it isn't.

The server probably isn't responding because the origin server isn't responding. It just happens to work for some combinations of headers because the page is already cached for those combinations.
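The caching hypothesis above can be sketched as a toy model. This is an illustration only, not AMD's or any real CDN's configuration: the assumption that the cache key consists of just `Accept-Encoding` and `Connection`, the `ToyEdgeCache` class, and the `origin_responds` flag are all hypothetical. It shows how a wget-style request can be answered from the edge while a curl-style request with a different header set misses the cache and, with a silent origin, never gets a reply.

```python
from typing import Dict, Optional, Tuple

# Hypothetical assumption: the edge cache varies its entries on these headers only.
KEY_HEADERS = ("Accept-Encoding", "Connection")

CacheKey = Tuple[Optional[str], ...]


def cache_key(headers: Dict[str, str]) -> CacheKey:
    """Build a cache key from the subset of request headers the toy CDN varies on."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    lowered = {name.lower(): value for name, value in headers.items()}
    return tuple(lowered.get(h.lower()) for h in KEY_HEADERS)


class ToyEdgeCache:
    """Hypothetical edge server: serve cached pages, otherwise ask the origin."""

    def __init__(self, origin_responds: bool) -> None:
        self.entries: Dict[CacheKey, str] = {}
        self.origin_responds = origin_responds

    def handle(self, headers: Dict[str, str]) -> Optional[str]:
        """Return the body, or None to model a request that never gets a reply."""
        key = cache_key(headers)
        if key in self.entries:
            return self.entries[key]  # cache hit: answered from the edge
        if not self.origin_responds:
            return None  # cache miss and a silent origin: the client just waits
        body = "<html>fresh from origin</html>"  # cache miss: fetch, store, return
        self.entries[key] = body
        return body


# Pretend a wget-style request populated the cache earlier.
edge = ToyEdgeCache(origin_responds=False)
edge.entries[cache_key({"Accept-Encoding": "identity", "Connection": "Keep-Alive"})] = (
    "<html>cached en.html</html>"
)

wget_like = {"User-Agent": "Mozilla/5.0", "Accept": "*/*",
             "Accept-Encoding": "identity", "Connection": "Keep-Alive"}
curl_like = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}

print(edge.handle(wget_like))  # <html>cached en.html</html>
print(edge.handle(curl_like))  # None
```

Adding `-H "Accept-Encoding: identity" -H "Connection: Keep-Alive"` to curl corresponds, in this model, to producing the same cache key as wget, which is consistent with what the test above found; it does not prove that the real server works this way.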

curl is behaving as intended. If the request is sent successfully, curl waits indefinitely for a response unless you set a timeout with --max-time.
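The "wait indefinitely unless a timeout is set" behavior can be reproduced locally without curl. The following sketch uses Python's standard `socket` module instead: a local server (a hypothetical stand-in for the CDN) reads the request and never answers. A client without a timeout would block in `recv()` forever, while one with a deadline gives up, which is roughly what `--max-time` does for curl. The helper names `silent_server`, `fetch_with_deadline` and `demo` are illustrative only.

```python
import socket
import threading


def silent_server(listener: socket.socket, stop: threading.Event) -> None:
    """Accept one connection, read the request, and never send a response."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # read the client's request ...
        stop.wait()      # ... then stay silent until the demo is finished


def fetch_with_deadline(port: int, timeout: float) -> str:
    """Send a minimal HTTP/1.1 request and wait at most `timeout` seconds for a reply."""
    # create_connection() applies `timeout` to the socket, so recv() is bounded too.
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as sock:
        sock.sendall(b"GET /en.html HTTP/1.1\r\nHost: localhost\r\n\r\n")
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return "timed out"
        return data.decode(errors="replace") or "connection closed"


def demo(timeout: float) -> str:
    """Run the silent server in a thread and query it once with a deadline."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    stop = threading.Event()
    server = threading.Thread(target=silent_server, args=(listener, stop), daemon=True)
    server.start()
    try:
        return fetch_with_deadline(listener.getsockname()[1], timeout)
    finally:
        stop.set()
        server.join()
        listener.close()


if __name__ == "__main__":
    print(demo(timeout=1.0))  # timed out
```

For curl itself, the equivalent is to add a deadline such as `--max-time 30` to the freezing command; when the limit is hit, curl gives up and exits with error 28 (operation timed out) instead of hanging.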

winstonma commented 9 months ago

@jay Thanks for the answer. I learned not only the debugging process but also how the CDN responds.