algesten / ureq

A simple, safe HTTP client
Apache License 2.0
1.69k stars 175 forks source link

"Error while decoding chunks" on some websites #325

Open Shnatsel opened 3 years ago

Shnatsel commented 3 years ago

On some websites, e.g. http://banxetoyota.vn, ureq fails with the following error:

Error while decoding chunks

However, curl and Firefox work fine.

There's 38 such websites in the top million (I'm using Tranco list generated on the 3rd of February).

Archive with all occurrences: ureq-error-decoding-chunks.tar.gz

Code used for testing: https://github.com/Shnatsel/rust-http-clients-smoke-test/blob/f206362f2e81521bbefb84007cdd25242f6db590/ureq-smoke-test/src/main.rs

algesten commented 2 years ago

Of the original 38 on this list, the majority now work. These still show the behavior.

http://adorama.com -- see https://github.com/algesten/ureq/pull/454 http://extradom.pl -- curl: (18) transfer closed with outstanding read data remaining http://leisurepro.com -- see https://github.com/algesten/ureq/pull/454 http://nemaweb.org curl: (18) transfer closed with outstanding read data remaining http://sfpnet.fr curl: (18) transfer closed with outstanding read data remaining http://sunnysports.com -- see https://github.com/algesten/ureq/pull/454

(I use curl --http1.1 -L --trace ./trace.txt http://sunnysports.com)

I've identified one scenario we could handle better (broken server). See https://github.com/algesten/ureq/pull/454

jsha commented 1 year ago

I haven't checked all of these, but I just checked https://www.adorama.com/, and it's not a case of missing the \r\n at the end of the message, it's a case of trailers, which we don't yet handle correctly:

(echo -e 'GET / HTTP/1.1\r\nHost: www.adorama.com\r\n' ;sleep 10) |openssl s_client -connect www.adorama.com:443   | xxd
...
000025e0: 203c 2f62 6f64 793e 3c2f 6874 6d6c 3e0d   </body></html>.
000025f0: 0a30 0d0a 7365 7276 6572 2d74 696d 696e  .0..server-timin
00002600: 673a 2072 7474 3b20 6475 723d 3233 2e34  g: rtt; dur=23.4
00002610: 3731 2c20 7265 7472 616e 733b 2064 7572  71, retrans; dur
00002620: 3d30 2c20 7472 6169 6c65 722d 7469 6d65  =0, trailer-time
00002630: 7374 616d 703b 2064 7572 3d31 3637 3036  stamp; dur=16706
00002640: 3330 3331 3231 3530 0d0a 0d0a 636c 6f73  30312150....clos
00002650: 6564 0a    

I think rather than land #454 we should implement trailers and see if that fixes all of these cases.

algesten commented 1 year ago

Alright

algesten commented 1 year ago

Of the above examples, I think only http://sfpnet.fr/ shows the behavior it did when this Shnatsel ran this analysis. I think this one have garbage at the end.

I'm quite certain we did not see trailing headers in the above before.

0001e8d0: 3c2f 6874 6d6c 3e0a 0d0a 7265 6164 2052  </html>...read R
0001e8e0: 2042 4c4f 434b 0a72 6561 6420 5220 424c   BLOCK.read R BL
0001e8f0: 4f43 4b0a 636c 6f73 6564 0a              OCK.closed.
algesten commented 2 months ago

This probably needs fixing in 3.x now.