Closed Arvamer closed 1 year ago
Hi @Arvamer welcome to ureq!
Thanks for the very detailed report! There were some changes in when we will return a connection to the agent pool. This could be involved.
Sounds like it should be quite straightforward to verify the HTTP/1.0 theory with a unit test. I'll have a look soon!
Indeed, thanks for the great report!
When a response has no Content-Length or Transfer-Encoding, that means it is "close-delimited" - in other words, the response is done when the connection closes. If the server is holding the connection open for 5 seconds after sending a close-delimited response, I would expect ureq to keep trying to read that connection for 5 seconds.
In the example, the server is sending connection: keep-alive
and keep-alive: timeout=5, max=512
, which suggests it does intend to hold the connection open for 5 seconds after sending the response. But since the response is close-delimited there's no reason to hold it open.
So the question is - why would this have finished promptly in ureq 2.5.0? Here's the diff between the two.
Since the response is JSON, in some sense it doesn't matter whether the server closes the connection promptly. If the client is willing to say "I saw the JSON close }
, so I consider the response done and am closing the connection myself," everything is hunky-dory. In fact, I think this is what ureq does if you call into_json()
.
That approach is arguably incorrect. If the server sent some extra junk after the JSON finished, the client should get an error rather than ignoring the extra junk. There's a similar issue we happened to fix in 2.6.0: if the server sends extra junk after a gzip or Brotli response, we will read it and return an error (previously we ignored it and dropped the connection). I think we didn't fix it for JSON, and your example doesn't use into_json()
anyhow.
So, I'm not sure yet exactly what the problem is, but I think the problem is in that general area. If you have time, it would be helpful to add env_logger
to your demo program and run with RUST_LOG=trace
. Alternately, you can use examples/cureq.rs, which already has env_logger:
RUST_LOG=trace cargo run --example cureq --features="charset cookies socks-proxy native-tls" "http://192.168.59.2:8049/route/v1/driving/21.0122,52.2297;21.1122,52.2097?geometries=polyline&overview=full&continue_straight=true"
[2023-03-16T18:33:37Z DEBUG ureq::stream] connecting to 192.168.59.2:8049 at 192.168.59.2:8049
[2023-03-16T18:33:37Z DEBUG ureq::stream] created stream: Stream(TcpStream { addr: 192.168.59.220:40338, peer: 192.168.59.2:8049, fd: 3 })
[2023-03-16T18:33:37Z DEBUG ureq::unit] sending request GET http://192.168.59.2:8049/route/v1/driving/21.0122,52.2297;21.1122,52.2097?geometries=polyline&overview=full&continue_straight=true
[2023-03-16T18:33:37Z DEBUG ureq::unit] writing prelude: GET /route/v1/driving/21.0122,52.2297;21.1122,52.2097?geometries=polyline&overview=full&continue_straight=true HTTP/1.1
Host: 192.168.59.2:8049
User-Agent: ureq/2.6.2
Accept: */*
accept-encoding: gzip
[2023-03-16T18:33:37Z DEBUG ureq::response] Body of unknown size - read until socket close
[2023-03-16T18:33:37Z DEBUG ureq::unit] response 200 to GET http://192.168.59.2:8049/route/v1/driving/21.0122,52.2297;21.1122,52.2097?geometries=polyline&overview=full&continue_straight=true
{"code":"Ok","routes":[{"geometry":"ubx}Hu}f_CFCCIBOF?D?N?B?LCRKXKLERKRI@ABAAGGg@@SMaAMaAEUGWAKQ{A{AwLIo@Ge@AIKs@Gg@Ge@e@oDCYUcBHOCQ?E@CDMACEK`AqBTe@^y@N_@JUEa@C]]{BCOOu@CQAWAMAyCGeEA}AAeAAaC?cA?g@FmE?o@?k@?g@A_@Ca@KuAMsAJGEa@IqAI_A@[?G?A?C?CB]?G?K?I@O?G?Q?OGINuDNm@DUZkA@KJ_@AIh@oBFSf@iB\\}@v@eBz@qBLY@C?ABGf@cAnAsCfBwDJCDEFGCK~@yBZs@LUTg@Ve@h@qABGTi@HQj@qADH@AFCFADAK{ABWv@Q\\IAwA|@a@F?BGHM@OEG\\_AlA_DEa@NIEWW}BG_@E]CWIs@YyBOoBi@eEmD{YeAyIMm@CKCSSeBBCDGb@k@BGl@w@HWr@eAjAeBh@u@f@}@^w@`@}@JSDMP[?M@MD[N]`@aARe@Nc@Na@J]FM^mA^mARs@BM\\qAXoAZ}Ad@eCf@qCRsAPoAPmANsAHu@LkAFg@@MHcAHcAFs@Du@LcCHkBDq@BmADuABiABkA@kA@iA?qE@w@?u@@qD?u@@{B@gABgM@aE@gB?w@?q@@c@?c@@a@?a@@c@?a@@c@@a@@e@@a@@Q?a@Ba@@c@@a@@a@@QBi@@[?K@OBg@Ba@@c@@G@WBe@Bg@Dg@Bm@Di@F{@Fu@Fw@P}BRaCFo@Hw@Fs@JcAFk@Fk@Ho@TmB`@eDDYNiA@MBKBUBSDYD[zByPNeAFi@Hi@Jq@Hc@Fa@Jq@Hc@F[Jc@Jc@Nq@H[Lc@H]Nc@HYHWJ]Ts@N]pAiDhBmE@K@O@Q?MAKRa@x@cB`BuCXg@j@qAZmArDoELMDEr@s@JKJIPU\\g@t@cAf@o@j@iAPYPSDCZ[BCV[Zc@d@o@Xa@Za@X[BERQ@MDEDIFGFIFKNSLSR]\\_@jA}A`@c@BCBCM{@M_AIk@_@qCQyAUyAIq@bEqBr@_@f@Wh@Y~Aw@lBcAiAiI","legs":[{"steps":[],"distance":8168.2,"duration":5900.6,"summary":"","weight":5900.6}],"distance":8168.2,"duration":5900.6,"weight_name":"duration","weight":5900.6}],"waypoints":[{"hint":"4vIGgJYhD4FnAAAAIgAAAP0AAAAAAAAA2eVmQSf1k0CZnAxCAAAAAGcAAAAiAAAA_QAAAAAAAAABAAAAMp9AAUv2HAPonkABRPYcAwIArwoSGgIR","distance":5.116463,"name":"","location":[21.012274,52.229707]
[2023-03-16T18:33:42Z DEBUG ureq::stream] dropping stream: Stream(TcpStream { addr: 192.168.59.220:40338, peer: 192.168.59.2:8049, fd: 3 })
},{"hint":"utwQgc_cEIFfAwAAGwMAAAAAAABvAgAAR4HwQvYj3UIAAAAAnGStQl8DAAAbAwAAAAAAAG8CAAABAAAA1CZCAf6lHAOIJUIBJKgcAwAAXwQSGgIR","distance":65.269656,"name":"","location":[21.112532,52.20915]}]}
The 5s hang happens after "dropping stream" message. After hang the final part of response is printed, but looking at strace output it's probably some buffering because there is only a single write
to stdout with response body.
Since the response is JSON, in some sense it doesn't matter whether the server closes the connection promptly. If the client is willing to say "I saw the JSON close }, so I consider the response done and am closing the connection myself," everything is hunky-dory. In fact, I think this is what ureq does if you call into_json().
Actually original code was using ureq's into_json()
, and there was still 5s hang.
Thanks for the additional logs!
The 5s hang happens after "dropping stream" message.
Are you sure? The timestamps suggest the hang happens before the "dropping stream" message:
[2023-03-16T18:33:37Z DEBUG ureq::unit] response 200 to GET ... [2023-03-16T18:33:42Z DEBUG ureq::stream] dropping stream: ...
That would be compatible with the idea that ureq is waiting for the server to close the TCP connection, and the server is closing it after 5s.
Are you sure? The timestamps suggest the hang happens before the "dropping stream" message:
Tested again and you are right, it's printed after.
Noticed that content-length
appears in response headers when using curl and reqwest, and looking more into strace output OSRM actually respond with it.
recvfrom(3, "HTTP/1.0 200 OK\r\nContent-Encoding: gzip\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Methods: GET\r\nAccess-Control-Allow-Headers: X-Requested-With, Content-Type\r\nContent-Type: application/json; charset=UTF-8\r\nContent-Disposition: inline; filename=\"response.json\"\r\nContent-Length: 1075\r\nConnection: keep-alive\r\nKeep-Alive: timeout=5, max=512\r\n\r\n\37\213\10\0\0\0\0\0\4\377\215T\333r\332H\20\375\27\236]. \206\330y\321\351\231\321\35\1\2\23\314E\301\262,\204\214\205\fBV\10K\276}\33\234\354>\354n\325Nu\365\364\234\231i\235\351n\365\261\26\345\317q\355K\255\267\256]\325vy\271\217\213\332\227\331\261\226\304y\26\357w\7\336*\237\276\237\254\362\264\\HCJ[\364\fMi]Mh\359p\37\334\216>p\0076H\20\231f\2\f\275\220X\364\2219&\327?\322\221\252\216\235\303\214A\266[\200\217\260\31#Wr2\212\204\325\223\276\246C*\217\244\356>\322V\334\307\370v@w\1g\244\207\220Ap\24\262\327+!}\32\223G\7i\306:\235\210b\242Pj\21i\t\214L\327rhk\260M\v\310\20nI^A\216\311\36\354-\331\v\302L35\322\244&E\300\226\253\331\350\361\354k=\323\356\226\252\233A\215\246k\202\353,\230\347\n\2710\206K\244b>?\341\35\261\370\201\255\350L 5\22\346\22\21m\250\220KQ)G*\3350\245\373\23\0071-\320\31\335'\370\32c\205-\37\274Oa\371/l*\vdH\203\24\271G\22\343w\370\363\271M\25\375\201\20\206&L\313CO7\347\363\5\275\322B1\351\256\255\217\307'a.\240\7rl\27\230\34D/\27)b=S\307IL\7\333\313 ]9\34\306BHe>a\ra\276\242\2025\336!\246\27\212\305\n%\2268\341[\205G\236\234\241\362\3723\315\203\247f\335\340\21!\rbt#tC8\201\341}\313\210eP@x\363\371\226\36r\232\236\350\31\261\\b+\7\5\365s\352g\324-\310*\321Y\223\301\251\366\254\210X\214\2\212\261HZk\241\266\20\31\251\222DJ\202\343\311\222\222\266\325\231\230V\2[u\326G\201\204D\342!\324\221\10\215\267\266@\4-\2\307C\v/\366E\363\222K\345\254}\206\305_\330\31\340`pR]\364D\362{\307\304X\304\340\265b\2103\232\3028\302(aT\350\237\304 \224F\16\253\2\363u\230\365\32,V\216\373L<\"Vj\322M\t\236p\305H\f\325D\315~\210C\277\33\223\301IL\341laE08T\37\306\314\211\330\t\272\274\232u\"X\1\7\322\232Xc'\270/\320\r\336(U+\221\351`~\314\335#w\20\342;\"\361(J\371\220\340\\\24\323\214v*\327;\236\322w`J\256c\367G\363y\202=\327\327\229\37J\251?\351\17\225\234\316\204\374:\233Fxf\370!\3044\304\303L\350\3\237s\251+\3330\r\333p\273\303\316p\20p\21\341\205\377\217G\16\250\220BzGx\v\262\327Xp\32\375\3\215\16do\361\244o\305\216\241%\306+L~R\205W\21QJ\251\315]\3405N>z@\261\217\337\316VpU{N\213}\270\211\270S\3346\332\267\327MF\312]\270O\363M\355K\353\256^\277n_\325\2122\313\302K\307`'U\234&\253\375\357\315\323\377t\361qk\261\t\263sO\372\353\23\377\346\256\n\17oy\272\331\177P]\261\3057n\336m3q&+ucl\210\207\235\260\352\327\317\346e4\343\257\231?\\6\326u9\335\320wy\1\315\350<\245g\265\360\317\3722\270\237\21yowD\243\367\246\305\345\277Y\223\30\364'\21U6\355\252|h&\366\200\37\372wdZ\327\215F\373\246\375\351\252\366\213\3779\226y\364+J\263f\343\272\336h6?\337\\\265\232\327\315\346\335\347\372\347\340", 8192, 0, NULL, NULL) = 1313
Without gzip:
recvfrom(3, "HTTP/1.0 200 OK\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Methods: GET\r\nAccess-Control-Allow-Headers: X-Requested-With, Content-Type\r\nContent-Type: application/json; charset=UTF-8\r\nContent-Disposition: inline; filename=\"response.json\"\r\nContent-Length: 1597\r\nConnection: keep-alive\r\nKeep-Alive: timeout=5, max=512\r\n\r\n{\"code\":\"Ok\",\"routes\":[{\"geometry\":\"ubx}Hu}f_CFCCIBOF?D?N?B?LCRKXKLERKRI@ABAAGGg@@SMaAMaAEUGWAKQ{A{AwLIo@Ge@AIKs@Gg@Ge@e@oDCYUcBHOCQ?E@CDMACEK`AqBTe@^y@N_@JUEa@C]]{BCOOu@CQAWAMAyCGeEA}AAeAAaC?cA?g@FmE?o@?k@?g@A_@Ca@KuAMsAJGEa@IqAI_A@[?G?A?C?CB]?G?K?I@O?G?Q?OGINuDNm@DUZkA@KJ_@AIh@oBFSf@iB\\\\}@v@eBz@qBLY@C?ABGf@cAnAsCfBwDJCDEFGCK~@yBZs@LUTg@Ve@h@qABGTi@HQj@qADH@AFCFADAK{ABWv@Q\\\\IAwA|@a@F?BGHM@OEG\\\\_AlA_DEa@NIEWW}BG_@E]CWIs@YyBOoBi@eEmD{YeAyIMm@CKCSSeBBCDGb@k@BGl@w@HWr@eAjAeBh@u@f@}@^w@`@}@JSDMP[?M@MD[N]`@aARe@Nc@Na@J]FM^mA^mARs@BM\\\\qAXoAZ}Ad@eCf@qCRsAPoAPmANsAHu@LkAFg@@MHcAHcAFs@Du@LcCHkBDq@BmADuABiABkA@kA@iA?qE@w@?u@@qD?u@@{B@gABgM@aE@gB?w@?q@@c@?c@@a@?a@@c@?a@@c@@a@@e@@a@@Q?a@Ba@@c@@a@@a@@QBi@@[?K@OBg@Ba@@c@@G@WBe@Bg@Dg@Bm@Di@F{@Fu@Fw@P}BRaCFo@Hw@Fs@JcAFk@Fk@Ho@TmB`@eDDYNiA@MBKBUBSDYD[zByPNeAFi@Hi@Jq@Hc@Fa@Jq@Hc@F[Jc@Jc@Nq@H[Lc@H]Nc@HYHWJ]Ts@N]pAiDhBmE@K@O@Q?MAKRa@x@cB`BuCXg@j@qAZmArDoELMDEr@s@JKJIPU\\\\g@t@cAf@o@j@iAPYPSDCZ[BCV[Zc@d@o@Xa@Za@X[BERQ@MDEDIFGFIFKNSLSR]\\\\_@jA}A`@c@BCBCM{@M_AIk@_@qCQyAUyAIq@bEqBr@_@f@Wh@Y~Aw@lBcAiAiI\",\"legs\":[{\"steps\":[],\"distance\":8168.2,\"duration\":5900.6,\"summary\":\"\",\"weight\":5900.6}],\"distance\":8168.2,\"duration\":5900.6,\"weight_name\":\"duration\",\"weight\":5900.6}],\"waypoints\":[{\"hint\":\"4vIGgJYhD4FnAAAAIgAAAP0AAAAAAAAA2eVmQSf1k0CZnAxCAAAAAGcAAAAiAAAA_QAAAAAAAAABAAAAMp9AAUv2HAPonkABRPYcAwIArwoSGgIR\",\"distance\":5.116463,\"name\":\"\",\"location\":[21.012274,52.229707]},{\"hint\":\"utwQgc_cEIFfAwAAGwMAAAAAAABvAgAAR4HwQvYj3UIAAAAAnGStQl8DAAAbAwAAAAAAAG8CAAABAAAA1CZCAf6lHAOIJUIBJKgcAwAAXwQSGgIR\",\"distance\":65.269656,\"name\":\"\",\"location\":[21.112532,52.20915]}]}", 8192, 0, NULL, NULL) = 1925
Or a bit better formatted content of received response from above strace (for part in x.split("\r\n") { println!("{} {part}", part.len()) }
):
15 HTTP/1.0 200 OK
30 Access-Control-Allow-Origin: *
33 Access-Control-Allow-Methods: GET
60 Access-Control-Allow-Headers: X-Requested-With, Content-Type
45 Content-Type: application/json; charset=UTF-8
53 Content-Disposition: inline; filename="response.json"
20 Content-Length: 1597
22 Connection: keep-alive
30 Keep-Alive: timeout=5, max=512
0
1597 {"code":"Ok","routes":[{"geometry":"ubx}Hu}f_CFCCIBOF?D?N?B?LCRKXKLERKRI@ABAAGGg@@SMaAMaAEUGWAKQ{A{AwLIo@Ge@AIKs@Gg@Ge@e@oDCYUcBHOCQ?E@CDMACEK`AqBTe@^y@N_@JUEa@C]]{BCOOu@CQAWAMAyCGeEA}AAeAAaC?cA?g@FmE?o@?k@?g@A_@Ca@KuAMsAJGEa@IqAI_A@[?G?A?C?CB]?G?K?I@O?G?Q?OGINuDNm@DUZkA@KJ_@AIh@oBFSf@iB\\}@v@eBz@qBLY@C?ABGf@cAnAsCfBwDJCDEFGCK~@yBZs@LUTg@Ve@h@qABGTi@HQj@qADH@AFCFADAK{ABWv@Q\\IAwA|@a@F?BGHM@OEG\\_AlA_DEa@NIEWW}BG_@E]CWIs@YyBOoBi@eEmD{YeAyIMm@CKCSSeBBCDGb@k@BGl@w@HWr@eAjAeBh@u@f@}@^w@`@}@JSDMP[?M@MD[N]`@aARe@Nc@Na@J]FM^mA^mARs@BM\\qAXoAZ}Ad@eCf@qCRsAPoAPmANsAHu@LkAFg@@MHcAHcAFs@Du@LcCHkBDq@BmADuABiABkA@kA@iA?qE@w@?u@@qD?u@@{B@gABgM@aE@gB?w@?q@@c@?c@@a@?a@@c@?a@@c@@a@@e@@a@@Q?a@Ba@@c@@a@@a@@QBi@@[?K@OBg@Ba@@c@@G@WBe@Bg@Dg@Bm@Di@F{@Fu@Fw@P}BRaCFo@Hw@Fs@JcAFk@Fk@Ho@TmB`@eDDYNiA@MBKBUBSDYD[zByPNeAFi@Hi@Jq@Hc@Fa@Jq@Hc@F[Jc@Jc@Nq@H[Lc@H]Nc@HYHWJ]Ts@N]pAiDhBmE@K@O@Q?MAKRa@x@cB`BuCXg@j@qAZmArDoELMDEr@s@JKJIPU\\g@t@cAf@o@j@iAPYPSDCZ[BCV[Zc@d@o@Xa@Za@X[BERQ@MDEDIFGFIFKNSLSR]\\_@jA}A`@c@BCBCM{@M_AIk@_@qCQyAUyAIq@bEqBr@_@f@Wh@Y~Aw@lBcAiAiI","legs":[{"steps":[],"distance":8168.2,"duration":5900.6,"summary":"","weight":5900.6}],"distance":8168.2,"duration":5900.6,"weight_name":"duration","weight":5900.6}],"waypoints":[{"hint":"4vIGgJYhD4FnAAAAIgAAAP0AAAAAAAAA2eVmQSf1k0CZnAxCAAAAAGcAAAAiAAAA_QAAAAAAAAABAAAAMp9AAUv2HAPonkABRPYcAwIArwoSGgIR","distance":5.116463,"name":"","location":[21.012274,52.229707]},{"hint":"utwQgc_cEIFfAwAAGwMAAAAAAABvAgAAR4HwQvYj3UIAAAAAnGStQl8DAAAbAwAAAAAAAG8CAAABAAAA1CZCAf6lHAOIJUIBJKgcAwAAXwQSGgIR","distance":65.269656,"name":"","location":[21.112532,52.20915]}]}
@Arvamer sounds like there is some request header that triggers the behavior.
Sometimes services treat User-Agent
differently. Maybe it doesn't recognize ureq and therefore downgrades to HTTP/1.0? It would be worth comparing request headers between ureq and curl.
Curl headers:
> Host: 192.168.59.2:8049
> User-Agent: curl/7.88.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Headers: X-Requested-With, Content-Type
< Content-Type: application/json; charset=UTF-8
< Content-Disposition: inline; filename="response.json"
< Content-Length: 1597
* HTTP/1.0 connection set to keep alive
< Connection: keep-alive
< Keep-Alive: timeout=5, max=512
<
So looks same for me. Is it expected that content-length is not returned by response.headers_names()
for http 1.0, even when server sent it?
Oh. That could be it. Maybe we ignore the content-length if it is HTTP/1.0?
Is it expected that content-length is not returned by response.headers_names() for http 1.0, even when server sent it?
Yes, if the response was gzip-encoded and ureq automatically decompressed it (we do this to avoid confusion about the length of the decoded response).
Maybe we ignore the content-length if it is HTTP/1.0?
Bingo! I think this may be the source of the problem:
For an http/1.0 response, ureq considers the response body to be close-delimited, even if there was a content-length header. That's wrong. In fact, I have an old stale PR fixing it:
Remove consideration of HTTP version. An HTTP/1.0 response can have a Content-Length.
Thanks so much for providing a good real-world example motivating the fix. We'll get that cleaned up and submitted.
Example code:
Response:
Strace:
Second
recvfrom
returns 0 bytes received after 5s.In 2.5, first
recvfrom
read less data, so second call torecvfrom
also returns something:OSRM from this docker image: https://hub.docker.com/r/osrm/osrm-backend/
My guess is on incorrect handling of missing content-length header/HTTP 1.0 .