HoneyryderChuck / httpx

(Mirror) An HTTP client library for ruby
https://gitlab.com/os85/httpx
Apache License 2.0
192 stars 10 forks source link

Issue with native DNS resolver #52

Closed pineman closed 4 months ago

pineman commented 5 months ago

Hi! Thank you so much for this gem! In particular, for implementing timeouts correctly without Timeout::timeout! I've switched to httpx just for that reason alone as the Timeout module was raising errors out of the blue in the rest of my code, but I digress...

I think I've hit an issue using the native resolver, which doesn't seem to happen on the system resolver. Here's the output of the debug log at level 2:

resolver: query A for login.microsoft.com
resolver: server: udp://1.1.1.1:53...
WRITE: 37 bytes...
READ: 37 bytes...
resolver: server: tcp://1.1.1.1:53...
 idle -> idle
Connected to 1.1.1.1 (#15)
resolver: query A for login.microsoft.com
WRITE: 39 bytes...
READ: 512 bytes...
READ: 144 bytes...
resolver: query A for a.privatelink.msidentity.CoM
1.1.1.1 connected -> closed
resolver: server: udp://1.1.1.1:53...
WRITE: 48 bytes...
resolver: timeout after -0.05416500003775582s, retry(3) login.microsoft.com...
resolver: query A for login.microsoft.com.
WRITE: 37 bytes...
READ: 37 bytes...
resolver: server: tcp://1.1.1.1:53...
 idle -> idle
Connected to 1.1.1.1 (#15)
resolver: query A for login.microsoft.com
WRITE: 39 bytes...
READ: 512 bytes...
READ: 144 bytes...
resolver: query A for a.privatelink.msidentity.CoM
1.1.1.1 connected -> closed
resolver: server: udp://1.1.1.1:53...
WRITE: 48 bytes...
resolver: timeout after -0.07551200001034886s, retry(3) login.microsoft.com...
resolver: query A for login.microsoft.com.
WRITE: 37 bytes...
READ: 37 bytes...
resolver: server: tcp://1.1.1.1:53...
 idle -> idle
Connected to 1.1.1.1 (#15)
resolver: query A for login.microsoft.com
WRITE: 39 bytes...
READ: 512 bytes...
READ: 144 bytes...
resolver: query A for a.privatelink.msidentity.CoM
1.1.1.1 connected -> closed
resolver: server: udp://1.1.1.1:53...
WRITE: 48 bytes...
resolver: timeout after -0.05033900000853464s, retry(3) login.microsoft.com...
resolver: query A for login.microsoft.com.
WRITE: 37 bytes...
READ: 37 bytes...
<...>

I've omitted the rest of my log file, as I believe it goes on and on in a loop. I had to CTRL-C my rails console as the request just hangs indefinitely, and this was the output at the end. Any clues on what might be happening? I hope this is enough info, let me know otherwise. httpx 1.2.5, ruby 3.3.0, macOS 14.5.

HoneyryderChuck commented 5 months ago

Hi @pineman ,

Thx for the issue. I think I know what's happening: DNS resolution is upgraded to tcp due to truncated packet, then gets an aliás, however the dns request there times out. Then, instead of sticking to the alias, it retries the original name, which already reset the Timeouts. So the loop goes on.

I'm on holidays, so I won't be able to pick this up in the next weeks. If this explanation is good enough for you to take a stab at it, go ahead.

pineman commented 5 months ago

Hey @HoneyryderChuck! Thanks for the explanation and quick reply (sorry to bother you on your holidays). I'll be honest, the odds of me having the bandwidth to pick this up are pretty low, especially since it doesn't happen with the system resolver!

HoneyryderChuck commented 5 months ago

No worries, at least you have a workaround for now. I'll see if I can reproduce this when I get back, and fix it.

HoneyryderChuck commented 5 months ago

@pineman can you try the gh-52 branch, collect the logs and post here, and see if it already fixes the loop?

So far, I've done the first 2 of the 3 things I think are wrong with the native resolver in this particular use-case:

My expectation is that the logs either prove that the 3rd point still happens, or that the timeout won't happen via UDP (which will make it hard to test a patch for the 3rd point).

pineman commented 5 months ago

Hm, I've been running my original code (same httpx version and all the rest) for quite a while now and I haven't been able to reproduce the issue (the upgrade from udp to tcp due to truncated packet like you said). So I'm not sure I could observe the changes you've done in the gh-52 branch... Sorry I'm not of much help here 😞. I'll keep it running for a while longer and if triggers the issue I'll then try the gh-52 branch.