Closed: 636f7374 closed this issue 4 years ago
@636f7374 What was the problem? Do you care to share?
@straight-shoota Oops, the problem still exists. ... Sorry for the late reply.
@straight-shoota This problem is tricky; it breaks my reverse proxy server. I am still investigating, which may take some time. If there is new progress, I will post it here.
iMac: User$ nslookup facebook.com
Server: 192.168.1.1
Address: 192.168.1.1#53
Non-authoritative answer:
Name: facebook.com
Address: 67.228.74.123
iMac: User$ dig facebook.com
; <<>> DiG 9.8.3-P1 <<>> facebook.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16053
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;facebook.com. IN A
;; ANSWER SECTION:
facebook.com. 79 IN A 69.171.240.27
;; Query time: 1243 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Oct 26 22:55:59 2019
;; MSG SIZE rcvd: 46
@636f7374 might you be hitting the ephemeral port limit described at http://danielmendel.github.io/blog/2013/04/07/benchmarkers-beware-the-ephemeral-port-limit/ ?
@bcardiff Okay, yes. I tried `ulimit -n 1024` before (macOS default: 256); the `lsof` results keep changing.
In addition, this problem also occurs when the number of requests is small. (I.e. run the right command once and the server freezes/blocks/hangs/waits, no matter how many requests.)
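To rule out descriptor or ephemeral-port exhaustion, it helps to watch the process's open file-descriptor count while reproducing, instead of re-running lsof each time. A small sketch (in Ruby, since a Ruby comparison comes up later in this thread; `open_fd_count` is a hypothetical helper name, not part of any library):

```ruby
# Count this process's open file descriptors by listing /dev/fd
# (a symlink to /proc/self/fd on Linux; a real device directory on macOS).
def open_fd_count
  Dir.children("/dev/fd").size
end

puts "open fds: #{open_fd_count}"
```

If the count climbs steadily under load and never comes back down, leaked sockets (e.g. the CLOSE_WAIT entries above) are a likely culprit.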
iMac:~ User$ lsof -p 11290
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
threads 11290 User cwd DIR 1,3 170 10262722 /Users/User/Downloads/threads
threads 11290 User txt REG 1,3 3090564 10292039 /Users/User/Downloads/threads/threads
threads 11290 User txt REG 1,3 381200 6765855 /usr/local/Cellar/openssl/1.0.2s/lib/libssl.1.0.0.dylib
threads 11290 User txt REG 1,3 2003932 6765852 /usr/local/Cellar/openssl/1.0.2s/lib/libcrypto.1.0.0.dylib
threads 11290 User txt REG 1,3 416976 422220 /usr/lib/libpcre.0.dylib
threads 11290 User txt REG 1,3 14352 9630926 /usr/local/Cellar/libevent/2.1.11_1/lib/libevent_pthreads-2.1.7.dylib
threads 11290 User txt REG 1,3 248404 9630914 /usr/local/Cellar/libevent/2.1.11_1/lib/libevent-2.1.7.dylib
threads 11290 User txt REG 1,3 698896 422030 /usr/lib/dyld
threads 11290 User txt REG 1,3 662274048 9591240 /private/var/db/dyld/dyld_shared_cache_x86_64h
threads 11290 User 0u CHR 16,0 0t7748 1117 /dev/ttys000
threads 11290 User 1u CHR 16,0 0t7748 1117 /dev/ttys000
threads 11290 User 2u CHR 16,0 0t7748 1117 /dev/ttys000
threads 11290 User 3 PIPE 0xa1e6fa57b499b07d 16384 ->0xa1e6fa57b499c33d
threads 11290 User 4 PIPE 0xa1e6fa57b499c33d 16384 ->0xa1e6fa57b499b07d
threads 11290 User 5r CHR 14,1 0t32 574 /dev/urandom
threads 11290 User 6u CHR 16,0 0t4002 1117 /dev/ttys000
threads 11290 User 7u CHR 16,0 0t0 1117 /dev/ttys000
threads 11290 User 8 PIPE 0xa1e6fa57b499c3fd 16384 ->0xa1e6fa57b499c4bd
threads 11290 User 9 PIPE 0xa1e6fa57b499c4bd 16384 ->0xa1e6fa57b499c3fd
threads 11290 User 10 PIPE 0xa1e6fa57b499cdbd 16384 ->0xa1e6fa57b499a53d
threads 11290 User 11 PIPE 0xa1e6fa57b499a53d 16384 ->0xa1e6fa57b499cdbd
threads 11290 User 12u KQUEUE count=2, state=0x8
threads 11290 User 13 PIPE 0xa1e6fa57b499a6bd 16384 ->0xa1e6fa57b499ccfd
threads 11290 User 14 PIPE 0xa1e6fa57b499ccfd 16384 ->0xa1e6fa57b499a6bd
threads 11290 User 15u KQUEUE count=1, state=0x8
threads 11290 User 16u KQUEUE count=0, state=0xa
threads 11290 User 17 PIPE 0xa1e6fa57b499cf3d 16384 ->0xa1e6fa57b499b1fd
threads 11290 User 18 PIPE 0xa1e6fa57b499b1fd 16384 ->0xa1e6fa57b499cf3d
threads 11290 User 19u KQUEUE count=1, state=0x8
threads 11290 User 20u IPv4 0xa1e6fa57c32d7cf5 0t0 TCP localhost:8777 (LISTEN)
threads 11290 User 21u IPv4 0xa1e6fa57c47dbee5 0t0 TCP 192.168.1.100:55086->31.13.75.18:http (SYN_SENT)
threads 11290 User 22 NPOLICY
threads 11290 User 23u unix 0xa1e6fa57bdc029a5 0t0 ->0xa1e6fa57bdc01eb5
threads 11290 User 24u systm 0t0
threads 11290 User 25u IPv4 0xa1e6fa57c481cb05 0t0 TCP localhost:8777->localhost:55047 (CLOSE_WAIT)
threads 11290 User 26u IPv4 0xa1e6fa57bd5ff20d 0t0 TCP localhost:8777->localhost:55030 (CLOSE_WAIT)
threads 11290 User 28u IPv4 0xa1e6fa57c481d3fd 0t0 TCP localhost:8777->localhost:55053 (CLOSE_WAIT)
threads 11290 User 29u IPv4 0xa1e6fa57c31bbb05 0t0 TCP 192.168.1.100:55126->31.13.75.18:http (SYN_SENT)
threads 11290 User 30u IPv4 0xa1e6fa57c476db05 0t0 TCP localhost:8777->localhost:55063 (CLOSE_WAIT)
threads 11290 User 31u IPv4 0xa1e6fa57bc7b7915 0t0 TCP localhost:8777->localhost:55069 (CLOSE_WAIT)
threads 11290 User 32u IPv4 0xa1e6fa57c485c5ed 0t0 TCP localhost:8777->localhost:55075 (CLOSE_WAIT)
threads 11290 User 33u IPv4 0xa1e6fa57c47ca915 0t0 TCP localhost:8777->localhost:55081 (CLOSE_WAIT)
threads 11290 User 34u IPv4 0xa1e6fa57c31ddcf5 0t0 TCP 192.168.1.100:55127->31.13.75.18:http (SYN_SENT)
threads 11290 User 35u IPv4 0xa1e6fa57c48905ed 0t0 TCP 192.168.1.100:55128->31.13.75.18:http (SYN_SENT)
threads 11290 User 37u IPv4 0xa1e6fa57c47d920d 0t0 TCP localhost:8777->localhost:55094 (CLOSE_WAIT)
threads 11290 User 38u IPv4 0xa1e6fa57c47da3fd 0t0 TCP localhost:8777->localhost:55100 (CLOSE_WAIT)
threads 11290 User 39u IPv4 0xa1e6fa57c31895ed 0t0 TCP localhost:8777->localhost:55106 (CLOSE_WAIT)
threads 11290 User 40u IPv4 0xa1e6fa57c48b8915 0t0 TCP localhost:8777->localhost:55115 (CLOSE_WAIT)
threads 11290 User 41u IPv4 0xa1e6fa57c31b95ed 0t0 TCP localhost:8777->localhost:55121 (CLOSE_WAIT)
threads 11290 User 42u KQUEUE count=0, state=0xa
threads 11290 User 44u KQUEUE count=0, state=0xa
threads 11290 User 45u KQUEUE count=0, state=0xa
I took a closer look.
The hang appears when `[#<IO::Timeout:connect timed out>]` is logged.
Sometimes this problem occurs many times in a row.
Maybe I should record a screen video, convert it to a GIF, and post it here.
But the `[#<IO::Timeout:connect timed out>]` errors are hard to catch; I should find some way to trigger them constantly.
This problem always appears together with `#<IO::Timeout:connect timed out>`, but when I test with netcat (`nc -l`) I can't trigger it.
I, [2019-10-26 22:13:54 +08:00 #5335] INFO -- : [0, 1, "facebook.com", 2019-10-26 14:13:54.516772000 UTC, "Connected"]
I, [2019-10-26 22:13:59 +08:00 #5335] INFO -- : [1, 1, "taobao.com", 2019-10-26 14:13:59.634948000 UTC, "Connected"]
I, [2019-10-26 22:13:59 +08:00 #5335] INFO -- : [1, 2, "facebook.com", 2019-10-26 14:13:59.669194000 UTC, "Connected"]
I, [2019-10-26 22:14:04 +08:00 #5335] INFO -- : [1, 3, "facebook.com", 2019-10-26 14:14:04.676075000 UTC, "Connected"]
I, [2019-10-26 22:14:09 +08:00 #5335] INFO -- : [1, 4, "facebook.com", 2019-10-26 14:14:09.682888000 UTC, "Connected"]
I, [2019-10-26 22:14:14 +08:00 #5335] INFO -- : [1, 5, "facebook.com", 2019-10-26 14:14:14.691049000 UTC, "Connected"]
I, [2019-10-26 22:14:19 +08:00 #5335] INFO -- : [1, 6, "facebook.com", 2019-10-26 14:14:19.697024000 UTC, "Connected"]
I, [2019-10-26 22:14:24 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335] INFO -- : [1, 7, "facebook.com", 2019-10-26 14:14:24.702985000 UTC, "Connected"]
I, [2019-10-26 22:14:29 +08:00 #5335] INFO -- : [2, 7, "taobao.com", 2019-10-26 14:14:29.706984000 UTC, "Connected"]
I, [2019-10-26 22:14:29 +08:00 #5335] INFO -- : [2, 8, "facebook.com", 2019-10-26 14:14:29.708446000 UTC, "Connected"]
I, [2019-10-26 22:14:34 +08:00 #5335] INFO -- : [2, 9, "facebook.com", 2019-10-26 14:14:34.716301000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [3, 9, "taobao.com", 2019-10-26 14:14:39.780822000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [4, 9, "taobao.com", 2019-10-26 14:14:39.837072000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [5, 9, "taobao.com", 2019-10-26 14:14:39.892445000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335] INFO -- : [6, 9, "taobao.com", 2019-10-26 14:14:39.947160000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [7, 9, "taobao.com", 2019-10-26 14:14:40.005131000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [8, 9, "taobao.com", 2019-10-26 14:14:40.059551000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [9, 9, "taobao.com", 2019-10-26 14:14:40.120486000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [10, 9, "taobao.com", 2019-10-26 14:14:40.176422000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [11, 9, "taobao.com", 2019-10-26 14:14:40.234508000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [12, 9, "taobao.com", 2019-10-26 14:14:40.290436000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [13, 9, "taobao.com", 2019-10-26 14:14:40.345031000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [14, 9, "taobao.com", 2019-10-26 14:14:40.399339000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [15, 9, "taobao.com", 2019-10-26 14:14:40.457172000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [16, 9, "taobao.com", 2019-10-26 14:14:40.512656000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [17, 9, "taobao.com", 2019-10-26 14:14:40.565735000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [18, 9, "taobao.com", 2019-10-26 14:14:40.619120000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [19, 9, "taobao.com", 2019-10-26 14:14:40.675409000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [20, 9, "taobao.com", 2019-10-26 14:14:40.732372000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [21, 9, "taobao.com", 2019-10-26 14:14:40.791124000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335] INFO -- : [22, 9, "taobao.com", 2019-10-26 14:14:40.845640000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290] INFO -- : [23, 4, "facebook.com", 2019-10-26 15:42:42.416706000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290] INFO -- : [24, 4, "taobao.com", 2019-10-26 15:42:42.416849000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290] INFO -- : [24, 5, "facebook.com", 2019-10-26 15:42:42.416978000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290] INFO -- : [24, 6, "facebook.com", 2019-10-26 15:42:42.417674000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290] INFO -- : [24, 7, "facebook.com", 2019-10-26 15:42:42.417750000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290] INFO -- : [25, 7, "taobao.com", 2019-10-26 15:42:47.428807000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290] INFO -- : [25, 8, "facebook.com", 2019-10-26 15:42:47.428838000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290] INFO -- : [25, 9, "facebook.com", 2019-10-26 15:42:47.429027000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290] INFO -- : [25, 10, "facebook.com", 2019-10-26 15:42:47.432212000 UTC, "Connected"]
I, [2019-10-26 23:42:48 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:52 +08:00 #11290] INFO -- : [25, 11, "facebook.com", 2019-10-26 15:42:52.438132000 UTC, "Connected"]
I, [2019-10-26 23:42:52 +08:00 #11290] INFO -- : [25, 12, "facebook.com", 2019-10-26 15:42:52.439745000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [25, 13, "facebook.com", 2019-10-26 15:42:57.444313000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [26, 13, "taobao.com", 2019-10-26 15:42:57.444448000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [26, 14, "facebook.com", 2019-10-26 15:42:57.444492000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [26, 15, "facebook.com", 2019-10-26 15:42:57.444637000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [26, 16, "facebook.com", 2019-10-26 15:42:57.444664000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [26, 17, "facebook.com", 2019-10-26 15:42:57.444684000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:58 +08:00 #11290] INFO -- : [26, 18, "facebook.com", 2019-10-26 15:42:58.375205000 UTC, "Connected"]
I, [2019-10-26 23:43:01 +08:00 #11290] INFO -- : [26, 19, "facebook.com", 2019-10-26 15:43:01.408139000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 20, "facebook.com", 2019-10-26 15:43:02.414938000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 21, "facebook.com", 2019-10-26 15:43:02.447648000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 22, "facebook.com", 2019-10-26 15:43:02.447684000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 23, "facebook.com", 2019-10-26 15:43:02.447711000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 24, "facebook.com", 2019-10-26 15:43:02.447744000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 25, "facebook.com", 2019-10-26 15:43:02.449457000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 26, "facebook.com", 2019-10-26 15:43:02.449687000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [26, 27, "facebook.com", 2019-10-26 15:43:02.451694000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:02 +08:00 #11290] INFO -- : [27, 27, "taobao.com", 2019-10-26 15:43:02.504360000 UTC, "Connected"]
I, [2019-10-26 23:43:03 +08:00 #11290] INFO -- : [27, 28, "facebook.com", 2019-10-26 15:43:03.426798000 UTC, "Connected"]
I, [2019-10-26 23:43:05 +08:00 #11290] INFO -- : [27, 29, "facebook.com", 2019-10-26 15:43:05.444144000 UTC, "Connected"]
I, [2019-10-26 23:43:06 +08:00 #11290] INFO -- : [27, 30, "facebook.com", 2019-10-26 15:43:06.455087000 UTC, "Connected"]
I, [2019-10-26 23:43:07 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:12 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:12 +08:00 #11290] INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:12 +08:00 #11290] INFO -- : [27, 31, "facebook.com", 2019-10-26 15:43:12.461889000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290] INFO -- : [27, 33, "facebook.com", 2019-10-26 15:43:12.462316000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290] INFO -- : [27, 34, "facebook.com", 2019-10-26 15:43:12.462407000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290] INFO -- : [27, 34, "facebook.com", 2019-10-26 15:43:12.462488000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290] INFO -- : [27, 35, "facebook.com", 2019-10-26 15:43:12.462661000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290] INFO -- : [27, 36, "facebook.com", 2019-10-26 15:43:17.472324000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290] INFO -- : [27, 37, "facebook.com", 2019-10-26 15:43:17.472625000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290] INFO -- : [27, 39, "facebook.com", 2019-10-26 15:43:17.472784000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290] INFO -- : [27, 39, "facebook.com", 2019-10-26 15:43:17.472829000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290] INFO -- : [27, 40, "facebook.com", 2019-10-26 15:43:17.473038000 UTC, "Connected"]
I added a GIF video for this problem.
I need to set this problem aside temporarily; I have other things to deal with, and will wait for it to reappear.
At first blush this appears to be the DNS lookup "hanging" in the kernel, perhaps? Do you get the same problem when running on Linux?
@rdp No, I haven't tested this on Linux yet; I will test it later and add more GIF videos. (I will test this as soon as possible; it needs some time.)
Is the HTTP server doing a DNS lookup on incoming requests? Hmm. Hanging at all is of course bad though, even if avoided in this particular situation. Here's about the only thing I could see similar out there: https://forum.photonengine.com/discussion/2801/photon-hang-when-connecting-under-bad-internet-connection What version of OS X? Are the call graphs in the initial description from two different runs? Does it actually hang or just timeout after 3s?
Also maybe in linux it could use https://stackoverflow.com/questions/32619132/getaddrinfo-and-gethostbyname-hangs-the-system to avoid hanging a thread? hmm... (unrelated) :)
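As an illustration of the "don't hang a thread on getaddrinfo" idea from that Stack Overflow link (this is not the Crystal implementation, just a Ruby sketch; `resolve_with_timeout` is a hypothetical helper): Ruby's pure-Ruby Resolv library performs DNS over its own sockets, so a timeout can actually interrupt it, whereas a thread stuck inside a blocking getaddrinfo(3) call generally cannot be interrupted cleanly.

```ruby
require "resolv"
require "timeout"

# Pure-Ruby resolution with an explicit timeout, as a contrast to the
# libc getaddrinfo path that blocks the calling thread until libc gives up.
# Resolv consults /etc/hosts first, then DNS, so "localhost" works offline.
def resolve_with_timeout(host, seconds)
  resolver = Resolv.new # default resolvers: hosts file, then system DNS
  Timeout.timeout(seconds) { resolver.getaddress(host).to_s }
rescue Timeout::Error, Resolv::ResolvError
  nil
end

puts resolve_with_timeout("localhost", 2).inspect
```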
@rdp
Here's about the only thing I could see similar out there: https://forum.photonengine.com/discussion/2801/photon-hang-when-connecting-under-bad-internet-connection
Yes, I also found this URL, but I also found #2745 (which says it fixes the getaddrinfo problem).
What version of OS X?
$ uname -a
Darwin iMac 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64
macOS Sierra 10.12.6
Are the call graphs in the initial description from two different runs?
Yes, I clicked "Sample" while it was frozen/blocking/hanging/waiting. These samples come from different runs.
Does it actually hang or just timeout after 3s?
It gets stuck (slow processing); the "exception" is triggered after the timeout.
I recorded another GIF video (macOS); you can take a look. (Please let the GIF fully load before watching; it is about 6 MB.)
@rdp
Also maybe in linux it could use https://stackoverflow.com/questions/32619132/getaddrinfo-and-gethostbyname-hangs-the-system to avoid hanging a thread? hmm... (unrelated) :)
Yes, I had seen that before, and then I found #2745 again. (Sorry, I am not a C developer, so I can't solve this problem myself; I can't provide much help, but I'll do my best.)
(In addition, I don't know whether Rust or Go has this problem. Can you give me some code to test?) (I will test Linux (Ubuntu) first.)
@rdp
# It consists on starting a thread that will run the blocking `getaddrinfo`
# syscall, thus allowing the main thread (the one running the event loop) to not
# stop while resolving domains, which can take from milliseconds to seconds.
# Communication with the thread is achieved through a socket pair, one for the
# main thread, and one for the resolver thread. Since we rely on socket, the
# event-loop will pause or resume fibers accordingly to messages sent or
# received from the socket.
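The mechanism that comment describes can be sketched in a few lines (here in Ruby, purely for illustration; this is not the Crystal stdlib code, and `background_getaddrinfo` is a hypothetical helper): run the blocking getaddrinfo in a helper thread and hand the result back over a socket pair that an event loop could poll without blocking.

```ruby
require "socket"

# Run blocking name resolution in a helper thread; the caller reads the
# result from one end of a socket pair, so an event loop could select/poll
# on it instead of blocking the whole thread on getaddrinfo.
def background_getaddrinfo(host, port = 80)
  parent, child = UNIXSocket.pair

  Thread.new do
    # The blocking syscall happens here, off the main thread.
    addrs = Addrinfo.getaddrinfo(host, port, nil, :STREAM).map(&:ip_address)
    child.puts(addrs.join(","))
  rescue StandardError => e
    child.puts("ERROR: #{e.message}")
  ensure
    child.close
  end

  # Here we just read the one reply line; an event loop would poll instead.
  parent.gets.to_s.chomp.split(",")
ensure
  parent&.close
end

puts background_getaddrinfo("localhost").inspect
```

The design point is that the main thread only ever touches a pollable socket; the unbounded latency of the resolver is confined to the helper thread.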
Couple questions, after looking more closely. Maybe requesting facebook.com somehow borks your DNS server? What if you specify a different DNS server (for your local connection), same result? What if you use a ruby server instead of crystal, same result? What if you have two (crystal) servers, on different ports, and hit them in the same type of order, same result? What if you run a simple C program that calls getaddrinfo, when the hangs start happening, does it hang? ex: http://www.logix.cz/michal/devel/various/getaddrinfo.c ? What if you kill the mDNS process, do things start working again?
@rdp
Maybe requesting facebook.com somehow borks your DNS server?
(Facebook, Google, Twitter, YouTube, ...) all trigger this problem.
(Any website blocked in China behaves like this.)
Only facebook.com is the most obvious (it stays stuck the longest).
What if you use a ruby server instead of crystal, same result?
Sorry, I don't know how to implement this in Ruby. Could you provide some Ruby code? (I was halfway through learning Ruby when I discovered Crystal.)
What if you kill mDNS process do things start working again?
Do you mean killing `mDNSResponder` and `mDNSResponderHelper`?
What if you run a simple C program that calls getaddrinfo, when the hangs start happening, does it hang? ex: http://www.logix.cz/michal/devel/various/getaddrinfo.c ?
clang getaddrinfo.c -o getaddrinfo
./getaddrinfo [domain]
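For a quick sanity check without compiling the C program, roughly the same measurement can be made in Ruby (a sketch; `time_getaddrinfo` is a hypothetical helper, and Ruby's `Addrinfo.getaddrinfo` goes through the same libc resolver the C test exercises):

```ruby
require "socket"
require "benchmark"

# Time how long the libc resolver takes for a host, similar in spirit
# to the getaddrinfo.c test program linked above.
def time_getaddrinfo(host)
  addrs = nil
  elapsed = Benchmark.realtime do
    addrs = Addrinfo.getaddrinfo(host, nil, nil, :STREAM).map(&:ip_address).uniq
  end
  [elapsed, addrs]
end

elapsed, addrs = time_getaddrinfo("localhost")
puts format("%.3fs %p", elapsed, addrs)
```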
Router DNS: 192.168.1.1
User$ ./getaddrinfo facebook.com ≈ 5.52 seconds
Host: facebook.com
IPv4 address: 67.15.100.252 (facebook.com)
User$ ./getaddrinfo google.com ≈ 0 seconds
Host: google.com
IPv4 address: 172.217.24.14 (google.com)
IPv6 address: 2404:6800:4008:801::200e (google.com)
User$ ./getaddrinfo wikipedia.org ≈ 1.26 seconds
Host: wikipedia.org
IPv4 address: 69.171.232.21 (wikipedia.org)
User$ ./getaddrinfo facebook.com ≈ 2.47 seconds
Host: facebook.com
IPv4 address: 66.220.151.20 (facebook.com)
User$ ./getaddrinfo twitter.com ≈ 2.53 seconds
Host: twitter.com
IPv4 address: 66.220.155.14 (twitter.com)
User$ ./getaddrinfo youtube.com ≈ 0 seconds
Host: youtube.com
IPv4 address: 8.7.198.45 (youtube.com)
IPv6 address: 2404:6800:4008:800::200e (youtube.com)
User$ ./getaddrinfo facebook.com ≈ 2.45 seconds
Host: facebook.com
IPv4 address: 69.63.178.13 (facebook.com)
User$ ./getaddrinfo facebook.com ≈ 5.78 seconds
Host: facebook.com
IPv4 address: 69.63.178.13 (facebook.com)
User$ ./getaddrinfo twitter.com ≈ 1.44 seconds
Host: twitter.com
IPv4 address: 75.126.124.162 (twitter.com)
(Custom DNS: 8.8.8.8, 8.8.4.4)
User$ ./getaddrinfo twitter.com ≈ 0 seconds
Host: twitter.com
IPv4 address: 69.63.178.13 (twitter.com)
IPv4 address: 69.63.178.13 (twitter.com)
IPv4 address: 174.37.154.236 (twitter.com)
User$ ./getaddrinfo youtube.com ≈ 0 seconds
Host: youtube.com
IPv4 address: 203.98.7.65 (youtube.com)
User$ ./getaddrinfo google.com ≈ 0 seconds
Host: google.com
IPv4 address: 93.46.8.90 (google.com)
User$ ./getaddrinfo wikipedia.com ≈ 0.5 - 1 seconds
Host: wikipedia.com
IPv4 address: 208.80.153.232 (wikipedia.com)
IPv6 address: 2620:0:860:ed1a::9 (wikipedia.com)
User$ ./getaddrinfo wikipedia.org ≈ 0 seconds
Host: wikipedia.org
IPv4 address: 69.171.225.13 (wikipedia.org)
IPv4 address: 69.171.225.13 (wikipedia.org)
IPv4 address: 78.16.49.15 (wikipedia.org)
User$ ./getaddrinfo gov.af ≈ 0.5 - 1 seconds
getaddrinfo: Undefined error: 0
---
> Maybe requesting facebook.com somehow bworks your DNS server? What if you specify a different DNS server (for your local connection), same result?
Yes, it seems that only the router's default DNS gets stuck.
Default DNS: `192.168.1.1` -> Router upstream: `180.76.76.76`, `114.114.114.114`
With `8.8.8.8` / `8.8.4.4` this problem does not happen.
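To compare the router's DNS against 8.8.8.8 directly (bypassing whatever the OS picks), a resolver can be pointed at an explicit nameserver. A Ruby sketch (`time_lookup` is a hypothetical helper; the server IPs are the ones from this thread, so substitute your own):

```ruby
require "resolv"
require "benchmark"

# Query a specific DNS server directly, bypassing the system default,
# to compare e.g. the router (192.168.1.1) against 8.8.8.8.
def time_lookup(host, nameserver)
  dns = Resolv::DNS.new(nameserver: [nameserver])
  dns.timeouts = 5 # give up after 5 seconds instead of hanging
  addr = nil
  secs = Benchmark.realtime { addr = (dns.getaddress(host).to_s rescue nil) }
  [secs, addr]
ensure
  dns&.close
end

# Example (requires network):
# p time_lookup("facebook.com", "8.8.8.8")
# p time_lookup("facebook.com", "192.168.1.1")
```

If only the router's address shows multi-second timings, that points at the upstream resolvers the router forwards to rather than at the client code.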
---
> ...
...
---
@rdp
Thanks, I am now 90% - 100% sure that this problem is only triggered when using the default DNS (router DNS).
Yes on killing mDNSResponder. What if you run two separate crystal processes, one you use for facebook, one for taobao, and hit them with the same pattern does requesting from one process affect the other process? Here's an attempt at a ruby version: https://gist.github.com/rdp/88a55363e254ac037d344480c16d0187 (seems to use fibers so should be about the same). As a side note it appears that HTTP::Server is already using fibers so adding new fibers shouldn't do too much (though might let you control if they're same_thread or not, I suppose).
You should also get the same result on Linux if it's your local router. It will be interesting to see if the Ruby one has the same issue or not, as well.
Appears the IO::Timeout is generated when the C call (getaddrinfo) takes so long that, when it comes back, the fibers waiting for a connect to occur all say "oops, allotted timeout has passed" even though the connect might have actually occurred, the time keeping mechanism might not have noticed that it has occurred. I was able to repro a small demo example of it here: https://gist.github.com/rdp/31632b22d99d9e5c2cc815c3d51ca446 I wouldn't expect a C call in one fiber to be able to do that to a connect in another, but it is still a mighty head scratcher as to how to not do that...especially considering that I don't fully understand the details for the timeout mechanism :) I might create a new issue for it...
Initial example works OK here (linux). Sometimes I get this with fiber version:
Unhandled exception in spawn: Closed stream (IO::Error)
from /usr/share/crystal/src/io.cr:128:5 in 'check_open'
from /usr/share/crystal/src/io/buffered.cr:130:15 in 'write'
from /usr/share/crystal/src/http/server/response.cr:80:7 in 'write'
from /usr/share/crystal/src/io.cr:481:7 in 'write_utf8'
from /usr/share/crystal/src/string.cr:4285:5 in 'to_s'
from /usr/share/crystal/src/io.cr:184:5 in '<<'
from /usr/share/crystal/src/io.cr:198:5 in 'print'
from him.cr:35:5 in '->'
from /usr/share/crystal/src/fiber.cr:255:3 in 'run'
from /usr/share/crystal/src/fiber.cr:48:34 in '->'
from ???
in the spawn version which I think is because the actual request was already closed, just for reference.
@rdp
What if you run two separate crystal processes, one you use for facebook, one for taobao, and hit them with the same pattern does requesting from one process affect the other process?
(Custom DNS: 8.8.8.8, 8.8.4.4) With custom DNS there are no problems at all (nothing gets stuck).
(Default DNS: router, 192.168.1.1) With the default DNS, yes, everything gets stuck (slow processing).
(Just like the previous GIF video.) (I really want to send a video of my running process, but my network is too bad; I recorded the whole process, but maybe my ISP or proxy server limits my upload speed. I will try to upload it; it may take some time.)
I am testing the other questions; please wait a moment. :)
@rdp
Yes on killing mDNSResponder.
00:04 (I started two processes (a.k.a. `a`, `b`)).
00:04 (`a` process on port 8776, `b` process on port 8777).
00:05 (I first ran the taobao curl against the `a` and `b` processes).
00:06 - 00:20 (Then I ran the facebook curl in a manual loop (run -> wait 0.01 seconds -> cancel) against `a` and `b`, ≈ 4 - 5 times; it kept getting stuck (slow processing)).
00:21 - 00:22 (I opened Activity Monitor).
00:22 - 00:24 (The first time I force-killed the `mDNSResponder` process, it was still alive).
00:26 - 00:28 (It seemed to process some requests again (fast processing)).
00:28 - 00:29 (So I force-killed the `mDNSResponder` process again; this time it died).
00:29 - 00:34 (Then it got stuck again).
00:29 - 00:34 (It stayed stuck for 2-3 seconds, then the `mDNSResponder` process was revived).
00:34 - 00:43 (Once `mDNSResponder` was revived, the `a` and `b` processes were no longer stuck (fast processing)).
00:43 - End (Then I ran the facebook curl manual loop (run -> wait 0.01 seconds -> cancel) against `a` and `b` again; it kept getting stuck).
@rdp
Here's an attempt at a ruby version: https://gist.github.com/rdp/88a55363e254ac037d344480c16d0187 (seems to use fibers so should be about the same).
You should also get the same result in linux if it's your local router. It will be interesting to see if the ruby one has the same issue or not, as well.
(I used the default DNS (router: 192.168.1.1) to test the WEBrick Ruby script.)
When I ran the taobao.com request through curl, it didn't work (the client hung directly, no request was processed, and the server logged nothing).
Then I ran the facebook request through curl again (same: the client hung directly, no request processed, no server logs).
Then I closed curl. After roughly 30 seconds, the server produced log output.
ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin16]
ruby webrick.rb
for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8000/?foo=bar ; done
curl -v http://127.0.0.1:8000
iMac:Example User$ ruby webrick.rb
[2019-11-11 19:10:05] INFO WEBrick 1.4.2
[2019-11-11 19:10:05] INFO ruby 2.6.5 (2019-10-01) [x86_64-darwin16]
[2019-11-11 19:10:05] INFO WEBrick::HTTPServer#start: pid=32689 port=8000
[2019-11-11 19:11:27] ERROR Errno::ETIMEDOUT: Operation timed out - connect(2) for "facebook.com" port 80
webrick.rb:12:in `initialize'
webrick.rb:12:in `new'
webrick.rb:12:in `block in <main>'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/prochandler.rb:39:in `do_GET'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/abstract.rb:105:in `service'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:140:in `service'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:96:in `run'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/server.rb:307:in `block in start_thread'
127.0.0.1 - - [11/Nov/2019:19:10:07 CST] "GET /?foo=bar HTTP/1.1" 500 349
- -> /?foo=bar
[2019-11-11 19:11:31] ERROR Errno::ETIMEDOUT: Operation timed out - connect(2) for "facebook.com" port 80
webrick.rb:12:in `initialize'
webrick.rb:12:in `new'
webrick.rb:12:in `block in <main>'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/prochandler.rb:39:in `do_GET'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/abstract.rb:105:in `service'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:140:in `service'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:96:in `run'
/usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/server.rb:307:in `block in start_thread'
127.0.0.1 - - [11/Nov/2019:19:10:10 CST] "GET / HTTP/1.1" 500 349
- -> /
@rdp
You should also get the same result in linux if it's your local router. It will be interesting to see if the ruby one has the same issue or not, as well.
I have installed Ubuntu 18.04 in a virtual machine (Parallels Desktop); I still need to install LLVM and Crystal. Sorry for the long delay; I'm a little lazy and my network downloads large files slowly.
@rdp
Appears the IO::Timeout is generated when the C call (getaddrinfo) takes so long that, when it comes back, the fibers waiting for a connect to occur all say "oops, allotted timeout has passed" even though the connect might have actually occurred, the time keeping mechanism might not have noticed that it has occurred. I was able to repro a small demo example of it here: https://gist.github.com/rdp/31632b22d99d9e5c2cc815c3d51ca446 I wouldn't expect a C call in one fiber to be able to do that to a connect in another, but it is still a mighty head scratcher as to how to not do that...especially considering that I don't fully understand the details for the timeout mechanism :) I might create a new issue for it...
I trigger it via a tool like Proxifier.
I should have recorded a video at the time; maybe I can now? I will try it later.
Something like Proxifier (TUN/TAP virtual network card + MITM (bad?)).
I don't want the accounts to be associated, so I added an underscore.
OK if you have 2 separate processes (each with its own different port) for crystal, then accessing facebook on one should not slow the other one down... :)
@rdp
I understood it wrong; I thought it meant two processes running at the same time (both stuck (ports 8777, 8776), slow processing).
(`mDNSResponder` has only been tested this way.)
[Window 1 | 0:00]: `./a` => 8777
[Window 2 | 0:00]: `./b` => 8776
[Window 3 | 0:01]: `for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8777/?foo=bar ; done`
[Window 4 | 0:01]: `for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8776/?foo=bar ; done`
[Window 5 | 0:02]: `curl -v http://127.0.0.1:8777`
[Window 6 | 0:02]: `curl -v http://127.0.0.1:8776`
If they are accessed sequentially, one server does not affect the other process (the port 8776 server stays completely normal); only the port 8777 server gets stuck (slow processing).
[Window 1 | 0:00]: `./a` => 8777
[Window 2 | 0:00]: `./b` => 8776
[Window 3 | 0:01]: `for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8777/?foo=bar ; done`
[Window 4 | 0:01]: `for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8776/?foo=bar ; done`
[Window 5 | 0:02]: `curl -v http://127.0.0.1:8777`
@rdp
I have already installed it (linux / ubuntu), I am testing it.
parallels@parallels-Parallels-Virtual-Platform:~$ crystal -v
Crystal 0.31.1 [0e2e1d067] (2019-09-30)
LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu
@rdp
Yes, after testing, Ubuntu has the same problem.
OK, at this point I think we may as well close this as "works as expected" (it blocks on the gethostbyname, which causes Crystal to freeze until it returns, as it does with all C calls). I can file some issues to point out how painful the situation is :)
@rdp
(This problem can be triggered every time, without any additional conditions: no need for TUN/TAP, MITM, or Proxifier. But you must go through a proxy server in China to trigger it.) (I think that without a timeout, HTTP::Client would hang instead.)
```crystal
require "http/client"
require "openssl"

class HTTP::Client
  def self.new(uri : URI, tls = nil)
    tls = tls_flag(uri, tls)
    host = validate_host(uri)
    client = new(host, uri.port, tls)

    begin
      yield client
    ensure
      puts [Time.utc, "#L12"]
      client.close
      puts [Time.utc, "#L14"]
    end
  end
end

tls = OpenSSL::SSL::Context::Client.new
tls.verify_mode = OpenSSL::SSL::VerifyMode::NONE

puts [Time.utc, LibSSL::OPENSSL_VERSION, "https://bing.com"]

begin
  HTTP::Client.new uri: URI.parse("https://bing.com"), tls: tls do |client|
    client.connect_timeout = 30.seconds
    client.read_timeout = 30.seconds

    client.get("/") do |response|
      puts [Time.utc, response]
      # puts [response.body_io.gets_to_end]
    end
  end

  puts [0, 3, 4, 5]
rescue ex : IO::Timeout
  puts [Time.utc, 2, ex.message]
end
```
```text
crystal build bing.cr --release
./bing
```

```text
iMac:Example User$ ./bing
[2019-11-12 07:28:51.565831000 UTC, "1.0.2", "https://bing.com"]
[2019-11-12 07:28:52.300818000 UTC, #<HTTP::Client::Response:0x102297f00 @version="HTTP/1.1", @status=MOVED_PERMANENTLY, @status_message="Moved Permanently", @headers=HTTP::Headers{"Location" => "https://cn.bing.com/", "Server" => "Microsoft-IIS/10.0", "X-MSEdge-Ref" => "Ref A: 24786EA558F449F0BDF8B55A37DC8027 Ref B: HKGEDGE0515 Ref C: 2019-11-12T07:28:51Z", "Date" => "Tue, 12 Nov 2019 07:28:50 GMT", "Content-Length" => "0"}, @body_io=nil, @cookies=nil, @body="">]
[2019-11-12 07:28:52.301368000 UTC, "#L12"]
[2019-11-12 07:29:22.303095000 UTC, 2, "Read timed out"]
```
@rdp
Okay. :)
But I am still confused: if all C calls have the same problem, it seems that Swift and Java (reverse proxy servers) do not have it.
I don't seem to hit this problem when I use some other reverse proxy tools (e.g. BurpSuite, which is Java).
Or have they solved this problem in those other languages?
In the end it seems I can't solve this problem, which is somewhat disappointing. It seems that e.g. Java, Rust (Tokio), and Swift use some kind of asynchronous resolution. I hope that one day this problem will be solved in Crystal. If someone else has encountered this problem, ... @asterite I might try libuv (uv_getaddrinfo) or libevent (evdns_getaddrinfo), although I don't know much about C. Finally, thank you for replying to this issue.
You might be able to work around it by shelling out to "nslookup" to look up the IP address each time (or looking it up once, and caching it). Not sure how that works with the built-in TCP client, though. Yeah, there's definitely some work to be done there; it is absolutely blocking the world at present, which doesn't seem quite right...
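A rough sketch of that workaround (my own untested illustration: it assumes a `dig` binary on PATH and only handles the happy path):

```crystal
# Shell out for the DNS lookup so this process never calls getaddrinfo,
# then cache the answer and hand TCPSocket a literal IP address.
# Subprocess spawning/waiting goes through Crystal's event loop, so other
# fibers keep running while the external resolver does its work.
require "socket"

DNS_CACHE = {} of String => String

def resolve_via_shell(host : String) : String
  DNS_CACHE[host] ||= begin
    answer = `dig +short #{host} A`.lines.first?.try &.strip
    raise "lookup failed for #{host}" if answer.nil? || answer.empty?
    answer
  end
end

ip = resolve_via_shell("example.com")
socket = TCPSocket.new(ip, 80) # a numeric host should skip the slow resolver path
socket.close
```

Caching is load-bearing here: without it every request pays a subprocess fork, which is its own ephemeral-port and latency hazard under load.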
@rdp Okay, thank you for your advice and help. :) π
I know what @rdp means; let me translate. If you call `LibC.getaddrinfo` from Crystal, it will freeze the handler. (Certain system calls block forever and don't work well with fibers — #1454.) The handler freezes and cannot accept new clients.
If you write a DNS resolver (synchronous, non-async) in Crystal, there will still be blocking, but it only blocks the current fiber and does not freeze the handler. (I have tested it: sending a DNS packet over a UDPSocket to an unresponsive UDP server and waiting for the response only blocks the current fiber.)
Window 1 | 00:01: `./example`
Window 2 | 00:02: loop { `sudo nc -l -u 53` -> wait 0.1 seconds -> cancel }
Window 3 | 00:03: `for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8777; done` (only blocks the current fiber)
Window 4 | 00:04: loop { `curl -v "http://127.0.0.1:8777"` -> wait 0.1 seconds -> cancel } (does not freeze, does not get stuck)
(The handler does not freeze and can accept new clients; when other users request this HTTP::Server, they are processed normally and do not get stuck.)
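For illustration, the hand-rolled UDP query described above might look roughly like this (my own sketch, not the code that was actually tested; it builds a single A/IN question and waits with a read timeout, so at worst only the calling fiber blocks):

```crystal
# Build a minimal DNS query packet (header + one question) by hand and
# send it over a UDPSocket; the receive blocks only the current fiber.
require "socket"

def build_query(host : String) : Bytes
  io = IO::Memory.new
  io.write_bytes 0x1234_u16, IO::ByteFormat::BigEndian # transaction id
  io.write_bytes 0x0100_u16, IO::ByteFormat::BigEndian # flags: recursion desired
  io.write_bytes 1_u16, IO::ByteFormat::BigEndian      # QDCOUNT = 1
  io.write_bytes 0_u16, IO::ByteFormat::BigEndian      # ANCOUNT
  io.write_bytes 0_u16, IO::ByteFormat::BigEndian      # NSCOUNT
  io.write_bytes 0_u16, IO::ByteFormat::BigEndian      # ARCOUNT
  host.split('.').each do |label|                      # QNAME as length-prefixed labels
    io.write_byte label.bytesize.to_u8
    io << label
  end
  io.write_byte 0_u8                                   # root label terminator
  io.write_bytes 1_u16, IO::ByteFormat::BigEndian      # QTYPE  = A
  io.write_bytes 1_u16, IO::ByteFormat::BigEndian      # QCLASS = IN
  io.to_slice
end

socket = UDPSocket.new
socket.read_timeout = 5.seconds # an unresponsive server costs this fiber 5s, nothing more
socket.connect "8.8.8.8", 53
socket.send build_query("example.com")

buffer = Bytes.new(512)
bytes_read, _ = socket.receive(buffer)
puts "got #{bytes_read} bytes of DNS response"
```

Parsing the answer section is the remaining (and larger) part of a real resolver.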
It's not hard to write a synchronous (non-async) DNS resolver; then maybe some LRU cache is needed. This is much simpler than using libuv or libevent.
Sometimes these problems need to be solved by oneself, like the OpenSSL bindings (create certificate / create private key / MEMBIO / load certificate from memory / load private key from memory), as in #7897 and #7896 (already available in stdlib.cr).
They may not help me or you directly; sometimes we have to solve the problem ourselves.
This made me learn a lot :). (Just do it; wait for my good news.)
I guess the real fix for this would be some kind of asynchronous [even if threaded] DNS lookup (at least option) for those of us with misbehaving DNS servers... :|
@rdp
Yes, but `LibC.getaddrinfo` calls are very deadly. (Certain system calls block forever and don't work well with fibers.)
A synchronous resolver is not as good as an asynchronous one, but it won't get stuck like `LibC.getaddrinfo`, and that is enough for me.
The DNSResolver project is now about 30% done. (I am researching the DNS packet structure, so it will take some time, but I don't think it will take too long.)
Without `LibC.getaddrinfo`, connecting directly with an IP address:
```crystal
require "socket"

class Socket
  def connect(ip_address : IPAddress, connect_timeout = nil)
    Addrinfo.build_addrinfo ip_address, @family, @type, @protocol do |addrinfo|
      connect(addrinfo, timeout: connect_timeout) { |error| error }
    end
  end
end

class TCPSocket
  def initialize(ip_address : IPAddress, dns_timeout = nil, connect_timeout = nil)
    Addrinfo.build_tcp ip_address do |addrinfo|
      super addrinfo.family, addrinfo.type, addrinfo.protocol
      connect(addrinfo, timeout: connect_timeout) do |error|
        close
        error
      end
    end
  end
end

class Socket
  struct Addrinfo
    def self.build_addrinfo(ip_address : IPAddress, family, type, protocol, &block : Addrinfo ->)
      ip_address_size = ip_address.size.to_u8
      service = ip_address.port

      hints = LibC::Addrinfo.new
      hints.ai_family = (ip_address.family || Family::UNSPEC).to_i32
      hints.ai_socktype = type
      hints.ai_protocol = protocol
      hints.ai_flags = 0_u8
      hints.ai_addrlen = ip_address_size
      hints.ai_addr = ip_address.to_unsafe
      hints.ai_addr.value.sa_len = ip_address_size

      if service.is_a? Int
        hints.ai_flags |= LibC::AI_NUMERICSERV
      end

      # On OS X < 10.12, the libsystem implementation of getaddrinfo segfaults
      # if AI_NUMERICSERV is set, and servname is NULL or 0.
      {% if flag?(:darwin) %}
        if (service == 0_i32 || service.nil?) && (hints.ai_flags & LibC::AI_NUMERICSERV)
          hints.ai_flags |= LibC::AI_NUMERICSERV
        end
      {% end %}

      yield new pointerof(hints)
    end

    def self.build_tcp(ip_address : IPAddress, family = Family::UNSPEC)
      build_addrinfo ip_address, family, Type::STREAM, Protocol::TCP do |addrinfo|
        yield addrinfo
      end
    end

    def self.build_udp(ip_address : IPAddress, family = Family::UNSPEC)
      build_addrinfo ip_address, family, Type::DGRAM, Protocol::UDP do |addrinfo|
        yield addrinfo
      end
    end
  end
end

tcp_socket = TCPSocket.new Socket::IPAddress.new("127.0.0.1", 1234)
tcp_socket << "Hello World!"
tcp_socket.close

tcp_socket = TCPSocket.new Socket::IPAddress.new("::1", 1234)
tcp_socket << "Hello World!"
tcp_socket.close

udp_socket = UDPSocket.new Socket::Family::INET
udp_socket.connect Socket::IPAddress.new("127.0.0.1", 1234)
udp_socket << "Hello World!"
udp_socket.close

udp_socket = UDPSocket.new Socket::Family::INET6
udp_socket.connect Socket::IPAddress.new("::1", 1234)
udp_socket << "Hello World!"
udp_socket.close
```
`getaddrinfo` should already "not block" for IP addresses like 127.0.0.1 (does it?). That's actually an excellent idea (just monkey patch it, possibly in a shard?). It would be nice to see something in the stdlib for it eventually as well, either includable or as a run-time option or something; see the other issues :) Just for completeness' sake, here's a proof of concept "monkey patch to use the old libevent dns resolver": https://gist.github.com/rdp/c776c1ea95dfd666d1d828eba4caabc3 FWIW... :)
@rdp
getaddrinfo should already "not block" for IP addresses like 127.0.0.1 (does it?).
It should be, but I think that if it is an IP address, there is no need to use `LibC.getaddrinfo` at all. (This prevents any problems caused by `LibC.getaddrinfo`.)
That's actually an excellent idea (just monkey patch it, possibly in a shard?) would be nice to see something in the stdlib for it eventually as well, either includable or as a run-time option or something, see the other issues :)
Okay, it needs more work before I can use it. I am considering using CrDNS instead of `LibC.getaddrinfo`.
Currently CrDNS has some breakage that I need to fix.
To get it working as soon as possible, I may remove the asynchronous feature of CrDNS.
In addition: I looked inside getaddrinfo.c yesterday. It seems that `LibC.getaddrinfo` internally uses `libresolv` and `libnss_dns`, so I feel like giving up on `LibC.getaddrinfo`.
libresolv is unboundedly synchronous.
You'd expect a real DNS client library to do better in this aspect than getaddrinfo(), but no: libresolv's function calls are still purely synchronous and may uncontrollably block if the network is unresponsive.
getaddrinfo() performs NSS resolution, not DNS resolution.
It is unboundedly synchronous.
DNS resolution performs network I/O, which can take a non-negligible amount of time. getaddrinfo() is a blocking call and there is no way to specify a timeout to make it return early, so it may block indefinitely. This is bad design.
Also, network operations being asynchronous by nature, even a synchronous API should provide a way to perform several queries at once and return when one of them, or all of them, get an answer. getaddrinfo() does not even offer that.
Conclusion
getaddrinfo() is a toy interface. For any half-serious DNS work, another API must be used.
The above system calls correlate with the UDP packet dump. Two packets are sent with one sendmmsg, then the two responses are read with recvfrom.
(Notice in passing the calls to poll, blocking the entire process for up to 5 seconds waiting for the DNS response. In a program with an event loop or βgreen threadsβ, this can be disastrous. getaddrinfo does not play nicely with your event loop!)
But getaddrinfo does a lot before these system calls, and it does quite a bit after them, too. You see the UDP βconnectionβ is to a DNS server at 10.0.2.3. But how did getaddrinfo know about 10.0.2.3? Itβs not a function argument. Just before the DNS request, the process makes these system calls:
Just for completeness sake here's a proof of concept "monkey patch to use the old libevent dns resolver": https://gist.github.com/rdp/c776c1ea95dfd666d1d828eba4caabc3 FWIW... :)
Okay, I will try it later. :)
@rdp
Project done (DNS::Resolver):
- I think I'm done -> 636f7374|stdlib.cr (src/stdlib/dns).
- CrDNS is very broken, so I made this DNS::Resolver.
- Added some cache features; it does not use an LRUCache, but some custom caches (Cache (Any) | Cache::IPAddress (A, AAAA)).
- Supports querying records through multiple DNS servers.
- Supports A, AAAA, NS, PTR, CNAME, SOA, TXT, and MX queries.
- Does not use C.getaddrinfo.
- There are many more features I have not listed...

Example
* SuperSocket + DNS::Resolver + DNS::Resolver::Cache::IPAddress
```crystal
require "stdlib"

dns_servers = [] of Tuple(Socket::IPAddress, DNS::Resolver::Protocol)
dns_servers << Tuple.new Socket::IPAddress.new("8.8.8.8", 53_i32), DNS::Resolver::Protocol::UDP
dns_servers << Tuple.new Socket::IPAddress.new("1.1.1.1", 53_i32), DNS::Resolver::Protocol::UDP

buffer = uninitialized UInt8[4096_i32]

resolver = DNS::Resolver.new dns_servers
resolver.ip_cache = DNS::Resolver::Cache::IPAddress.new

begin
  socket = SuperTCPSocket.new "www.example.com", 80_i32, resolver, 5_i32
  socket.read_timeout = 5_i32
  socket.write_timeout = 5_i32
rescue
  abort "Connect Failed"
end

begin
  socket << "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"
rescue
  abort "Write Failed"
end

begin
  length = socket.read buffer.to_slice
rescue
  abort "Read Failed"
end

puts [length, String.new buffer.to_slice[0_i32, length]]
```
* DNS::Resolver + DNS::Resolver::Cache
```crystal
require "stdlib"
dns_servers = [] of Tuple(Socket::IPAddress, DNS::Resolver::Protocol)
dns_servers << Tuple.new Socket::IPAddress.new("8.8.8.8", 53_i32), DNS::Resolver::Protocol::UDP
dns_servers << Tuple.new Socket::IPAddress.new("1.1.1.1", 53_i32), DNS::Resolver::Protocol::UDP
resolver = DNS::Resolver.new dns_servers
resolver.cache = DNS::Resolver::Cache.new
resolver.resolve "google.com", [DNS::Record::ResourceFlag::A, DNS::Record::ResourceFlag::AAAA] do |response|
puts [:google, Time.utc, response]
end
resolver.resolve "twitter.com", DNS::Record::ResourceFlag::SOA do |response|
puts [:twitter, Time.utc, response]
end
resolver.resolve "facebook.com", [DNS::Record::ResourceFlag::A, DNS::Record::ResourceFlag::AAAA] do |response|
puts [:facebook, Time.utc, response]
end
resolver.resolve "twitter.com", DNS::Record::ResourceFlag::SOA do |response|
puts [:twitter, Time.utc, response]
end
resolver.run
```
```text
User$ ./dns_resolve
[:google, 2019-12-11 15:20:06.574530000 UTC, [{"google.com", A, #<DNS::Packet::Response:0x1039dbc40 @queries=[#<DNS::Section::Question:0x1039d4a20 @flag=A, @query="google.com", @cls=IN>], @answers=[], @authority=[], @additional=[], @transId=40358, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=True, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=True, @checkingDisabled=True, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}, {"google.com", AAAA, #<DNS::Packet::Response:0x1039dba80 @queries=[#<DNS::Section::Question:0x1039d4820 @flag=AAAA, @query="google.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d47e0 @resourceRecord=#<DNS::Record::AAAA:0x1039ddd80 @cls=IN, @ttl=299, @from="google.com", @ipv6Address="2404:6800:4012::200e">>], @authority=[], @additional=[], @transId=4642, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:twitter, 2019-12-11 15:20:06.623635000 UTC, [{"twitter.com", SOA, #<DNS::Packet::Response:0x1039db930 @queries=[#<DNS::Section::Question:0x1039d40c0 @flag=SOA, @query="twitter.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d4080 @resourceRecord=#<DNS::Record::A:0x1039ddc60 @cls=IN, @ttl=163, @from="twitter.com", @ipv4Address="31.13.84.8">>], @authority=[], @additional=[], @transId=36287, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:twitter, 2019-12-11 15:20:06.623981000 UTC, [{"twitter.com", SOA, #<DNS::Packet::Response:0x1039db930 @queries=[#<DNS::Section::Question:0x1039d40c0 @flag=SOA, @query="twitter.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d4080 @resourceRecord=#<DNS::Record::A:0x1039ddc60 @cls=IN, @ttl=163, @from="twitter.com", @ipv4Address="31.13.84.8">>], @authority=[], @additional=[], @transId=36287, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:facebook, 2019-12-11 15:20:06.761727000 UTC, [{"facebook.com", A, #<DNS::Packet::Response:0x1039db690 @queries=[#<DNS::Section::Question:0x1039f7d00 @flag=A, @query="facebook.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039f7cc0 @resourceRecord=#<DNS::Record::A:0x1039ddb40 @cls=IN, @ttl=299, @from="facebook.com", @ipv4Address="157.240.196.35">>], @authority=[], @additional=[], @transId=60636, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}, {"facebook.com", AAAA, #<DNS::Packet::Response:0x1039db4d0 @queries=[#<DNS::Section::Question:0x1039f7a80 @flag=AAAA, @query="facebook.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039f7a40 @resourceRecord=#<DNS::Record::A:0x1039ddab0 @cls=IN, @ttl=158, @from="facebook.com", @ipv4Address="8.7.198.45">>], @authority=[], @additional=[], @transId=31058, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
```

Nice! Maybe make a shard out of it?
@rdp
Good suggestion, I have separated the DNS part; it is now a standalone repository -> 636f7374|durian.cr. The OpenSSL part (Man-in-the-Middle Toolkit) is also being separated -> 636f7374|cherry.cr.
Nice!
Summary
Some TCPSocket connections cause HTTP::Server's accept to freeze (block | hang | wait). Maybe I am doing something wrong, or misunderstanding something, but is there any solution? I tried adding `spawn`, and `-Dpreview_mt` with `blocking=`, but this doesn't work.
In this example, the client creates a TCPSocket on every request. In some countries (e.g. China) "facebook.com" cannot be accessed, but e.g. "taobao.com" can be accessed normally.
left: `for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8777/?foo=bar; done` — creates many "taobao.com" TCPSockets.
right: `curl -v http://127.0.0.1:8777/` — creates one "facebook.com" TCPSocket.
While "left" is looping, running the "right" command makes "left" freeze (block | hang | wait) for 2 - 3 seconds. Unfortunately, I have not found the cause so far. In addition, sometimes a DNS timeout triggers this problem, but I didn't capture the moment it happened.
This problem seems to have some association with "[#]". It always appears together with `#<IO::Timeout:connect timed out>`, but I tested with netcat (`nc -l`) and could not trigger it that way.
Some Capture
Not working:
- `spawn`
- `spawn` + `same_thread: true` + `-Dpreview_mt`
- `blocking = false`
Guess (I do not know):
- `getaddrinfo`?
- DNS?

Example
```crystal
require "http/server"
require "logger"

first = 0_i32
last = 0_i32
logger = Logger.new STDOUT

channel = Channel(Nil).new

server = HTTP::Server.new do |context|
  spawn same_thread: true do
    if context.request.resource != "/"
      address = "taobao.com"
      first += 1_i32
    else
      address = "facebook.com"
      last += 1_i32
    end
  ensure
    channel.send nil
  end

  channel.receive
end

address = server.bind_tcp 8777
puts "Listening on http://#{address}"
server.listen
```
Verbose
Crystal Version
Call Graph