crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.35k stars 1.62k forks source link

Some TCPSocket connections will cause HTTP::Server accept (freeze | blocking | hangs | waiting)? #8376

Closed 636f7374 closed 4 years ago

636f7374 commented 4 years ago

Summary

Some TCPSocket connections will cause HTTP::Server accept (freeze | blocking | hangs | waiting)? Maybe I am doing something wrong? Maybe where do I understand wrong?, But is there any solution? I tried adding spawn and -Dpreview_mt with blocking=, but this doesn't work.

In this example, the client creates a TCPSocket every request. In some countries "facebook.com" can't accessed, But (E.g. "taobao.com") can be accessed normally. (E.g. China)

left: for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8777/?foo=bar; done : Create many "taobao.com" TCPSocket right: curl -v http://127.0.0.1:8777/ : Create a "facebook.com" TCPSocket When "left" is loop executed, running the "right" command will result in "left" (freeze | blocking | hangs | waiting) for 2 - 3 seconds. Unfortunately, I have not found the cause until now.

In addition, Sometimes a DNS timeout will trigger this problem, but I didn't capture the moment that happened.

This problem seems to have some association with "[#]"

This problem will always appear with #<IO::Timeout:connect timed out>, but I am using the netcat nc -l test and can't trigger this problem.

Some Capture

Not working

Guess (I do not know)

Example

first = 0_i32 last = 0_i32 logger = Logger.new STDOUT

channel = Channel(Nil).new

server = HTTP::Server.new do |context| spawn same_thread: true do if context.request.resource != "/" address = "taobao.com" first += 1_i32 else address = "facebook.com" last += 1_i32 end

logger.info [first, last, address, Time.utc, "Connected"]

begin
  socket = TCPSocket.new address, 80_i32, 10_i32, 10_i32
  socket.blocking = false
  socket.read_timeout = 10_i32.second
  socket.write_timeout = 10_i32.second
rescue ex
  logger.info [ex]
end

socket.try &.close

context.response.content_type = "text/plain"
context.response.print "#{address} | Hello world!\r\n"

ensure channel.send nil end

channel.receive end

address = server.bind_tcp 8777 puts "Listening on http://#{address}" server.listen


* Normal
```crystal
require "http/server"
require "socket"
require "logger"

left = 0_i32
right = 0_i32
logger = Logger.new STDOUT

server = HTTP::Server.new do |context|
  if context.request.resource != "/"
    address = "taobao.com"
    left += 1_i32
  else
    address = "facebook.com"
    right += 1_i32
  end

  logger.info [left, right, address, Time.utc, "Connected"]

  begin
    socket = TCPSocket.new address, 80_i32, 10_i32, 10_i32
    socket.blocking = false
    socket.read_timeout = 10_i32.second
    socket.write_timeout = 10_i32.second
  rescue ex
    logger.info [ex]
  end

  socket.try &.close

  context.response.content_type = "text/plain"
  context.response.print "#{address} | Hello world!\r\n"
end

address = server.bind_tcp 8777
puts "Listening on http://#{address}"
server.listen

Verbose

I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [22, 0, "taobao.com", 2019-10-25 09:39:38.442733000 UTC, "Connected"]
I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [23, 0, "taobao.com", 2019-10-25 09:39:38.505563000 UTC, "Connected"]
I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [24, 0, "taobao.com", 2019-10-25 09:39:38.565499000 UTC, "Connected"]
I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [25, 0, "taobao.com", 2019-10-25 09:39:38.626802000 UTC, "Connected"]
I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [26, 0, "taobao.com", 2019-10-25 09:39:38.690952000 UTC, "Connected"]
I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [27, 0, "taobao.com", 2019-10-25 09:39:38.747874000 UTC, "Connected"]
> I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [27, 1, "facebook.com", 2019-10-25 09:39:38.778297000 UTC, "Connected"]
> I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [28, 1, "taobao.com", 2019-10-25 09:39:38.809867000 UTC, "Connected"]
> I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [29, 1, "taobao.com", 2019-10-25 09:39:38.869907000 UTC, "Connected"]
> I, [2019-10-25 17:39:38 +08:00 #32726]  INFO -- : [30, 1, "taobao.com", 2019-10-25 09:39:38.929369000 UTC, "Connected"]
> I, [2019-10-25 17:39:40 +08:00 #32726]  INFO -- : [31, 1, "taobao.com", 2019-10-25 09:39:40.875634000 UTC, "Connected"]
> I, [2019-10-25 17:39:40 +08:00 #32726]  INFO -- : [32, 1, "taobao.com", 2019-10-25 09:39:40.936123000 UTC, "Connected"]
> I, [2019-10-25 17:39:40 +08:00 #32726]  INFO -- : [33, 1, "taobao.com", 2019-10-25 09:39:40.997590000 UTC, "Connected"]
I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [39, 0, "taobao.com", 2019-10-25 09:48:41.247650000 UTC, "Connected"]
I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [40, 0, "taobao.com", 2019-10-25 09:48:41.304897000 UTC, "Connected"]
I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [41, 0, "taobao.com", 2019-10-25 09:48:41.359026000 UTC, "Connected"]
I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [42, 0, "taobao.com", 2019-10-25 09:48:41.417724000 UTC, "Connected"]
> I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [42, 1, "facebook.com", 2019-10-25 09:48:41.419165000 UTC, "Connected"]
> I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [43, 1, "taobao.com", 2019-10-25 09:48:41.476470000 UTC, "Connected"]
> I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [44, 1, "taobao.com", 2019-10-25 09:48:41.531946000 UTC, "Connected"]
> I, [2019-10-25 17:48:41 +08:00 #35643]  INFO -- : [45, 1, "taobao.com", 2019-10-25 09:48:41.586077000 UTC, "Connected"]
> I, [2019-10-25 17:48:43 +08:00 #35643]  INFO -- : [46, 1, "taobao.com", 2019-10-25 09:48:43.597054000 UTC, "Connected"]
> I, [2019-10-25 17:48:43 +08:00 #35643]  INFO -- : [47, 1, "taobao.com", 2019-10-25 09:48:43.652820000 UTC, "Connected"]
> I, [2019-10-25 17:48:43 +08:00 #35643]  INFO -- : [48, 1, "taobao.com", 2019-10-25 09:48:43.708130000 UTC, "Connected"]
> I, [2019-10-25 17:48:43 +08:00 #35643]  INFO -- : [49, 1, "taobao.com", 2019-10-25 09:48:43.765713000 UTC, "Connected"]
> I, [2019-10-25 17:48:43 +08:00 #35643]  INFO -- : [50, 1, "taobao.com", 2019-10-25 09:48:43.821089000 UTC, "Connected"]
...
I, [2019-10-25 17:48:53 +08:00 #35643]  INFO -- : [#<IO::Timeout:connect timed out>]

Crystal Version

Crystal 0.31.0 (2019-09-24)

LLVM: 8.0.1
Default target: x86_64-apple-macosx

Call Graph

    2652 Thread_179195
    + 2652 ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal/0.31.0_1/src/fiber.cr:48  (in example) + 9  [0x10fac7209]  fiber.cr:48
    +   2652 *Fiber#run:(IO::FileDescriptor | Nil)  (in example) + 112  [0x10fb0d070]  fiber.cr:255
    +     2652 ~procProc(Nil)@/usr/local/Cellar/crystal/0.31.0_1/src/http/server.cr:463  (in example) + 27  [0x10facd94b]  server.cr:463
    +       2652 *HTTP::Server#handle_client<IO+>:Nil  (in example) + 294  [0x10fbde376]  server.cr:497
    +         2652 *HTTP::Server::RequestProcessor#process<IO+, IO+>:Nil  (in example) + 53  [0x10fbde7e5]  request_processor.cr:22
    +           2652 *HTTP::Server::RequestProcessor#process<IO+, IO+, IO::FileDescriptor>:Nil  (in example) + 2151  [0x10fbdf057]  request_processor.cr:255
    +             2652 ~procProc(HTTP::Server::Context, Nil)@example.cr:9  (in example) + 602  [0x10faccb1a]  example.cr:21
    +               2652 *TCPSocket::new<String, Int32, Int32, Int32>:TCPSocket  (in example) + 438  [0x10fbd38e6]  tcp_socket.cr:27
    +                 1420 *TCPSocket#initialize<String, Int32, Int32, Int32>:Nil  (in example) + 5307  [0x10fbd4dbb]  tcp_socket.cr:146
    +                 ! 1420 *Crystal::Scheduler::reschedule:Nil  (in example) + 25  [0x10fb24ed9]  scheduler.cr:43
    +                 !   1420 *Crystal::Scheduler#reschedule:Nil  (in example) + 99  [0x10fb24f43]  scheduler.cr:146
    +                 !     1420 *Crystal::EventLoop::run_once:Int32  (in example) + 34  [0x10fb23bb2]  event_loop.cr:16
    +                 !       1420 *Crystal::Event::Base#run_once:Int32  (in example) + 17  [0x10fb24551]  event.cr:62
    +                 !         1420 event_base_loop  (in libevent-2.1.7.dylib) + 699  [0x11025d343]
    +                 !           1420 kq_dispatch  (in libevent-2.1.7.dylib) + 559  [0x1102665ba]
    +                 !             1420 kevent  (in libsystem_kernel.dylib) + 10  [0x7fffdd72dd96]
    +                 1232 *TCPSocket#initialize<String, Int32, Int32, Int32>:Nil  (in example) + 1000  [0x10fbd3ce8]  tcp_socket.cr:120
    +                   1232 getaddrinfo  (in libsystem_info.dylib) + 185  [0x7fffdd6edad7]
    +                     1231 si_addrinfo  (in libsystem_info.dylib) + 1945  [0x7fffdd6ee317]
    +                     : 1231 search_addrinfo  (in libsystem_info.dylib) + 179  [0x7fffdd6f2128]
    +                     :   1231 mdns_addrinfo  (in libsystem_info.dylib) + 631  [0x7fffdd6f23e5]
    +                     :     1231 _mdns_search  (in libsystem_info.dylib) + 1294  [0x7fffdd6f0737]
    +                     :       1231 kevent  (in libsystem_kernel.dylib) + 10  [0x7fffdd72dd96]
    +                     1 si_addrinfo  (in libsystem_info.dylib) + 3027  [0x7fffdd6ee751]
    +                       1 nw_path_create_evaluator_for_endpoint  (in libsystem_network.dylib) + 1233  [0x7fffdd7cc3b4]
    +                         1 nw_path_snapshot_path  (in libsystem_network.dylib) + 164  [0x7fffdd7cc583]
    +                           1 nw_path_copy_interface_with_generation  (in libsystem_network.dylib) + 287  [0x7fffdd7ccedb]
    +                             1 nwlog_is_debug_logging_enabled  (in libsystem_network.dylib) + 23  [0x7fffdd7ad266]
    +                               1 pthread_once  (in libsystem_pthread.dylib) + 57  [0x7fffdd817b2b]
    +                                 1 _os_once  (in libsystem_platform.dylib) + 36  [0x7fffdd80cac1]
    +                                   1 __pthread_once_handler  (in libsystem_pthread.dylib) + 65  [0x7fffdd817b8c]
    +                                     1 nwlog_legacy_init_once  (in libsystem_network.dylib) + 37  [0x7fffdd7acea5]
    +                                       1 os_log_create  (in libsystem_trace.dylib) + 711  [0x7fffdd839782]
    +                                         1 dispatch_once_f  (in libdispatch.dylib) + 38  [0x7fffdd5c88b9]
    +                                           1 _dispatch_client_callout  (in libdispatch.dylib) + 8  [0x7fffdd5c88fc]
    +                                             1 _os_trace_init_slow  (in libsystem_trace.dylib) + 622  [0x7fffdd830cf2]
    +                                               1 sysctl  (in libsystem_c.dylib) + 171  [0x7fffdd66288f]
    +                                                 1 __sysctl  (in libsystem_kernel.dylib) + 10  [0x7fffdd72d28e]
    2624 Thread_181279   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2585 ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal/0.31.0_1/src/fiber.cr:48  (in example) + 9  [0x10d17b209]  fiber.cr:48
    + ! 1691 *Fiber#run:(IO::FileDescriptor | Nil)  (in example) + 748  [0x10d1c12ec]  fiber.cr:98
    + ! : 1691 *Crystal::Scheduler::reschedule:Nil  (in example) + 25  [0x10d1d8ed9]  scheduler.cr:43
    + ! :   1691 *Crystal::Scheduler#reschedule:Nil  (in example) + 99  [0x10d1d8f43]  scheduler.cr:146
    + ! :     1691 *Crystal::EventLoop::run_once:Int32  (in example) + 34  [0x10d1d7bb2]  event_loop.cr:16
    + ! :       1691 *Crystal::Event::Base#run_once:Int32  (in example) + 17  [0x10d1d8551]  event.cr:62
    + ! :         1691 event_base_loop  (in libevent-2.1.7.dylib) + 699  [0x10d918343]
    + ! :           1691 kq_dispatch  (in libevent-2.1.7.dylib) + 559  [0x10d9215ba]
    + ! :             1691 kevent  (in libsystem_kernel.dylib) + 10  [0x7fffdd72dd96]
    + ! 894 *Fiber#run:(IO::FileDescriptor | Nil)  (in example) + 112  [0x10d1c1070]  fiber.cr:255
    + !   894 ~procProc(Nil)@/usr/local/Cellar/crystal/0.31.0_1/src/http/server.cr:463  (in example) + 27  [0x10d18194b]  server.cr:463
    + !     894 *HTTP::Server#handle_client<IO+>:Nil  (in example) + 294  [0x10d292376]  server.cr:497
    + !       894 *HTTP::Server::RequestProcessor#process<IO+, IO+>:Nil  (in example) + 53  [0x10d2927e5]  request_processor.cr:22
    + !         894 *HTTP::Server::RequestProcessor#process<IO+, IO+, IO::FileDescriptor>:Nil  (in example) + 2151  [0x10d293057]  request_processor.cr:255
    + !           893 ~procProc(HTTP::Server::Context, Nil)@example.cr:9  (in example) + 602  [0x10d180b1a]  example.cr:21
    + !           | 893 *TCPSocket::new<String, Int32, Int32, Int32>:TCPSocket  (in example) + 438  [0x10d2878e6]  tcp_socket.cr:27
    + !           |   893 *TCPSocket#initialize<String, Int32, Int32, Int32>:Nil  (in example) + 1000  [0x10d287ce8]  tcp_socket.cr:120
    + !           |     893 getaddrinfo  (in libsystem_info.dylib) + 185  [0x7fffdd6edad7]
    + !           |       893 si_addrinfo  (in libsystem_info.dylib) + 1945  [0x7fffdd6ee317]
    + !           |         893 search_addrinfo  (in libsystem_info.dylib) + 179  [0x7fffdd6f2128]
    + !           |           893 mdns_addrinfo  (in libsystem_info.dylib) + 631  [0x7fffdd6f23e5]
    + !           |             893 _mdns_search  (in libsystem_info.dylib) + 1294  [0x7fffdd6f0737]
    + !           |               893 kevent  (in libsystem_kernel.dylib) + 10  [0x7fffdd72dd96]
    + !           1 ~procProc(HTTP::Server::Context, Nil)@example.cr:9  (in example) + 257  [0x10d1809c1]  example.cr:152
    + !             1 *Array(Int32 | String | Time)@Array(T)::new<Int32>:Array(Int32 | String | Time)  (in example) + 30  [0x10d282f3e]  array.cr:81
    + !               1 __crystal_malloc64  (in example) + 17  [0x10d168401]  gc.cr:24
    + !                 1 *GC::malloc<UInt64>:Pointer(Void)  (in example) + 9  [0x10d1a74e9]  boehm.cr:84
    + !                   1 GC_malloc_kind_global  (in example) + 56  [0x10d2be9c7]
    + !                     1 pthread_mutex_lock  (in libsystem_pthread.dylib) + 85  [0x7fffdd8153e1]
    + 39 ~procProc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal/0.31.0_1/src/fiber.cr:48  (in example) + 9  [0x10d17af29]  fiber.cr:48
    +   39 *Fiber#run:(IO::FileDescriptor | Nil)  (in example) + 94  [0x10d1c105e]  fiber.cr:255
    +     39 ~procProc(Nil)@/usr/local/Cellar/crystal/0.31.0_1/src/kernel.cr:542  (in example) + 24  [0x10d17ae08]  kernel.cr:542
    +       39 *sleep<Int32>:Nil  (in example) + 66  [0x10d17ae62]  concurrent.cr:15
    +         39 *Crystal::Scheduler::sleep<Time::Span>:Nil  (in example) + 57  [0x10d1d97f9]  scheduler.cr:51
    +           39 *Crystal::Scheduler#sleep<Time::Span>:Nil  (in example) + 84  [0x10d1d9854]  scheduler.cr:157
    +             39 *Crystal::Scheduler#reschedule:Nil  (in example) + 99  [0x10d1d8f43]  scheduler.cr:146
    +               39 *Crystal::EventLoop::run_once:Int32  (in example) + 34  [0x10d1d7bb2]  event_loop.cr:16
    +                 39 *Crystal::Event::Base#run_once:Int32  (in example) + 17  [0x10d1d8551]  event.cr:62
    +                   39 event_base_loop  (in libevent-2.1.7.dylib) + 699  [0x10d918343]
    +                     39 kq_dispatch  (in libevent-2.1.7.dylib) + 559  [0x10d9215ba]
    +                       39 kevent  (in libsystem_kernel.dylib) + 10  [0x7fffdd72dd96]
straight-shoota commented 4 years ago

@636f7374 What was the problem? Do you care to share?

636f7374 commented 4 years ago

@straight-shoota Oops, the problem still exists. ... Sorry, I replied a bit late.

636f7374 commented 4 years ago

@straight-shoota This problem is tricky, it hinders my reverse proxy server. I am looking for this problem, it may take some time. If there is new progress, I will send it here. πŸ™‚

636f7374 commented 4 years ago

Update

NSlookup

iMac: User$ nslookup facebook.com
Server:     192.168.1.1
Address:    192.168.1.1#53

Non-authoritative answer:
Name:   facebook.com
Address: 67.228.74.123

Dig

iMac: User$ dig facebook.com

; <<>> DiG 9.8.3-P1 <<>> facebook.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16053
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;facebook.com.          IN  A

;; ANSWER SECTION:
facebook.com.       79  IN  A   69.171.240.27

;; Query time: 1243 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Oct 26 22:55:59 2019
;; MSG SIZE  rcvd: 46
bcardiff commented 4 years ago

@636f7374 might you be reaching the http://danielmendel.github.io/blog/2013/04/07/benchmarkers-beware-the-ephemeral-port-limit/ ?

636f7374 commented 4 years ago

@bcardiff Okay, Yes, I tried ulimit -u 1024 before (macOS default: 256). lsof results have been changing. In addition, this problem will also occur when the number of requests is small. πŸ™‚ (I.e. Run the right command once, the server will (freeze | blocking | hangs | waiting), no matter how many requests.)

Lsof (results have been changing)

iMac:~ User$ lsof -p 11290
COMMAND   PID USER   FD      TYPE             DEVICE  SIZE/OFF     NODE NAME
threads 11290 User  cwd       DIR                1,3       170 10262722 /Users/User/Downloads/threads
threads 11290 User  txt       REG                1,3   3090564 10292039 /Users/User/Downloads/threads/threads
threads 11290 User  txt       REG                1,3    381200  6765855 /usr/local/Cellar/openssl/1.0.2s/lib/libssl.1.0.0.dylib
threads 11290 User  txt       REG                1,3   2003932  6765852 /usr/local/Cellar/openssl/1.0.2s/lib/libcrypto.1.0.0.dylib
threads 11290 User  txt       REG                1,3    416976   422220 /usr/lib/libpcre.0.dylib
threads 11290 User  txt       REG                1,3     14352  9630926 /usr/local/Cellar/libevent/2.1.11_1/lib/libevent_pthreads-2.1.7.dylib
threads 11290 User  txt       REG                1,3    248404  9630914 /usr/local/Cellar/libevent/2.1.11_1/lib/libevent-2.1.7.dylib
threads 11290 User  txt       REG                1,3    698896   422030 /usr/lib/dyld
threads 11290 User  txt       REG                1,3 662274048  9591240 /private/var/db/dyld/dyld_shared_cache_x86_64h
threads 11290 User    0u      CHR               16,0    0t7748     1117 /dev/ttys000
threads 11290 User    1u      CHR               16,0    0t7748     1117 /dev/ttys000
threads 11290 User    2u      CHR               16,0    0t7748     1117 /dev/ttys000
threads 11290 User    3      PIPE 0xa1e6fa57b499b07d     16384          ->0xa1e6fa57b499c33d
threads 11290 User    4      PIPE 0xa1e6fa57b499c33d     16384          ->0xa1e6fa57b499b07d
threads 11290 User    5r      CHR               14,1      0t32      574 /dev/urandom
threads 11290 User    6u      CHR               16,0    0t4002     1117 /dev/ttys000
threads 11290 User    7u      CHR               16,0       0t0     1117 /dev/ttys000
threads 11290 User    8      PIPE 0xa1e6fa57b499c3fd     16384          ->0xa1e6fa57b499c4bd
threads 11290 User    9      PIPE 0xa1e6fa57b499c4bd     16384          ->0xa1e6fa57b499c3fd
threads 11290 User   10      PIPE 0xa1e6fa57b499cdbd     16384          ->0xa1e6fa57b499a53d
threads 11290 User   11      PIPE 0xa1e6fa57b499a53d     16384          ->0xa1e6fa57b499cdbd
threads 11290 User   12u   KQUEUE                                       count=2, state=0x8
threads 11290 User   13      PIPE 0xa1e6fa57b499a6bd     16384          ->0xa1e6fa57b499ccfd
threads 11290 User   14      PIPE 0xa1e6fa57b499ccfd     16384          ->0xa1e6fa57b499a6bd
threads 11290 User   15u   KQUEUE                                       count=1, state=0x8
threads 11290 User   16u   KQUEUE                                       count=0, state=0xa
threads 11290 User   17      PIPE 0xa1e6fa57b499cf3d     16384          ->0xa1e6fa57b499b1fd
threads 11290 User   18      PIPE 0xa1e6fa57b499b1fd     16384          ->0xa1e6fa57b499cf3d
threads 11290 User   19u   KQUEUE                                       count=1, state=0x8
threads 11290 User   20u     IPv4 0xa1e6fa57c32d7cf5       0t0      TCP localhost:8777 (LISTEN)
threads 11290 User   21u     IPv4 0xa1e6fa57c47dbee5       0t0      TCP 192.168.1.100:55086->31.13.75.18:http (SYN_SENT)
threads 11290 User   22   NPOLICY                                       
threads 11290 User   23u     unix 0xa1e6fa57bdc029a5       0t0          ->0xa1e6fa57bdc01eb5
threads 11290 User   24u    systm                          0t0          
threads 11290 User   25u     IPv4 0xa1e6fa57c481cb05       0t0      TCP localhost:8777->localhost:55047 (CLOSE_WAIT)
threads 11290 User   26u     IPv4 0xa1e6fa57bd5ff20d       0t0      TCP localhost:8777->localhost:55030 (CLOSE_WAIT)
threads 11290 User   28u     IPv4 0xa1e6fa57c481d3fd       0t0      TCP localhost:8777->localhost:55053 (CLOSE_WAIT)
threads 11290 User   29u     IPv4 0xa1e6fa57c31bbb05       0t0      TCP 192.168.1.100:55126->31.13.75.18:http (SYN_SENT)
threads 11290 User   30u     IPv4 0xa1e6fa57c476db05       0t0      TCP localhost:8777->localhost:55063 (CLOSE_WAIT)
threads 11290 User   31u     IPv4 0xa1e6fa57bc7b7915       0t0      TCP localhost:8777->localhost:55069 (CLOSE_WAIT)
threads 11290 User   32u     IPv4 0xa1e6fa57c485c5ed       0t0      TCP localhost:8777->localhost:55075 (CLOSE_WAIT)
threads 11290 User   33u     IPv4 0xa1e6fa57c47ca915       0t0      TCP localhost:8777->localhost:55081 (CLOSE_WAIT)
threads 11290 User   34u     IPv4 0xa1e6fa57c31ddcf5       0t0      TCP 192.168.1.100:55127->31.13.75.18:http (SYN_SENT)
threads 11290 User   35u     IPv4 0xa1e6fa57c48905ed       0t0      TCP 192.168.1.100:55128->31.13.75.18:http (SYN_SENT)
threads 11290 User   37u     IPv4 0xa1e6fa57c47d920d       0t0      TCP localhost:8777->localhost:55094 (CLOSE_WAIT)
threads 11290 User   38u     IPv4 0xa1e6fa57c47da3fd       0t0      TCP localhost:8777->localhost:55100 (CLOSE_WAIT)
threads 11290 User   39u     IPv4 0xa1e6fa57c31895ed       0t0      TCP localhost:8777->localhost:55106 (CLOSE_WAIT)
threads 11290 User   40u     IPv4 0xa1e6fa57c48b8915       0t0      TCP localhost:8777->localhost:55115 (CLOSE_WAIT)
threads 11290 User   41u     IPv4 0xa1e6fa57c31b95ed       0t0      TCP localhost:8777->localhost:55121 (CLOSE_WAIT)
threads 11290 User   42u   KQUEUE                                       count=0, state=0xa
threads 11290 User   44u   KQUEUE                                       count=0, state=0xa
threads 11290 User   45u   KQUEUE                                       count=0, state=0xa
636f7374 commented 4 years ago

Update

I took a closer look.

Seem Appears when [#] exception is triggered (freeze | blocking | hangs | waiting). It seems that the problem cannot be triggered by [#].

Sometimes this problem will occur many times ([#]), sometimes this problem will only appear once ([#]).

Maybe I should record a screen video, then convert it to GIF, send it here.

But it seems that "[#]" and "[#]" appear randomly, sometimes "[#]" does not appear once in a long time.

I should find some way to let him constantly trigger the "[#]" problem.

This problem will always appear with #<IO::Timeout:connect timed out>, but I am using the netcat nc -l test and can't trigger this problem.

I, [2019-10-26 22:13:54 +08:00 #5335]  INFO -- : [0, 1, "facebook.com", 2019-10-26 14:13:54.516772000 UTC, "Connected"]
I, [2019-10-26 22:13:59 +08:00 #5335]  INFO -- : [1, 1, "taobao.com", 2019-10-26 14:13:59.634948000 UTC, "Connected"]
I, [2019-10-26 22:13:59 +08:00 #5335]  INFO -- : [1, 2, "facebook.com", 2019-10-26 14:13:59.669194000 UTC, "Connected"]
I, [2019-10-26 22:14:04 +08:00 #5335]  INFO -- : [1, 3, "facebook.com", 2019-10-26 14:14:04.676075000 UTC, "Connected"]
I, [2019-10-26 22:14:09 +08:00 #5335]  INFO -- : [1, 4, "facebook.com", 2019-10-26 14:14:09.682888000 UTC, "Connected"]
I, [2019-10-26 22:14:14 +08:00 #5335]  INFO -- : [1, 5, "facebook.com", 2019-10-26 14:14:14.691049000 UTC, "Connected"]
I, [2019-10-26 22:14:19 +08:00 #5335]  INFO -- : [1, 6, "facebook.com", 2019-10-26 14:14:19.697024000 UTC, "Connected"]
I, [2019-10-26 22:14:24 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:24 +08:00 #5335]  INFO -- : [1, 7, "facebook.com", 2019-10-26 14:14:24.702985000 UTC, "Connected"]
I, [2019-10-26 22:14:29 +08:00 #5335]  INFO -- : [2, 7, "taobao.com", 2019-10-26 14:14:29.706984000 UTC, "Connected"]
I, [2019-10-26 22:14:29 +08:00 #5335]  INFO -- : [2, 8, "facebook.com", 2019-10-26 14:14:29.708446000 UTC, "Connected"]
I, [2019-10-26 22:14:34 +08:00 #5335]  INFO -- : [2, 9, "facebook.com", 2019-10-26 14:14:34.716301000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [3, 9, "taobao.com", 2019-10-26 14:14:39.780822000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [4, 9, "taobao.com", 2019-10-26 14:14:39.837072000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [5, 9, "taobao.com", 2019-10-26 14:14:39.892445000 UTC, "Connected"]
I, [2019-10-26 22:14:39 +08:00 #5335]  INFO -- : [6, 9, "taobao.com", 2019-10-26 14:14:39.947160000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [7, 9, "taobao.com", 2019-10-26 14:14:40.005131000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [8, 9, "taobao.com", 2019-10-26 14:14:40.059551000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [9, 9, "taobao.com", 2019-10-26 14:14:40.120486000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [10, 9, "taobao.com", 2019-10-26 14:14:40.176422000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [11, 9, "taobao.com", 2019-10-26 14:14:40.234508000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [12, 9, "taobao.com", 2019-10-26 14:14:40.290436000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [13, 9, "taobao.com", 2019-10-26 14:14:40.345031000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [14, 9, "taobao.com", 2019-10-26 14:14:40.399339000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [15, 9, "taobao.com", 2019-10-26 14:14:40.457172000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [16, 9, "taobao.com", 2019-10-26 14:14:40.512656000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [17, 9, "taobao.com", 2019-10-26 14:14:40.565735000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [18, 9, "taobao.com", 2019-10-26 14:14:40.619120000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [19, 9, "taobao.com", 2019-10-26 14:14:40.675409000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [20, 9, "taobao.com", 2019-10-26 14:14:40.732372000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [21, 9, "taobao.com", 2019-10-26 14:14:40.791124000 UTC, "Connected"]
I, [2019-10-26 22:14:40 +08:00 #5335]  INFO -- : [22, 9, "taobao.com", 2019-10-26 14:14:40.845640000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290]  INFO -- : [23, 4, "facebook.com", 2019-10-26 15:42:42.416706000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290]  INFO -- : [24, 4, "taobao.com", 2019-10-26 15:42:42.416849000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290]  INFO -- : [24, 5, "facebook.com", 2019-10-26 15:42:42.416978000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290]  INFO -- : [24, 6, "facebook.com", 2019-10-26 15:42:42.417674000 UTC, "Connected"]
I, [2019-10-26 23:42:42 +08:00 #11290]  INFO -- : [24, 7, "facebook.com", 2019-10-26 15:42:42.417750000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290]  INFO -- : [25, 7, "taobao.com", 2019-10-26 15:42:47.428807000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290]  INFO -- : [25, 8, "facebook.com", 2019-10-26 15:42:47.428838000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290]  INFO -- : [25, 9, "facebook.com", 2019-10-26 15:42:47.429027000 UTC, "Connected"]
I, [2019-10-26 23:42:47 +08:00 #11290]  INFO -- : [25, 10, "facebook.com", 2019-10-26 15:42:47.432212000 UTC, "Connected"]
I, [2019-10-26 23:42:48 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:52 +08:00 #11290]  INFO -- : [25, 11, "facebook.com", 2019-10-26 15:42:52.438132000 UTC, "Connected"]
I, [2019-10-26 23:42:52 +08:00 #11290]  INFO -- : [25, 12, "facebook.com", 2019-10-26 15:42:52.439745000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [25, 13, "facebook.com", 2019-10-26 15:42:57.444313000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [26, 13, "taobao.com", 2019-10-26 15:42:57.444448000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [26, 14, "facebook.com", 2019-10-26 15:42:57.444492000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [26, 15, "facebook.com", 2019-10-26 15:42:57.444637000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [26, 16, "facebook.com", 2019-10-26 15:42:57.444664000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [26, 17, "facebook.com", 2019-10-26 15:42:57.444684000 UTC, "Connected"]
I, [2019-10-26 23:42:57 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:42:58 +08:00 #11290]  INFO -- : [26, 18, "facebook.com", 2019-10-26 15:42:58.375205000 UTC, "Connected"]
I, [2019-10-26 23:43:01 +08:00 #11290]  INFO -- : [26, 19, "facebook.com", 2019-10-26 15:43:01.408139000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 20, "facebook.com", 2019-10-26 15:43:02.414938000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 21, "facebook.com", 2019-10-26 15:43:02.447648000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 22, "facebook.com", 2019-10-26 15:43:02.447684000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 23, "facebook.com", 2019-10-26 15:43:02.447711000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 24, "facebook.com", 2019-10-26 15:43:02.447744000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 25, "facebook.com", 2019-10-26 15:43:02.449457000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 26, "facebook.com", 2019-10-26 15:43:02.449687000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [26, 27, "facebook.com", 2019-10-26 15:43:02.451694000 UTC, "Connected"]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:02 +08:00 #11290]  INFO -- : [27, 27, "taobao.com", 2019-10-26 15:43:02.504360000 UTC, "Connected"]
I, [2019-10-26 23:43:03 +08:00 #11290]  INFO -- : [27, 28, "facebook.com", 2019-10-26 15:43:03.426798000 UTC, "Connected"]
I, [2019-10-26 23:43:05 +08:00 #11290]  INFO -- : [27, 29, "facebook.com", 2019-10-26 15:43:05.444144000 UTC, "Connected"]
I, [2019-10-26 23:43:06 +08:00 #11290]  INFO -- : [27, 30, "facebook.com", 2019-10-26 15:43:06.455087000 UTC, "Connected"]
I, [2019-10-26 23:43:07 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:12 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:12 +08:00 #11290]  INFO -- : [#<IO::Timeout:connect timed out>]
I, [2019-10-26 23:43:12 +08:00 #11290]  INFO -- : [27, 31, "facebook.com", 2019-10-26 15:43:12.461889000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290]  INFO -- : [27, 33, "facebook.com", 2019-10-26 15:43:12.462316000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290]  INFO -- : [27, 34, "facebook.com", 2019-10-26 15:43:12.462407000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290]  INFO -- : [27, 34, "facebook.com", 2019-10-26 15:43:12.462488000 UTC, "Connected"]
I, [2019-10-26 23:43:12 +08:00 #11290]  INFO -- : [27, 35, "facebook.com", 2019-10-26 15:43:12.462661000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290]  INFO -- : [27, 36, "facebook.com", 2019-10-26 15:43:17.472324000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290]  INFO -- : [27, 37, "facebook.com", 2019-10-26 15:43:17.472625000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290]  INFO -- : [27, 39, "facebook.com", 2019-10-26 15:43:17.472784000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290]  INFO -- : [27, 39, "facebook.com", 2019-10-26 15:43:17.472829000 UTC, "Connected"]
I, [2019-10-26 23:43:17 +08:00 #11290]  INFO -- : [27, 40, "facebook.com", 2019-10-26 15:43:17.473038000 UTC, "Connected"]
636f7374 commented 4 years ago

Update

I added a video GIF to this problem.

I need to temporarily put aside this problem, I have to deal with other things, waiting for the next problem to reappear.

rdp commented 4 years ago

At first blush appears to be DNS lookup "hanging" in the kernel, perhaps? Do you get the same problem when running in linux?

636f7374 commented 4 years ago

@rdp No, I haven't tested this problem in Linux yet, I will test it later and add more GIF videos (I will test this problem as soon as possible, need some time πŸ™‚).

rdp commented 4 years ago

Is the HTTP server doing a DNS lookup on incoming requests? Hmm. Hanging at all is of course bad though, even if avoided in this particular situation. Here's about the only thing I could see similar out there: https://forum.photonengine.com/discussion/2801/photon-hang-when-connecting-under-bad-internet-connection What version of OS X? Are the call graphs in the initial description from two different runs? Does it actually hang or just timeout after 3s?

rdp commented 4 years ago

Also maybe in linux it could use https://stackoverflow.com/questions/32619132/getaddrinfo-and-gethostbyname-hangs-the-system to avoid hanging a thread? hmm... (unrelated) :)

636f7374 commented 4 years ago

@rdp


Here's about the only thing I could see similar out there: https://forum.photonengine.com/discussion/2801/photon-hang-when-connecting-under-bad-internet-connection

Yes, I also found this URL, But I found another #2745 (Says fixes the problem getaddrinfo).


What version of OS X?

$ uname -a
Darwin iMac 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64
macOS Sierra 10.12.6

Are the call graphs in the initial description from two different runs?

Yes, I click "sampling" when it is (freeze | blocking | hangs | waiting). These "sampling" come from different runs.

Does it actually hang or just timeout after 3s?

He will get stuck (slow processing), "exception" will be triggered after timeout.

I recorded another GIF video (macOS), you can take a look. (Please let the GIF file fully load before watching β‰ˆ 6 MegaBytes).

Capture


636f7374 commented 4 years ago

@rdp

Also maybe in linux it could use https://stackoverflow.com/questions/32619132/getaddrinfo-and-gethostbyname-hangs-the-system to avoid hanging a thread? hmm... (unrelated) :)

Yes, I have seen this before, then I found #2745 again. (Sorry, I am not a C developer, so I can't solve this problem :( (can't provide much help, do my best :) ) )

(In addition, I don't know if Rust or Golang will have this problem. (Can you give me some code? Let me test it)) (I will go to test linux (ubuntu) first.)

636f7374 commented 4 years ago

@rdp

ysbaddaden: Asynchronous DNS resolver for Crystal (call getaddrinfo in thread + socket pair communication)

# It consists on starting a thread that will run the blocking `getaddrinfo`
# syscall, thus allowing the main thread β€”the one running the event loopβ€” to not
# stop while resolving domains, which can take from milliseconds to seconds.
# Communication with the thread is achieved through a socket pair, one for the
# main thread, and one for the resolver thread. Since we rely on socket, the
# event-loop will pause or resume fibers accordingly to messages sent or
# received from the socket.
rdp commented 4 years ago

Couple questions, after looking more closely. Maybe requesting facebook.com somehow bworks your DNS server? What if you specify a different DNS server (for your local connection), same result? What if you use a ruby server instead of crystal, same result? What if you have two (crystal) servers, on different ports, and hit them same type of order, same result? What if you run a simple C program that calls getaddrinfo, when the hangs start happening, does it hang? ex: http://www.logix.cz/michal/devel/various/getaddrinfo.c ? What if you kill mDNS process do things start working again?

636f7374 commented 4 years ago

@rdp


Maybe requesting facebook.com somehow bworks your DNS server?

(Facebook, Google, Twitter, Youtube, ...) will trigger this problem. (Any prohibited website will be like this (E.g. in China) ). Only facebook.com is the most obvious (the longest time is stuck).


What if you use a ruby server instead of crystal, same result?

Sorry, I don't know how to implement this in Ruby. Can I provide some Ruby code? πŸ˜“ (When I was learning Ruby halfway, I discovered the Crystal Language.)


What if you kill mDNS process do things start working again?

is killing mDNSResponder and mDNSResponderHelper?


What if you run a simple C program that calls getaddrinfo, when the hangs start happening, does it hang? ex: http://www.logix.cz/michal/devel/various/getaddrinfo.c ?

clang getaddrinfo.c -o getaddrinfo.c
./getaddrinfo [domain]
User$ ./getaddrinfo facebook.com β‰ˆ 5.52 Seconds
Host: facebook.com
IPv4 address: 67.15.100.252 (facebook.com)

User$ ./getaddrinfo google.com β‰ˆ 0 Seconds
Host: google.com
IPv4 address: 172.217.24.14 (google.com)
IPv6 address: 2404:6800:4008:801::200e (google.com)

User$ ./getaddrinfo wikipedia.org β‰ˆ 1.26 Seconds
Host: wikipedia.org
IPv4 address: 69.171.232.21 (wikipedia.org)

User$ ./getaddrinfo facebook.com β‰ˆ 2.47 Second.
Host: facebook.com
IPv4 address: 66.220.151.20 (facebook.com)

User$ ./getaddrinfo twitter.com β‰ˆ 2.53 Seconds
Host: twitter.com
IPv4 address: 66.220.155.14 (twitter.com)

User$ ./getaddrinfo youtube.com β‰ˆ 0 Seconds
Host: youtube.com
IPv4 address: 8.7.198.45 (youtube.com)
IPv6 address: 2404:6800:4008:800::200e (youtube.com)

User$ ./getaddrinfo facebook.com β‰ˆ 2.45 Seconds
Host: facebook.com
IPv4 address: 69.63.178.13 (facebook.com)

User$ ./getaddrinfo facebook.com β‰ˆ 5.78 Seconds
Host: facebook.com
IPv4 address: 69.63.178.13 (facebook.com)

User$ ./getaddrinfo twitter.com β‰ˆ 1.44 Seconds
Host: twitter.com
IPv4 address: 75.126.124.162 (twitter.com)

User$ ./getaddrinfo youtube.com β‰ˆ 0 Seconds Host: youtube.com IPv4 address: 203.98.7.65 (youtube.com)

User$ ./getaddrinfo google.com β‰ˆ 0 Seconds Host: google.com IPv4 address: 93.46.8.90 (google.com)

User$ ./getaddrinfo wikipedia.com β‰ˆ 0.5 - 1 Seconds Host: wikipedia.com IPv4 address: 208.80.153.232 (wikipedia.com) IPv6 address: 2620:0:860:ed1a::9 (wikipedia.com)

User$ ./getaddrinfo wikipedia.org β‰ˆ 0 Seconds Host: wikipedia.org IPv4 address: 69.171.225.13 (wikipedia.org) IPv4 address: 69.171.225.13 (wikipedia.org) IPv4 address: 78.16.49.15 (wikipedia.org)

User$ ./getaddrinfo gov.af β‰ˆ 0.5 - 1 Seconds getaddrinfo: Undefined error: 0



---

> Maybe requesting facebook.com somehow bworks your DNS server? What if you specify a different DNS server (for your local connection), same result?

Yes, it seems that only the DNS of the Router (Default) will be stuck.
Default DNS: `192.168.1.1` -> Router: `180.76.76.76`, `114.114.114.114`
`8.8.8.8` `8.8.4.4` will not happen this problem.

---

> ...

...

---
636f7374 commented 4 years ago

@rdp

Thanks, I am now 90% - 100% confirmed that this problem will only be triggered by using the default DNS (Router DNS). πŸ˜‹πŸ˜‹πŸ˜‹

rdp commented 4 years ago

Yes on killing mDNSResponder. What if you run two separate crystal processes, one you use for facebook, one for taobao, and hit them with the same pattern does requesting from one process affect the other process? Here's an attempt at a ruby version: https://gist.github.com/rdp/88a55363e254ac037d344480c16d0187 (seems to use fibers so should be about the same). As a side note it appears that HTTP::Server is already using fibers so adding new fibers shouldn't do too much (though might let you control if they're same_thread or not, I suppose).

You should also get the same result in linux if it's your local router. It will be interesting to see if the ruby one has the same issue or not, as well.

Appears the IO::Timeout is generated when the C call (getaddrinfo) takes so long that, when it comes back, the fibers waiting for a connect to occur all say "oops, allotted timeout has passed" even though the connect might have actually occurred, the time keeping mechanism might not have noticed that it has occurred. I was able to repro a small demo example of it here: https://gist.github.com/rdp/31632b22d99d9e5c2cc815c3d51ca446 I wouldn't expect a C call in one fiber to be able to do that to a connect in another, but it is still a mighty head scratcher as to how to not do that...especially considering that I don't fully understand the details for the timeout mechanism :) I might create a new issue for it...

rdp commented 4 years ago

Initial example works OK here (linux). Sometimes I get this with fiber version:

Unhandled exception in spawn: Closed stream (IO::Error)
  from /usr/share/crystal/src/io.cr:128:5 in 'check_open'
  from /usr/share/crystal/src/io/buffered.cr:130:15 in 'write'
  from /usr/share/crystal/src/http/server/response.cr:80:7 in 'write'
  from /usr/share/crystal/src/io.cr:481:7 in 'write_utf8'
  from /usr/share/crystal/src/string.cr:4285:5 in 'to_s'
  from /usr/share/crystal/src/io.cr:184:5 in '<<'
  from /usr/share/crystal/src/io.cr:198:5 in 'print'
  from him.cr:35:5 in '->'
  from /usr/share/crystal/src/fiber.cr:255:3 in 'run'
  from /usr/share/crystal/src/fiber.cr:48:34 in '->'
  from ???

in the spawn version which I think is because the actual request was already closed, just for reference.

636f7374 commented 4 years ago

@rdp

What if you run two separate crystal processes, one you use for facebook, one for taobao, and hit them with the same pattern does requesting from one process affect the other process?

(Just like the previous Gif video). (I really want to send a video of my running process, my network is too bad) (I recorded the whole process, Maybe my ISP or ProxyServer limits my upload speed). (I will try to upload, it may take some time).

Other questions I am testing, please wait a moment. :)

636f7374 commented 4 years ago

@rdp

Yes on killing mDNSResponder.

00:04 (I turned on two run processing (alias | a.k.a a, b) ). 00:04 (a process Port 8776, b process Port 8777). 00:05 (I first run taobao curl (a, b process)). 00:06 - 00:20 (Then I run facebook curl Manual Loop run: (run -> wait 0.01 seconds -> cancel) (a, b process)) β‰ˆ 4 - 5 times (Keep stuck (slow processing)). 00:21 - 00:22 (I opened the activity monitor). 00:22 - 00:24 (When I first forced the kill mDNSResponder process, it was still alive). 00:26 - 00:28 (It seems to process some requests again) (fast processing). 00:28 - 00:29 (So I forced the kill mDNSResponder process again, it was killed). 00:29 - 00:34 (then stuck again). 00:29 - 00:34 (This is stuck for 2-3 seconds, then the mDNSResponder process is revived. 00:34 - 00:43 (When the mDNSResponder process is revived, a, b process is no longer stuck (fast processing)). 00:43 - End (Then I run facebook curl Manual Loop run: (run -> wait 0.01 seconds -> cancel) (a, b process)) (Keep stuck).

636f7374 commented 4 years ago

@rdp

Here's an attempt at a ruby version: https://gist.github.com/rdp/88a55363e254ac037d344480c16d0187 (seems to use fibers so should be about the same).

You should also get the same result in linux if it's your local router. It will be interesting to see if the ruby ​​one has the same issue or not, as well.

(I used (Default DNS: Router: 192.168.1.1) to test the Webrick Ruby script).

When I run taobao.com through curl this doesn't work. (Client It is stuck directly (No request has been processed), Server does not have any logs).

Then I run facebook through curl again. (Client It is stuck directly (No request has been processed), Server does not have any logs).

Then I closed the curl

After about β‰ˆ 30 seconds?, the server has log information.

Ruby Version

ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin16]

Commands

ruby webrick.rb
for ((i=1;i<=20000;i++)); do   curl -v http://127.0.0.1:8000/?foo=bar ; done
curl -v http://127.0.0.1:8000

Webrick Verbose

iMac:Example User$ ruby webrick.rb
[2019-11-11 19:10:05] INFO  WEBrick 1.4.2
[2019-11-11 19:10:05] INFO  ruby 2.6.5 (2019-10-01) [x86_64-darwin16]
[2019-11-11 19:10:05] INFO  WEBrick::HTTPServer#start: pid=32689 port=8000
[2019-11-11 19:11:27] ERROR Errno::ETIMEDOUT: Operation timed out - connect(2) for "facebook.com" port 80
    webrick.rb:12:in `initialize'
    webrick.rb:12:in `new'
    webrick.rb:12:in `block in <main>'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/prochandler.rb:39:in `do_GET'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/abstract.rb:105:in `service'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:140:in `service'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:96:in `run'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/server.rb:307:in `block in start_thread'
127.0.0.1 - - [11/Nov/2019:19:10:07 CST] "GET /?foo=bar HTTP/1.1" 500 349
- -> /?foo=bar
[2019-11-11 19:11:31] ERROR Errno::ETIMEDOUT: Operation timed out - connect(2) for "facebook.com" port 80
    webrick.rb:12:in `initialize'
    webrick.rb:12:in `new'
    webrick.rb:12:in `block in <main>'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/prochandler.rb:39:in `do_GET'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpservlet/abstract.rb:105:in `service'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:140:in `service'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/httpserver.rb:96:in `run'
    /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/webrick/server.rb:307:in `block in start_thread'
127.0.0.1 - - [11/Nov/2019:19:10:10 CST] "GET / HTTP/1.1" 500 349
- -> /
636f7374 commented 4 years ago

@rdp

You should also get the same result in linux if it's your local router. It will be interesting to see if the ruby ​​one has the same issue or not, as well.

I have installed Ubuntu 18.04 in the virtual machine (Parallels Desktop), I need to install llvm and crystal. Sorry, Delayed for a long time, I’m a little lazy and my network downloads large files is a bit slow. πŸ€–

636f7374 commented 4 years ago

@rdp

Appears the IO::Timeout is generated when the C call (getaddrinfo) takes so long that, when it comes back, the fibers waiting for a connect to occur all say "oops, allotted timeout has passed" even though the connect might have actually occurred, the time keeping mechanism might not have noticed that it has occurred. I was able to repro a small demo example of it here: https://gist.github.com/rdp/31632b22d99d9e5c2cc815c3d51ca446 I wouldn't expect a C call in one fiber to be able to do that to a connect in another, but it is still a mighty head scratcher as to how to not do that...especially considering that I don't fully understand the details for the timeout mechanism :) I might create a new issue for it...

_7 _9 _0 _9? (issues).

I am triggered by a tool like Proxifier. I should record the video at the time, maybe now? I will try it later. like Proxifier (TUNTAP Virtual Network Card + MITM (bad?)). I don't want the account to be associated, so I added an underscore.

rdp commented 4 years ago

OK if you have 2 separate processes (each with its own different port) for crystal, then accessing facebook on one should not slow the other one down... :)

636f7374 commented 4 years ago

@rdp

I understand it wrong, I thought it was two processes running at the same time (all stuck (Port 8777, 8776) slow processing). (mDNSResponder has only been tested this way).

[Window 1 | 0:00]: `./a` => 8777
[Window 2 | 0:00]: `./b` => 8776
[Window 3 | 0:01]: `for ((i=1;i<=20000;i++)); do   curl -v http://127.0.0.1:8777/?foo=bar ; done`
[Window 4 | 0:01]: `for ((i=1;i<=20000;i++)); do   curl -v http://127.0.0.1:8776/?foo=bar ; done`
[Window 5 | 0:02]: `curl -v http://127.0.0.1:8777`
[Window 6 | 0:02]: `curl -v http://127.0.0.1:8776`

If it is accessed sequentially, it will not affect another server (another process) (everything is normal (Port 8776)), Port 8777 Server is stuck (slow processing).

[Window 1 | 0:00]: `./a` => 8777
[Window 2 | 0:00]: `./b` => 8776
[Window 3 | 0:01]: `for ((i=1;i<=20000;i++)); do   curl -v http://127.0.0.1:8777/?foo=bar ; done`
[Window 4 | 0:01]: `for ((i=1;i<=20000;i++)); do   curl -v http://127.0.0.1:8776/?foo=bar ; done`
[Window 5 | 0:02]: `curl -v http://127.0.0.1:8777`
636f7374 commented 4 years ago

@rdp

I have already installed it (linux / ubuntu), I am testing it.

parallels@parallels-Parallels-Virtual-Platform:~$ crystal -v
Crystal 0.31.1 [0e2e1d067] (2019-09-30)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu
636f7374 commented 4 years ago

@rdp

Yes, after testing, Ubuntu will have the same problem.

rdp commented 4 years ago

OK at this point I think we may as well close this as "works as expected" (it blocks on the gethostbyname, which causes crystal to freeze till it returns, as it does with all C calls). i can file some issues to point out how painful the situation is :)

636f7374 commented 4 years ago

@rdp

_7 _9 _0 _9 (issues).

(This problem can be triggered every time, without any additional conditions) (No need for TUNTAP, MITM, Proxifier). (But you must use a China proxy server request to trigger this problem). (I think, if there is no timeout, HTTP::Client should appear hangs problem).

Problem

require "http/client"
require "openssl"

class HTTP::Client
  def self.new(uri : URI, tls = nil)
    tls = tls_flag(uri, tls)
    host = validate_host(uri)
    client = new(host, uri.port, tls)
    begin
      yield client
    ensure
      puts [Time.utc, "#L12"]
      client.close
      puts [Time.utc, "#L14"]
    end
  end
end

tls = OpenSSL::SSL::Context::Client.new
tls.verify_mode = OpenSSL::SSL::VerifyMode::NONE

puts [Time.utc, LibSSL::OPENSSL_VERSION, "https://bing.com"]

begin
  HTTP::Client.new uri: URI.parse("https://bing.com"), tls: tls do |client|
    client.connect_timeout = 30.second
    client.read_timeout = 30.second
    client.get("/") do |y|
      puts [Time.utc, y]
      # puts [y.body_io.gets_to_end]
    end
  end

  puts [0, 3, 4, 5]
rescue ex : IO::Timeout
  puts [Time.utc, 2, ex.message]
end

Commands

crystal build bing.cr --release
./bing

Verbose

iMac:Example User$ ./bing
[2019-11-12 07:28:51.565831000 UTC, "1.0.2", "https://bing.com"]
[2019-11-12 07:28:52.300818000 UTC, #<HTTP::Client::Response:0x102297f00 @version="HTTP/1.1", @status=MOVED_PERMANENTLY, @status_message="Moved Permanently", @headers=HTTP::Headers{"Location" => "https://cn.bing.com/", "Server" => "Microsoft-IIS/10.0", "X-MSEdge-Ref" => "Ref A: 24786EA558F449F0BDF8B55A37DC8027 Ref B: HKGEDGE0515 Ref C: 2019-11-12T07:28:51Z", "Date" => "Tue, 12 Nov 2019 07:28:50 GMT", "Content-Length" => "0"}, @body_io=nil, @cookies=nil, @body="">]
[2019-11-12 07:28:52.301368000 UTC, "#L12"]
[2019-11-12 07:29:22.303095000 UTC, 2, "Read timed out"]

Capture

Bing Capture

636f7374 commented 4 years ago

@rdp

Okay. :)

But I still have a confusion, if all C calls will have the same problem, but it seems that Swift, Java (reverse proxy server) does not have this problem.

I don't seem to have this problem when I use some other reverse proxy tools (E.g. BurpSuite (Java)).

Or do they solve this problem? (other language)?

Capture

BurpSuite Capture

636f7374 commented 4 years ago

Finally

In the end, it seems that I can't solve this problem, some disappointed. It seems that (E.g. Java, Rust (Tokio), Swift) use some kind of asynchronous. I hope that one day this problem will be solved (in Crystal). If someone else has encountered this problem, ... @asterite I might try libuv(uv_getaddrinfo), libevent(evdns_getaddrinfo) (although I don't know much about C). Finally, thank you for replying to this issue.

References

rdp commented 4 years ago

You might could work around it by shelling out to "nslookup" to lookup the DNS IP address each time (or looking it up once, and cacheing it). Not sure how that works with the built-in TCP client though. Yeah there's definitely some work to be done there, it is absolutely blocking the world at present, which doesn't seem quite right...

636f7374 commented 4 years ago

@rdp Okay, thank you for your advice and help. :) πŸ™‚

636f7374 commented 4 years ago

C.getaddrinfo

I know what @rdp means, Let me translate what it says. If you call C.getaddrinfo from Crystal, it will cause the handler to freeze. (Certain system calls block forever and don't work well with fibers). #1454 (The handler will freeze and cannot accept new clients).

Custom DNS resolver

If you make a DNS resolver (Synchronous non-async) in Crystal, there will be blocking problems, but this will only block the current fiber and will not cause the handler to freeze. (I have tested it, sending a DNSPacket over UDPSocket to an invalid UDPServer and receiving a response, which only blocking the current fiber).

Window 1 | 00:01: ./example Window 2 | 00:02: loop { sudo nc -l -u 53 -> waiting 0.1 seconds -> cancel } Window 3 | 00:03: for ((i=1;i<=20000;i++)); do curl -v http://127.0.0.1:8777; done (which only blocking the current fiber). Window 4 | 00:04: loop { curl -v "http://127.0.0.1:8777" -> waiting 0.1 seconds -> cancel } (not freeze | not get stuck).

(The handler does not freeze and can accept new clients). (When other users request this HTTP::Server, it will be processed normally and will not get stuck).

In Conclusion

It's not hard to make a DNS resolver (Synchronous non-async). Then maybe some LRU cache is needed. This is much simpler than using libuv, libevent.

Sometimes these problems need to be solved by themselves, like OpenSSL binding (create certificate / create private key / MEMBIO / load certificate from memory / load private key from memory). like #7897, #7896 (stdlib.cr Already available).

They may not help for (me / you), Sometimes (me / you) may need to solve this problem (yourself / myself).

This made me learn a lot :), (Just do it, wait for my good news πŸ‘Œ).

rdp commented 4 years ago

I guess the real fix for this would be some kind of asynchronous [even if threaded] DNS lookup (at least option) for those of us with misbehaving DNS servers... :|

636f7374 commented 4 years ago

@rdp

Update

Yes, but LibC.getaddrinfo calls are very deadly. (Certain system calls block forever and don't work well with fibers).

Synchronization is not as good as asynchronous "better". But it won't get stuck like LibC.getaddrinfo (This is enough for me).

The DNSResolver project progress has now reached 30%. (I am researching the DNS packet structure and it will take some time). (I don't think it will take too long).

class Socket def connect(ip_address : IPAddress, connect_timeout = nil) Addrinfo.build_addrinfo ip_address, @family, @type, @protocol do |addrinfo| connect(addrinfo, timeout: connect_timeout) { |error| error } end end end

class TCPSocket def initialize(ip_address : IPAddress, dns_timeout = nil, connect_timeout = nil) Addrinfo.build_tcp ip_address do |addrinfo| super addrinfo.family, addrinfo.type, addrinfo.protocol connect(addrinfo, timeout: connect_timeout) do |error| close error end end end end

class Socket struct Addrinfo def self.build_addrinfo(ip_address : IPAddress, family, type, protocol, &block : Addrinfo ->) ip_address_size = ip_address.size.to_u8 service = ip_address.port

  hints = LibC::Addrinfo.new
  hints.ai_family = (ip_address.family || Family::UNSPEC).to_i32
  hints.ai_socktype = type
  hints.ai_protocol = protocol
  hints.ai_flags = 0_u8
  hints.ai_addrlen = ip_address_size
  hints.ai_addr = ip_address.to_unsafe
  hints.ai_addr.value.sa_len = ip_address_size

  if service.is_a? Int
    hints.ai_flags |= LibC::AI_NUMERICSERV
  end

  # On OS X < 10.12, the libsystem implementation of getaddrinfo segfaults
  # if AI_NUMERICSERV is set, and servname is NULL or 0.
  {% if flag?(:darwin) %}
    if (service == 0_i32 || service.nil?) && (hints.ai_flags & LibC::AI_NUMERICSERV)
      hints.ai_flags |= LibC::AI_NUMERICSERV
    end
  {% end %}

  yield new pointerof(hints)
end

def self.build_tcp(ip_address : IPAddress, family = Family::UNSPEC)
  build_addrinfo ip_address, family, Type::STREAM, Protocol::TCP do |addrinfo|
    yield addrinfo
  end
end

def self.build_udp(ip_address : IPAddress, family = Family::UNSPEC)
  build_addrinfo ip_address, family, Type::DGRAM, Protocol::UDP do |addrinfo|
    yield addrinfo
  end
end

end end

Use TCPSocket to Connect IPv4 | IPv6 Address, Not using C.getaddrinfo

tcp_socket = TCPSocket.new Socket::IPAddress.new "127.0.0.1", 1234 tcp_socket << "Hello World!" tcp_socket.close

tcp_socket = TCPSocket.new Socket::IPAddress.new "::1", 1234 tcp_socket << "Hello World!" tcp_socket.close

Use UDPSocket to Connect IPv4 | IPv6 Address, Not using C.getaddrinfo

udp_socket = UDPSocket.new Socket::Family::INET udp_socket.connect Socket::IPAddress.new "127.0.0.1", 1234 udp_socket << "Hello World!" udp_socket.close

udp_socket = UDPSocket.new Socket::Family::INET6 udp_socket.connect Socket::IPAddress.new "::1", 1234 udp_socket << "Hello World!" udp_socket.close

rdp commented 4 years ago

getaddrinfo should already "not block" for IP addresses like 127.0.0.1 (does it?). That's actually an excellent idea (just monkey patch it, possibly in a shard?) would be nice to see something in the stdlib for it eventually as well, either includable or as a run-time option or something, see the other issues :) Just for completeness sake here's a proof of concept "monkey patch to use the old libevent dns resolver": https://gist.github.com/rdp/c776c1ea95dfd666d1d828eba4caabc3 FWIW... :)

636f7374 commented 4 years ago

@rdp


getaddrinfo should already "not block" for IP addresses like 127.0.0.1 (does it?).

It should be, but I think, if it is an IP address, it does not need to use LibC.getaddrinfo. (Prevent any problems caused by LibC.getaddrinfo)


That's actually an excellent idea (just monkey patch it, possibly in a shard?) would be nice to see something in the stdlib for it eventually as well, either includable or as a run-time option or something, see the other issues :)

Okay, I need more work to use, I am considering using CrDNS instead of LibC.getaddrinfo. Currently CrDNS has some damage, I need to fix it. In order to work as soon as possible, I may remove the asynchronous feature of CrDNS.

In addition: I looked at the inside of getaddrinfo.c yesterday. It seems that LibC.getaddrinfo internally uses libresolv, libnss_dns. So I feel like giving up using LibC.getaddrinfo.

libresolv is unboundedly synchronous.

You'd expect a real DNS client library to do better in this aspect than getaddrinfo(), but no: libresolv's function calls are still purely synchronous and may uncontrollably block if the network is unresponsive .
getaddrinfo() performs NSS resolution, not DNS resolution.

It is unboundedly synchronous.
DNS resolution performs network I/O, which can take a non-negligible amount of time. getaddrinfo() is a blocking call and there is no way to specify a timeout to make it return early, so it may block indefinitely. This is bad design.

Also, network operations being asynchronous by nature, even a synchronous API should provide a way to perform several queries at once and return when one of them, or all of them, get an answer. getaddrinfo() does not even offer that.

Conclusion
getaddrinfo() is a toy interface. For any half-serious DNS work, another API must be used.
The above system calls correlate with the UDP packet dump. Two packets are sent with one sendmmsg, then the two responses are read with recvfrom.

(Notice in passing the calls to poll, blocking the entire process for up to 5 seconds waiting for the DNS response. In a program with an event loop or β€œgreen threads”, this can be disastrous. getaddrinfo does not play nicely with your event loop!)

But getaddrinfo does a lot before these system calls, and it does quite a bit after them, too. You see the UDP β€œconnection” is to a DNS server at 10.0.2.3. But how did getaddrinfo know about 10.0.2.3? It’s not a function argument. Just before the DNS request, the process makes these system calls:

Just for completeness sake here's a proof of concept "monkey patch to use the old libevent dns resolver": https://gist.github.com/rdp/c776c1ea95dfd666d1d828eba4caabc3 FWIW... :)

Okay, I will try it later. :)


References

636f7374 commented 4 years ago

@rdp


Project Done (DNS::Resolver)

Warning


Example

dns_servers = [] of Tuple(Socket::IPAddress, DNS::Resolver::Protocol) dns_servers << Tuple.new Socket::IPAddress.new("8.8.8.8", 53_i32), DNS::Resolver::Protocol::UDP dns_servers << Tuple.new Socket::IPAddress.new("1.1.1.1", 53_i32), DNS::Resolver::Protocol::UDP

buffer = uninitialized UInt8[4096_i32]

resolver = DNS::Resolver.new dns_servers resolver.ip_cache = DNS::Resolver::Cache::IPAddress.new

begin socket = SuperTCPSocket.new "www.example.com", 80_i32, resolver, 5_i32 socket.read_timeout = 5_i32 socket.write_timeout = 5_i32 rescue abort "Connect Failed" end

begin socket << "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n" rescue abort "Write Failed" end

begin length = socket.read buffer.to_slice rescue abort "Read Failed" end

puts [length, String.new buffer.to_slice[0_i32, length]]


* DNS::Resolver + DNS::Resolver::Cache
```crystal
require "stdlib"

dns_servers = [] of Tuple(Socket::IPAddress, DNS::Resolver::Protocol)
dns_servers << Tuple.new Socket::IPAddress.new("8.8.8.8", 53_i32), DNS::Resolver::Protocol::UDP
dns_servers << Tuple.new Socket::IPAddress.new("1.1.1.1", 53_i32), DNS::Resolver::Protocol::UDP

resolver = DNS::Resolver.new dns_servers
resolver.cache = DNS::Resolver::Cache.new

resolver.resolve "google.com", [DNS::Record::ResourceFlag::A, DNS::Record::ResourceFlag::AAAA] do |response|
  puts [:google, Time.utc, response]
end

resolver.resolve "twitter.com", DNS::Record::ResourceFlag::SOA do |response|
  puts [:twitter, Time.utc, response]
end

resolver.resolve "facebook.com", [DNS::Record::ResourceFlag::A, DNS::Record::ResourceFlag::AAAA] do |response|
  puts [:facebook, Time.utc, response]
end

resolver.resolve "twitter.com", DNS::Record::ResourceFlag::SOA do |response|
  puts [:twitter, Time.utc, response]
end

resolver.run
```text
User$ ./dns_resolve
[:google, 2019-12-11 15:20:06.574530000 UTC, [{"google.com", A, #<DNS::Packet::Response:0x1039dbc40 @queries=[#<DNS::Section::Question:0x1039d4a20 @flag=A, @query="google.com", @cls=IN>], @answers=[], @authority=[], @additional=[], @transId=40358, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=True, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=True, @checkingDisabled=True, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}, {"google.com", AAAA, #<DNS::Packet::Response:0x1039dba80 @queries=[#<DNS::Section::Question:0x1039d4820 @flag=AAAA, @query="google.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d47e0 @resourceRecord=#<DNS::Record::AAAA:0x1039ddd80 @cls=IN, @ttl=299, @from="google.com", @ipv6Address="2404:6800:4012::200e">>], @authority=[], @additional=[], @transId=4642, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:twitter, 2019-12-11 15:20:06.623635000 UTC, [{"twitter.com", SOA, #<DNS::Packet::Response:0x1039db930 @queries=[#<DNS::Section::Question:0x1039d40c0 @flag=SOA, @query="twitter.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d4080 @resourceRecord=#<DNS::Record::A:0x1039ddc60 @cls=IN, @ttl=163, @from="twitter.com", @ipv4Address="31.13.84.8">>], @authority=[], @additional=[], @transId=36287, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:twitter, 2019-12-11 15:20:06.623981000 UTC, [{"twitter.com", SOA, #<DNS::Packet::Response:0x1039db930 @queries=[#<DNS::Section::Question:0x1039d40c0 @flag=SOA, @query="twitter.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d4080 @resourceRecord=#<DNS::Record::A:0x1039ddc60 @cls=IN, @ttl=163, @from="twitter.com", @ipv4Address="31.13.84.8">>], @authority=[], @additional=[], @transId=36287, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:facebook, 2019-12-11 15:20:06.761727000 UTC, [{"facebook.com", A, #<DNS::Packet::Response:0x1039db690 @queries=[#<DNS::Section::Question:0x1039f7d00 @flag=A, @query="facebook.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039f7cc0 @resourceRecord=#<DNS::Record::A:0x1039ddb40 @cls=IN, @ttl=299, @from="facebook.com", @ipv4Address="157.240.196.35">>], @authority=[], @additional=[], @transId=60636, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}, {"facebook.com", AAAA, #<DNS::Packet::Response:0x1039db4d0 @queries=[#<DNS::Section::Question:0x1039f7a80 @flag=AAAA, @query="facebook.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039f7a40 @resourceRecord=#<DNS::Record::A:0x1039ddab0 @cls=IN, @ttl=158, @from="facebook.com", @ipv4Address="8.7.198.45">>], @authority=[], @additional=[], @transId=31058, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
rdp commented 4 years ago

Nice! Maybe make a shard out of it?

On Wed, Dec 11, 2019 at 8:27 AM _ notifications@github.com wrote:

@rdp https://github.com/rdp

Project Done (DNS::Resolver)

  • I think I'm done -> 636f7374|stdlib.cr * src/stdlib/dns https://github.com/636f7374/stdlib.cr/tree/master/src/stdlib/dns.
  • CrDNS https://github.com/teknomunk/cr-dns is very broken, so I made this DNS::Resolver.
  • Added some Cache features, but Cache does not use LRUCache, uses some custom caches. (Cache (Any) | Cache::IPAddress (A, AAAA))
  • Support querying records through multiple DNS servers.
  • Support A, AAAA, NS, PTR, CNAME, SOA, TXT, MX query.
  • Without using C.getaddrinfo.
  • There are many more, I have not listed them ...

Example

  • SuperSocket + DNS::Resolver + DNS::Resolver::Cache::IPAddress

require "stdlib"

dns_servers = [] of Tuple(Socket::IPAddress, DNS::Resolver::Protocol) dns_servers << Tuple.new Socket::IPAddress.new("8.8.8.8", 53_i32), DNS::Resolver::Protocol::UDP dns_servers << Tuple.new Socket::IPAddress.new("1.1.1.1", 53_i32), DNS::Resolver::Protocol::UDP

buffer = uninitialized UInt8[4096_i32]

resolver = DNS::Resolver.new dns_servers resolver.ip_cache = DNS::Resolver::Cache::IPAddress.new begin socket = SuperTCPSocket.new "www.example.com", 80_i32, resolver, 5_i32 socket.read_timeout = 5_i32 socket.write_timeout = 5_i32rescue abort "Connect Failed"end begin socket << "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"rescue abort "Write Failed"end begin length = socket.read buffer.to_slicerescue abort "Read Failed"end puts [length, String.new buffer.to_slice[0_i32, length]]

  • DNS::Resolver + DNS::Resolver::Cache

require "stdlib"

dns_servers = [] of Tuple(Socket::IPAddress, DNS::Resolver::Protocol) dns_servers << Tuple.new Socket::IPAddress.new("8.8.8.8", 53_i32), DNS::Resolver::Protocol::UDP dns_servers << Tuple.new Socket::IPAddress.new("1.1.1.1", 53_i32), DNS::Resolver::Protocol::UDP

resolver = DNS::Resolver.new dns_servers resolver.cache = DNS::Resolver::Cache.new

resolver.resolve "google.com", [DNS::Record::ResourceFlag::A, DNS::Record::ResourceFlag::AAAA] do |response| puts [:google, Time.utc, response]end

resolver.resolve "twitter.com", DNS::Record::ResourceFlag::SOA do |response| puts [:twitter, Time.utc, response]end

resolver.resolve "facebook.com", [DNS::Record::ResourceFlag::A, DNS::Record::ResourceFlag::AAAA] do |response| puts [:facebook, Time.utc, response]end

resolver.resolve "twitter.com", DNS::Record::ResourceFlag::SOA do |response| puts [:twitter, Time.utc, response]end

resolver.run


User$ ./dns_resolve
[:google, 2019-12-11 15:20:06.574530000 UTC, [{"google.com", A, #<DNS::Packet::Response:0x1039dbc40 @queries=[#<DNS::Section::Question:0x1039d4a20 @flag=A, @query="google.com", @cls=IN>], @answers=[], @authority=[], @additional=[], @transId=40358, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=True, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=True, @checkingDisabled=True, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}, {"google.com", AAAA, #<DNS::Packet::Response:0x1039dba80 @queries=[#<DNS::Section::Question:0x1039d4820 @flag=AAAA, @query="google.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d47e0 @resourceRecord=#<DNS::Record::AAAA:0x1039ddd80 @cls=IN, @ttl=299, @from="google.com", @ipv6Address="2404:6800:4012::200e">>], @authority=[], @additional=[], @transId=4642, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:twitter, 2019-12-11 15:20:06.623635000 UTC, [{"twitter.com", SOA, #<DNS::Packet::Response:0x1039db930 @queries=[#<DNS::Section::Question:0x1039d40c0 @flag=SOA, @query="twitter.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d4080 @resourceRecord=#<DNS::Record::A:0x1039ddc60 @cls=IN, @ttl=163, @from="twitter.com", @ipv4Address="31.13.84.8">>], @authority=[], @additional=[], @transId=36287, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:twitter, 2019-12-11 15:20:06.623981000 UTC, [{"twitter.com", SOA, #<DNS::Packet::Response:0x1039db930 @queries=[#<DNS::Section::Question:0x1039d40c0 @flag=SOA, @query="twitter.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039d4080 @resourceRecord=#<DNS::Record::A:0x1039ddc60 @cls=IN, @ttl=163, @from="twitter.com", @ipv4Address="31.13.84.8">>], @authority=[], @additional=[], @transId=36287, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]
[:facebook, 2019-12-11 15:20:06.761727000 UTC, [{"facebook.com", A, #<DNS::Packet::Response:0x1039db690 @queries=[#<DNS::Section::Question:0x1039f7d00 @flag=A, @query="facebook.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039f7cc0 @resourceRecord=#<DNS::Record::A:0x1039ddb40 @cls=IN, @ttl=299, @from="facebook.com", @ipv4Address="157.240.196.35">>], @authority=[], @additional=[], @transId=60636, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}, {"facebook.com", AAAA, #<DNS::Packet::Response:0x1039db4d0 @queries=[#<DNS::Section::Question:0x1039f7a80 @flag=AAAA, @query="facebook.com", @cls=IN>], @answers=[#<DNS::Section::Answer:0x1039f7a40 @resourceRecord=#<DNS::Record::A:0x1039ddab0 @cls=IN, @ttl=158, @from="facebook.com", @ipv4Address="8.7.198.45">>], @authority=[], @additional=[], @transId=31058, @operationCode=StandardQuery, @responseCode=NoError, @authoritativeAnswer=False, @truncated=False, @recursionDesired=True, @recursionAvailable=True, @authenticatedData=False, @checkingDisabled=False, @questionCount=1, @answerCount=1, @authorityCount=0, @additionalCount=0, @buffer=nil>}]]

β€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/crystal-lang/crystal/issues/8376?email_source=notifications&email_token=AAADBUHP56EQYJCICSLW4P3QYEBHFA5CNFSM4JFBFQU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGTQ34Y#issuecomment-564596211>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAADBUG2UK3GW2QQPNHNELLQYEBHFANCNFSM4JFBFQUQ>
.
636f7374 commented 4 years ago

@rdp

Good suggestion, I have separated the DNS part. Now it is a standalone repository -> 636f7374|durian.cr. OpenSSL (Man-in-the-middle Toolkit) is also separating -> 636f7374|cherry.cr.

rdp commented 4 years ago

Nice!

On Fri, Dec 13, 2019 at 11:48 AM _ notifications@github.com wrote:

@rdp https://github.com/rdp

Good suggestion, I have separated the DNS part. Now it is a standalone repository -> 636f7374|durian.cr https://github.com/636f7374/durian.cr. OpenSSL (Man-in-the-middle Toolkit) is also separating -> 636f7374|cherry.cr https://github.com/636f7374/cherry.cr.

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/crystal-lang/crystal/issues/8376?email_source=notifications&email_token=AAADBUAXMZYHQJCIBF7YN2TQYPKH3A5CNFSM4JFBFQU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEG24TJY#issuecomment-565561767, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAADBUAQ4GGRFLOPRZ2ZI7DQYPKH3ANCNFSM4JFBFQUQ .