crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.33k stars 1.61k forks source link

Possible bug Errno: @message="SSL_shutdown: Operation now in progress" #3168

Closed jmoriau closed 7 years ago

jmoriau commented 8 years ago

Hi,

I have a long running process which controls the lights in my apartment, a small endpoint to receive IFTTT outgoing calls, and a loop inside a fiber (so it can run concurrently with the endpoint) which polls the lights status and sometimes update some of them with a POST call to LIFX's servers.

Every once in a while (like every few days) I noticed the fiber which is supposed to run the polling loop stops and the rescued exceptions prints this for e.inspect:

#<Errno:0xa59c40 @message="SSL_shutdown: Operation now in progress", @cause=nil, @callstack=CallStack(@callstack=[Pointer(Void)@0x414fbe, Pointer(Void)@0x449ac7, Pointer(Void)@0x48ddc5, Pointer(Void)@0x4f33b0, Pointer(Void)@0x505895, Pointer(Void)@0x417072, Pointer(Void).null], @backtrace=nil), @errno=115>"

Note the loop is supposed to always run since everything inside the loop is wrapped in the begin rescue block. But it seems this errors breaks out of the loop and then the fiber is done.

I'm sorry I don't have a proper backtrace yet (it usually takes a few days to reproduce this) I'll post one next time I get one.

RX14 commented 8 years ago

See: http://stackoverflow.com/questions/10204134/tcp-connect-error-115-operation-in-progress-what-is-the-cause

Seems like a bug in the interaction between the stdlib and openssl.

jmoriau commented 8 years ago

I just got a stack trace, you'll appreciate how useful it is :smile:

[2016-08-17 17:40:44 +0200]:[LIFX]:[Errno]:[#<Errno:0x9f53c0 @message="SSL_shutdown: Operation now in progress", @cause=nil, @callstack=CallStack(@callstack=[Pointer(Void)@0x414fbe, Pointer(Void)@0x449b17, Pointer(Void)@0x48dea5, Pointer(Void)@0x4f3600, Pointer(Void)@0x505ce5, Pointer(Void)@0x417073, Pointer(Void).null], @backtrace=nil), @errno=115>]
SSL_shutdown: Operation now in progress (Errno)

[4280254] ???

[4496151] ???

[4775589] ???

[5191168] ???

[5266661] ???

[4288627] ???

[0] ???

I'll try to build without --release this time :smile:

jmoriau commented 8 years ago

The only thing it printed without the --release flag is this:


[4947480] *raise<Errno>:NoReturn +24
luislavena commented 8 years ago

@jmoriau can you try running your process with gdb?

jmoriau commented 8 years ago

@luislavena I used to use GDB cli back when I was still writing C (like 5 years ago), is it the same for crystal? Do I need to compile with some flags to get the symbols added? Also What would I be looking for since the process keeps running (it simply kills one of the fibers running, and the more I see it happen the more I think the fiber being killed has nothing to do with this, because sometimes the error happens and the fiber is not killed)? Can I easily set a breakpoint at a file/line? Then what?

Sorry for all these questions, never used a debugger with crystal before (didn't even know it could be done). Thanks!

ysbaddaden commented 8 years ago

Crystal compiles to native code and can be somewhat debugged with any native debugger tool like gdb or lldb. Except for displaying variable (we don't genrerate the necessary data yet) you can set breakpoints, catch signals, show backtraces, use gdb -tui (with some line errors), ...

File and line and symbols can be added with the --debug flag.

jmoriau commented 8 years ago

@ysbaddaden ok thanks, then I should know how to use it. But I still wouldn't know what I'm looking for once I break in the rescue block.

matiasgarciaisaia commented 7 years ago

StackOverflow's shell reported having this issue.

I was able to easily reproduce the bug with this program (based on his):

require "http/client"
require "json"

status = -1

10.times do
  spawn do
    while true
      puts "Pinging..."
      response = HTTP::Client.get "https://randomapi.com/api/6de6abfedb24f889e0b5f675edc50deb?fmt=raw&sole"
      puts "Pong"
      response_to_hash = JSON.parse response.body
      status = response_to_hash[0]["first"].to_s.size
      pp status
    end
  end
end

Fiber.yield

puts "Post yield"

sleep 20000.seconds # so the main thread doesn't exit yet

Running Crystal 0.18.7 (2016-08-05) from brew on OSX 10.11.6, I got this exception:

Unhandled exception:
SSL_shutdown: Operation now in progress (Errno)
[4440073698] *CallStack::unwind:Array(Pointer(Void)) +82
[4440073601] *CallStack#initialize:Array(Pointer(Void)) +17
[4440073560] *CallStack::new:CallStack +40
[4440034617] *raise<Errno>:NoReturn +25
[4440200954] *OpenSSL::SSL::Socket+@OpenSSL::SSL::Socket#close:Nil +346
[4440259434] *HTTP::Client#close:Nil +122
[4440256106] *HTTP::Client::exec<String, String, Nil, Nil, Nil>:HTTP::Client::Response +650
[4440255439] *HTTP::Client::get<String>:HTTP::Client::Response +31
[4440052194] ~procProc(Nil)@./tmp/test.cr:7 +66
[4440070932] *Fiber#run:(Int64 | Nil) +164
[4440042917] ~proc2Proc(Fiber, (Int64 | Nil))@/usr/local/Cellar/crystal-lang/0.18.7_1/src/fiber.cr:28 +21

You can see the whole output (until I've interrupted the execution) at this gist.

sdogruyol commented 7 years ago

I can confirm that this also happens in master Crystal 0.19.4+142 [6b09c3b] (2016-11-19)

andersondias commented 7 years ago

I had the very same issue here:

WARN: Errno: SSL_shutdown: Operation now in progress
WARN: [4401960353] *raise<Exception+>:NoReturn +81
[4402184488] *Sidekiq::Processor#process<Sidekiq::BasicFetch::UnitOfWork>:Bool +1432
[4402153711] *Sidekiq::Processor#run:(Sidekiq::Processor | Nil) +95
[4402152488] ~procProc(Nil)@./libs/sidekiq/sidekiq/server/processor.cr:39 +296
[4402134017] ~procProc(Nil)@./libs/sidekiq/sidekiq/server/util.cr:20 +33
[4401345567] *Fiber#run:(IO::FileDescriptor | Nil) +47
denysvitali commented 7 years ago

Can confirm, I was trying Gangwolf/hncr and got this error too. Crystal 0.20.1 (2016-12-06) Linux denvit-ws1 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux

nobilik commented 7 years ago

I've hit the same bug today. Crystal 0.20.5 [ccf46c0] (2017-01-20)

ysbaddaden commented 7 years ago

The bug has been acknownledged, further "me too" messages won't help resolve the issue.

We need a lead to understand why and when OpenSSL fails with "operation in progress", if the behavior is expected, and what happens in the stdlib that generates this OpenSSL error. Further investigation, a reduced failing example or spec that always reproduces the bug (quickly) are more than welcomed.

denysvitali commented 7 years ago

Minimal code to reproduce this issue:

require "http/client"
require "json"

2.times do
  spawn do
    i = 1;
    while i!=2
      response = HTTP::Client.get "https://google.com"
      i += 1;
    end
  end
end

sleep 10.seconds # so the main thread doesn't exit yet

Tested with Crystal 0.20.5 (2017-01-25), Linux denvit-ws1 4.9.11-1-ARCH and openssl 1.0.2.k-1

Output

dvitali@denvit-ws1:crystal/issue-3168 $ crystal run main.cr
Unhandled exception in spawn:
SSL_shutdown: Operation now in progress (Errno)
0x4afcf9: *OpenSSL::SSL::Socket+ at /usr/lib/crystal/openssl/ssl/socket.cr 148:13
0x4ba555: *HTTP::Client#close:Nil at /usr/lib/crystal/http/client.cr 635:19
0x4b7975: *HTTP::Client::exec<String, String, Nil, Nil, Nil>:HTTP::Client::Response at /usr/lib/crystal/http/client.cr 615:5
0x4b767d: *HTTP::Client::get<String>:HTTP::Client::Response at /home/dvitali/.cache/crystal/macro64371184.cr 37:7
0x45cd6a: *Fiber#run:(IO::FileDescriptor | Nil) at /usr/lib/crystal/fiber.cr 114:3
0x44d096: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) at /usr/lib/crystal/concurrent.cr 60:3
0x0: ??? at ??
nobilik commented 7 years ago

Lucky me) Sorry, when I learn to write tests, I'll post the results.

zatherz commented 7 years ago

Bump.

akzhan commented 7 years ago

@ysbaddaden looks like we should retry on Errno::EINPROGRESS errors on unbuffered close.

Just like EAGAIN. Pull request #4433 sent.

RX14 commented 7 years ago

@denysvitali Just FYI, your example contains while i!=2 which is parsed as while i! = 2 which is the same as while 2 and while true. Please use crystal tool format to reduce human error and so that your code examples can be further reduced.

require "http/client"

2.times do
  spawn do
    loop do
      HTTP::Client.get "https://google.com"
    end
  end
end

sleep 2.seconds
denysvitali commented 7 years ago

Thanks for the tip!

require "http/client"
require "json"

2.times do
  spawn do
    i = 1
    while i != 5
      response = HTTP::Client.get "https://google.com"
      i += 1
    end
  end
end

sleep 10.seconds # so the main thread doesn't exit yet
dvitali@denvit-ws1:crystal/test $ crystal run http_problem.cr                                                                                                                            -1-
Unhandled exception in spawn:
SSL_shutdown: Operation now in progress (Errno)
0x46e2cb: *CallStack::unwind:Array(Pointer(Void)) at ??
0x4eda2b: unbuffered_close at /usr/lib/crystal/openssl/ssl/socket.cr 150:13
0x4ed87f: close at /usr/lib/crystal/io/buffered.cr 197:5
0x4f4e75: close at /usr/lib/crystal/http/client.cr 634:19
0x4f1952: exec at /usr/lib/crystal/http/client.cr 614:5
0x4f164d: get at /usr/lib/crystal/http/client.cr 329:3
0x46a472: ~procProc(Nil) at /usr/lib/crystal/base64.cr 0:1
0x47fa7a: run at /usr/lib/crystal/fiber.cr 255:3
0x467d96: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) at /usr/lib/crystal/concurrent.cr 0:3
0x0: ??? at ??
^C%                                                                                                                                                                                          dvitali@denvit-ws1:crystal/test $ uname -a                                                                                                                                               -1-
Linux denvit-ws1 4.10.13-1-ARCH #1 SMP PREEMPT Thu Apr 27 12:15:09 CEST 2017 x86_64 GNU/Linux

dvitali@denvit-ws1:crystal/test $ pacman -Q openssl 
openssl 1.1.0.e-1

dvitali@denvit-ws1:crystal/test $ crystal -v
Crystal 0.22.0 (2017-04-22) LLVM 4.0.0
RX14 commented 7 years ago

@denysvitali FYI #4433 fixes the bug.

denysvitali commented 7 years ago

You're right, just tested #4433 , it works!

dvitali@denvit-ws1:crystal/bin ‹retry-EINPROGRESS-on-unbuffered-close*›$ ./crystal ~/Documents/crystal/test/http_problem.cr 
Using compiled compiler at `.build/crystal'
dvitali@denvit-ws1:crystal/bin ‹retry-EINPROGRESS-on-unbuffered-close*›$ ./crystal -v                                      
Using compiled compiler at `.build/crystal'
Crystal 0.22.0+? [30c7410] (2017-05-20) LLVM 4.0.0
RX14 commented 7 years ago

@denysvitali For future reference, compiling the compiler isn't neccesary because it's a stdlib fix. Using bin/crystal is enough.