Closed jmoriau closed 7 years ago
Seems like a bug in the interaction between the stdlib and openssl.
I just got a stack trace, you'll appreciate how useful it is :smile:
[2016-08-17 17:40:44 +0200]:[LIFX]:[Errno]:[#<Errno:0x9f53c0 @message="SSL_shutdown: Operation now in progress", @cause=nil, @callstack=CallStack(@callstack=[Pointer(Void)@0x414fbe, Pointer(Void)@0x449b17, Pointer(Void)@0x48dea5, Pointer(Void)@0x4f3600, Pointer(Void)@0x505ce5, Pointer(Void)@0x417073, Pointer(Void).null], @backtrace=nil), @errno=115>]
SSL_shutdown: Operation now in progress (Errno)
[4280254] ???
[4496151] ???
[4775589] ???
[5191168] ???
[5266661] ???
[4288627] ???
[0] ???
I'll try to build without --release this time :smile:
The only thing it printed without the --release flag is this:
[4947480] *raise<Errno>:NoReturn +24
@jmoriau can you try running your process with gdb
?
@luislavena I used to use GDB cli back when I was still writing C (like 5 years ago), is it the same for crystal? Do I need to compile with some flags to get the symbols added? Also What would I be looking for since the process keeps running (it simply kills one of the fibers running, and the more I see it happen the more I think the fiber being killed has nothing to do with this, because sometimes the error happens and the fiber is not killed)? Can I easily set a breakpoint at a file/line? Then what?
Sorry for all these questions, never used a debugger with crystal before (didn't even know it could be done). Thanks!
Crystal compiles to native code and can be somewhat debugged with any native debugger tool like gdb or lldb. Except for displaying variable (we don't genrerate the necessary data yet) you can set breakpoints, catch signals, show backtraces, use gdb -tui (with some line errors), ...
File and line and symbols can be added with the --debug
flag.
@ysbaddaden ok thanks, then I should know how to use it. But I still wouldn't know what I'm looking for once I break in the rescue block.
StackOverflow's shell
reported having this issue.
I was able to easily reproduce the bug with this program (based on his):
require "http/client"
require "json"
status = -1
10.times do
spawn do
while true
puts "Pinging..."
response = HTTP::Client.get "https://randomapi.com/api/6de6abfedb24f889e0b5f675edc50deb?fmt=raw&sole"
puts "Pong"
response_to_hash = JSON.parse response.body
status = response_to_hash[0]["first"].to_s.size
pp status
end
end
end
Fiber.yield
puts "Post yield"
sleep 20000.seconds # so the main thread doesn't exit yet
Running Crystal 0.18.7 (2016-08-05)
from brew on OSX 10.11.6, I got this exception:
Unhandled exception:
SSL_shutdown: Operation now in progress (Errno)
[4440073698] *CallStack::unwind:Array(Pointer(Void)) +82
[4440073601] *CallStack#initialize:Array(Pointer(Void)) +17
[4440073560] *CallStack::new:CallStack +40
[4440034617] *raise<Errno>:NoReturn +25
[4440200954] *OpenSSL::SSL::Socket+@OpenSSL::SSL::Socket#close:Nil +346
[4440259434] *HTTP::Client#close:Nil +122
[4440256106] *HTTP::Client::exec<String, String, Nil, Nil, Nil>:HTTP::Client::Response +650
[4440255439] *HTTP::Client::get<String>:HTTP::Client::Response +31
[4440052194] ~procProc(Nil)@./tmp/test.cr:7 +66
[4440070932] *Fiber#run:(Int64 | Nil) +164
[4440042917] ~proc2Proc(Fiber, (Int64 | Nil))@/usr/local/Cellar/crystal-lang/0.18.7_1/src/fiber.cr:28 +21
You can see the whole output (until I've interrupted the execution) at this gist.
I can confirm that this also happens in master
Crystal 0.19.4+142 [6b09c3b] (2016-11-19)
I had the very same issue here:
WARN: Errno: SSL_shutdown: Operation now in progress
WARN: [4401960353] *raise<Exception+>:NoReturn +81
[4402184488] *Sidekiq::Processor#process<Sidekiq::BasicFetch::UnitOfWork>:Bool +1432
[4402153711] *Sidekiq::Processor#run:(Sidekiq::Processor | Nil) +95
[4402152488] ~procProc(Nil)@./libs/sidekiq/sidekiq/server/processor.cr:39 +296
[4402134017] ~procProc(Nil)@./libs/sidekiq/sidekiq/server/util.cr:20 +33
[4401345567] *Fiber#run:(IO::FileDescriptor | Nil) +47
Can confirm, I was trying Gangwolf/hncr and got this error too.
Crystal 0.20.1 (2016-12-06)
Linux denvit-ws1 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux
I've hit the same bug today. Crystal 0.20.5 [ccf46c0] (2017-01-20)
The bug has been acknownledged, further "me too" messages won't help resolve the issue.
We need a lead to understand why and when OpenSSL fails with "operation in progress", if the behavior is expected, and what happens in the stdlib that generates this OpenSSL error. Further investigation, a reduced failing example or spec that always reproduces the bug (quickly) are more than welcomed.
Minimal code to reproduce this issue:
require "http/client"
require "json"
2.times do
spawn do
i = 1;
while i!=2
response = HTTP::Client.get "https://google.com"
i += 1;
end
end
end
sleep 10.seconds # so the main thread doesn't exit yet
Tested with Crystal 0.20.5 (2017-01-25)
, Linux denvit-ws1 4.9.11-1-ARCH
and openssl 1.0.2.k-1
dvitali@denvit-ws1:crystal/issue-3168 $ crystal run main.cr
Unhandled exception in spawn:
SSL_shutdown: Operation now in progress (Errno)
0x4afcf9: *OpenSSL::SSL::Socket+ at /usr/lib/crystal/openssl/ssl/socket.cr 148:13
0x4ba555: *HTTP::Client#close:Nil at /usr/lib/crystal/http/client.cr 635:19
0x4b7975: *HTTP::Client::exec<String, String, Nil, Nil, Nil>:HTTP::Client::Response at /usr/lib/crystal/http/client.cr 615:5
0x4b767d: *HTTP::Client::get<String>:HTTP::Client::Response at /home/dvitali/.cache/crystal/macro64371184.cr 37:7
0x45cd6a: *Fiber#run:(IO::FileDescriptor | Nil) at /usr/lib/crystal/fiber.cr 114:3
0x44d096: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) at /usr/lib/crystal/concurrent.cr 60:3
0x0: ??? at ??
Lucky me) Sorry, when I learn to write tests, I'll post the results.
Bump.
@ysbaddaden looks like we should retry on Errno::EINPROGRESS
errors on unbuffered close.
Just like EAGAIN. Pull request #4433 sent.
@denysvitali Just FYI, your example contains while i!=2
which is parsed as while i! = 2
which is the same as while 2
and while true
. Please use crystal tool format
to reduce human error and so that your code examples can be further reduced.
require "http/client"
2.times do
spawn do
loop do
HTTP::Client.get "https://google.com"
end
end
end
sleep 2.seconds
Thanks for the tip!
require "http/client"
require "json"
2.times do
spawn do
i = 1
while i != 5
response = HTTP::Client.get "https://google.com"
i += 1
end
end
end
sleep 10.seconds # so the main thread doesn't exit yet
dvitali@denvit-ws1:crystal/test $ crystal run http_problem.cr -1-
Unhandled exception in spawn:
SSL_shutdown: Operation now in progress (Errno)
0x46e2cb: *CallStack::unwind:Array(Pointer(Void)) at ??
0x4eda2b: unbuffered_close at /usr/lib/crystal/openssl/ssl/socket.cr 150:13
0x4ed87f: close at /usr/lib/crystal/io/buffered.cr 197:5
0x4f4e75: close at /usr/lib/crystal/http/client.cr 634:19
0x4f1952: exec at /usr/lib/crystal/http/client.cr 614:5
0x4f164d: get at /usr/lib/crystal/http/client.cr 329:3
0x46a472: ~procProc(Nil) at /usr/lib/crystal/base64.cr 0:1
0x47fa7a: run at /usr/lib/crystal/fiber.cr 255:3
0x467d96: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) at /usr/lib/crystal/concurrent.cr 0:3
0x0: ??? at ??
^C% dvitali@denvit-ws1:crystal/test $ uname -a -1-
Linux denvit-ws1 4.10.13-1-ARCH #1 SMP PREEMPT Thu Apr 27 12:15:09 CEST 2017 x86_64 GNU/Linux
dvitali@denvit-ws1:crystal/test $ pacman -Q openssl
openssl 1.1.0.e-1
dvitali@denvit-ws1:crystal/test $ crystal -v
Crystal 0.22.0 (2017-04-22) LLVM 4.0.0
@denysvitali FYI #4433 fixes the bug.
You're right, just tested #4433 , it works!
dvitali@denvit-ws1:crystal/bin ‹retry-EINPROGRESS-on-unbuffered-close*›$ ./crystal ~/Documents/crystal/test/http_problem.cr
Using compiled compiler at `.build/crystal'
dvitali@denvit-ws1:crystal/bin ‹retry-EINPROGRESS-on-unbuffered-close*›$ ./crystal -v
Using compiled compiler at `.build/crystal'
Crystal 0.22.0+? [30c7410] (2017-05-20) LLVM 4.0.0
@denysvitali For future reference, compiling the compiler isn't neccesary because it's a stdlib fix. Using bin/crystal
is enough.
Hi,
I have a long running process which controls the lights in my apartment, a small endpoint to receive IFTTT outgoing calls, and a loop inside a fiber (so it can run concurrently with the endpoint) which polls the lights status and sometimes update some of them with a POST call to LIFX's servers.
Every once in a while (like every few days) I noticed the fiber which is supposed to run the polling loop stops and the rescued exceptions prints this for
e.inspect
:#<Errno:0xa59c40 @message="SSL_shutdown: Operation now in progress", @cause=nil, @callstack=CallStack(@callstack=[Pointer(Void)@0x414fbe, Pointer(Void)@0x449ac7, Pointer(Void)@0x48ddc5, Pointer(Void)@0x4f33b0, Pointer(Void)@0x505895, Pointer(Void)@0x417072, Pointer(Void).null], @backtrace=nil), @errno=115>"
Note the loop is supposed to always run since everything inside the loop is wrapped in the begin rescue block. But it seems this errors breaks out of the loop and then the fiber is done.
I'm sorry I don't have a proper backtrace yet (it usually takes a few days to reproduce this) I'll post one next time I get one.