Closed kwerle closed 3 years ago
gdb backtrace:
#0 0x00007fdeba798916 in __GI_ppoll (fds=fds@entry=0x7ffdd0b4d350,
nfds=nfds@entry=2, timeout=<optimized out>, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x00007fdebaf09332 in native_ppoll_sleep (th=th@entry=0x55bc09ddbea0,
rel=rel@entry=0x0) at hrtime.h:148
#2 0x00007fdebaf09b9a in native_sleep (th=th@entry=0x55bc09ddbea0, rel=0x0)
at thread_pthread.c:2170
#3 0x00007fdebaf0dc8e in rb_thread_sleep_interruptible () at thread.c:1307
#4 0x00007fdebae77e55 in waitpid_sleep (x=x@entry=140728104965520)
at process.c:1102
#5 0x00007fdebadc5966 in rb_ensure (
b_proc=b_proc@entry=0x7fdebae77e40 <waitpid_sleep>,
data1=data1@entry=140728104965520,
e_proc=e_proc@entry=0x7fdebae77df0 <waitpid_cleanup>,
data2=data2@entry=140728104965520) at eval.c:1076
#6 0x00007fdebae7f9a9 in rb_f_system (argc=<optimized out>,
argv=<optimized out>) at process.c:4476
#7 0x00007fdebaf3a79b in vm_call_cfunc_with_frame (ci=0x55bc12690490,
cc=<optimized out>, calling=<optimized out>, reg_cfp=0x7fdeba6a6da8,
ec=0x55bc09ddc328) at vm_insnhelper.c:1908
#8 vm_call_cfunc (ec=0x55bc09ddc328, reg_cfp=0x7fdeba6a6da8,
calling=<optimized out>, ci=0x55bc12690490, cc=<optimized out>)
at vm_insnhelper.c:1924
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) p * fds
$1 = {fd = 3, events = 1, revents = 0}
On the advice of a friend:
(gdb) p (int)close(3)
[ASYNC BUG] consume_communication_pipe: read(3)
EBADF
ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-linux]
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: https://www.ruby-lang.org/bugreport.html
$2 = 0
(gdb) quit
A debugging session is active.
Inferior 1 [process 89] will be detached.
Quit anyway? (y or n) y
Detaching from program: /usr/local/bin/ruby, process 89
[Inferior 1 (process 89) detached]
root@4f1f0c16d672:/tmp/src# fg
bash: fg: job has terminated
[1]+ Aborted rails runner "system 'ls'"
Sure looks like that pipe isn't closing cleanly - but if you force it closed then things exit.
root@4f1f0c16d672:/tmp/src# ls -al /proc/136/fd/3
lrwx------ 1 root root 64 Apr 6 16:29 /proc/136/fd/3 -> 'anon_inode:[eventfd]'
I should also mention that the app hangs before at_exit
fires. But if you hit ctrl-c, the app will then continue and fire at_exit blocks.
It's interesting that this file descriptor seems to exist for any ruby process. In ruby 2.6.5 it looks like:
lrwx------ 1 root root 64 Feb 2 09:48 3 -> 'anon_inode:[eventfd]'
But in ruby 2.5.8 it is clearly a pipe:
lr-x------ 1 root root 64 Feb 2 09:29 3 -> 'pipe:[146388]'
Hi kwerle I have no idea for your case, however it seems that Java consumes SIGCHLD. I wonder if you could test wirh -Xrs option to call Rjb::load. for ex) Rjb::load('.', ['-Xrs'])
arton, Thank you for your time and efforts! We're already passing that to the loader.
It turns out that updating our java library (chemaxon) has invalidated my trivial example - which now exits. We are, however, still seeing problems exiting some system calls. I have not yet been able to isolate them locally. I will follow up on this ticket once I know more.
kwerle Thank you for your collaboration. I'm sorry, I wasn't able to help you for this issue.
@akiotajima Thank you for this project! It is very useful to us and we will continue to use it gladly.
We can only reproduce this issue on Java 8. When we upgrade to java 11 it no longer happens. We will upgrade!
I'm always a little sad to close an issue without getting to the root cause - but in cases like this I will let it go.
I'm afraid I do not have a trivial test case for this issue.
We are using a 3rd party Java library (chemaxon). We go across the bridge and set up a database connection on the java side. Once we do this, it causes system commands to not exit. In brief:
The system command executes the ls. The output is displayed. But the system call never exits.
rbspy recording:
This did NOT happen in ruby 2.5.5. It still happens in 2.7.0rc1
Any help appreciated. If there is anything I can do to help, please let me know.
All testing done in docker containers (ruby:2.5.5, ruby:2.6.5)