TwP / logging

A flexible logging library for use in Ruby programs based on the design of Java's log4j library.
https://rubygems.org/gems/logging
MIT License
530 stars 101 forks source link

Multithreading issues in JRuby? #174

Closed mlartz closed 6 years ago

mlartz commented 6 years ago

I've been having some issues with porting my STOMP consumer to JRuby and am currently getting a deadlock (ref https://github.com/stompgem/stomp/issues/142) when I attemp to require 'logging'.

The STOMP gem uses a thread to receive STOMP messages via the network over TLS, and I use a thread pool to process/ack that data to the STOMP client. I've pared the offending code down to a very small consumer, which doesn't replicate the issue (although it might have other issues).

That said, when I require 'logging' in the stripped down consumer (I use logging in my full application), the demo script deadlocks after it receives/acks a few messages. Unfortunately, I've tried to replicate the issue without the STOMP gem (with just threads/thread pool) but have been unable to. So this issue might pertain to any one of JRuby, JRuby-SSL, STOMP (gem), or logging (gem). Mostly, I'm just wondering if 'logging' does anything tricksy on load that might have an effect on threading/locking/IO/SSL under JRuby?

I'm using jruby 9.0.3.0 (2.2.2) 2015-10-21 633c9aa Java HotSpot(TM) 64-Bit Server VM 25.45-b02 on 1.8.0_45-b14 +jit [linux-amd64]

TwP commented 6 years ago

Looking at your example in https://github.com/stompgem/stomp/issues/142#issuecomment-314870902, if you remove the require 'logging' line then you no longer see deadlocks? Is that correct?

mlartz commented 6 years ago

Exactly. Removing the require 'logging' line causes the script to not deadlock, however there are some other issues, as noted by the author of the STOMP gem. This obviously calls logging into question, but the jstack trace seems to indicate that its some interaction with Jruby, SSLSocket, and IO when both reading and writing to the TLS "socket". Although it gets a little confusing, because somehow the threads/processes get labeled with "diagnotic_context.rb".

TwP commented 6 years ago

The only interesting thing the logging framework is doing with threads is in relation to the diagnostic contexts. The goal of that chunk of code is to capture the diagnostic context from the parent thread when a new thread is spawned. The only way to figure out the parent thread is to capture the information when a new thread is created, so that is why new, start, and fork are being monkey patched.

This behavior can be overridden via an environment variable. Would you be willing to run an experiment with your deadlock script and run it with the require 'logging' line present and with the LOGGING_INHERIT_CONTEXT=false environment variable set? I'd like to see if that might be contributing to the deadlock issues you are seeing.

mlartz commented 6 years ago

(Un?)Fortunately, I found no difference when running with or without the environment variable.

LOGGING_INHERIT_CONTEXT=false jruby -J-Xmx2048m -S bundle exec ruby bin/deadlock_test

Other than threading, does logging do any monkeypatching with IO objects?

TwP commented 6 years ago

(Un?)Fortunately

Ha Ha Ha Ha! :joy:

That made me laugh - thank you :bowing_man:

The sad news here is that the logging framework is not doing any crazy initialization or anything related to sockets or IO. You have to instantiate appenders and loggers before any actions are taken.

Here is the list of files required by the logging framework:

One debugging option here is to replace the require 'logging' with a bunch of require lines for each of the libraries listed above. Remove the require statements one by one to see if one of them might contributing to the issue. Other than that I'm pretty stumped.

Is there any output about which thread is deadlocking and where?

gmallard commented 6 years ago

Notes from the stomp gem lead:

I cannot recreate a deadlock (or block/hang) using this test code:

https://gist.github.com/gmallard/f7c709bc26d5a16b6db6ebeeea20f18a

Here is a short test log:

https://gist.github.com/gmallard/1641573d0e040ba2dda4328e08e3071a

Investigation of the issue reported to me uncovered another bug - but this one will not recreate.

For reference:

https://github.com/stompgem/stomp/issues/142

mlartz commented 6 years ago

Thanks for the debug tips. I was able to replicate a deadlock without any of gem imports, and was later able to replicate the deadlock without logging required, however it took much longer for the deadlock to occur (20 seconds vs 20 minutes). So at this point it seems like logging is not involved, sorry for the non-bug report. Everything is OK!

TwP commented 6 years ago

@mlartz thank you so much for the follow up on this issue. Good luck with the deadlock debugging! :+1: