elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash
Other
68 stars 3.5k forks source link

Investigate potential stale WorkerLoop.THREAD_CONTEXT #11502

Open colinsurprenant opened 4 years ago

colinsurprenant commented 4 years ago

Per discussion in https://github.com/elastic/logstash/pull/11492#issuecomment-574264320 per @yaauie observation:

Note that WorkerLoop.THREAD_CONTEXT is a ThreadLocal, so each java thread can pull its own ruby thread context out. The trouble arises if and when a java thread ever runs multiple ruby threads in the course of its lifetime (which I do not know), since we only populate it with ThreadLocal#withInitial and then use WorkerLoop.THREAD_CONTEXT in to pass a ThreadContext to various ruby internals that then use it to determine interrupt state.

This is the code in question:

https://github.com/elastic/logstash/blob/5c4d35343acadc3ab735882c4511228189abe3ea/logstash-core/src/main/java/org/logstash/execution/WorkerLoop.java#L19-L20

We should investigate further into this.

kares commented 4 years ago

interesting, this seems quite wrong unless its by "design" (for some reason) to have the initial context. but than all places this is passed down need to be aware that it might not be the correct current context.

decent JRuby exts never store context objects long-term. getCurrentContext should be sufficient here as its semantics are very much like a thread-local value.

colinsurprenant commented 4 years ago

@kares what do you think is the danger of storing the JRuby ThreadContext in the WorkerLoop Java thread as a ThreadLocal? Could this Java thread be reused at some point (think pipeline reloading) and endup with a stale ThreadContext for example?

kares commented 4 years ago

Could this Java thread be reused at some point (think pipeline reloading) and endup with a stale ThreadContext for example?

callMethod(context, ...) will cause the method impl (either ruby or java) to get the passed context and in most cases the context gets carried over to other methods called down the stack (not a 100% rule). this is rarely problematic since the context instance won't change for the same thread (theoretically it could with fibers but for now we can safely ignore that).

passing around a context for a dead thread LS would quickly run into issues (e.g. stuff such as mutex.lock assumes the current thread to not be alive in terms of gc) ... clearly can not happen by the nature of a thread-local.

also looked into the one potential quite problematic piece Thread.new/start ... assuming there was a callMethod(context) up the stack that reached a Thread.new { ... } code that might have created a mess but in this case JRuby does not use any previous context-ual piece on purpose since a fresh new context needs to be allocated for the starting thread.

so all good - gc will be fine since a thread and its (thread-local) context die together.