Closed arnaudgeiser closed 1 year ago
Just breezing through on a spare moment. I haven’t had time to take a closer look, but I did notice one mistake: the Clojure classloader isn’t responsible for thread frame propagation. They’re separate issues, and using the classloader alone isn’t sufficient. Even with the classloader in place, binding frame propagation requires the use of Clojure fns to start threads; with Java code, frames won’t be propagated.
Yes, I totally agree with your observation and your conclusion. The question is more: Do we really need thread bindings propagation? That's unlikely people using an asynchronous mechanism expect to have the bindings of the calling threads propagated down, right? And we are talking only about dynamic bindings, here, right?
I mean, a lot of people are not relying on manifold and use CachedThreadPool or FixedThreadPool directly without trouble. I have the impression that "we" (not us actually) have done a lot of tinkering around this issue in the past... I would like to reconsider the status-quo.
OK, I took a closer look to refresh my memory, and I'd forgotten why we started frame propagation in the first place.
While it's true that the classloader is not responsible for frame propagation, I had the direction reversed. It's the classloader we care about, not frame propagation per se. We were only propagating frames to try and make the LOADER
var used by the compiler available to the signal handler thread. (I think.)
We need the Clojure class loader set for all netty threads (not just the signal handler thread) but we switched from setting the LOADER var, to using .setContextClassLoader
on the thread. I think that should be sufficient. The comments say we use bound-fn*
to "Ensure the same bindings are installed on the Netty thread (vars, classloader)", except... the operation-complete
fn we're binding doesn't use any dynamic vars itself. Maybe there was a concern that the call to .getNow()
would trigger some code, but I just looked at all definitions in Netty, and that's not an issue.
All of which is to say, yeah, we can probably stop using bound-fn
on operation-complete
All of which is to say, yeah, we can probably stop using bound-fn on operation-complete
Perfect, polishing the PR then.
For the record @KingMob, here is where everything started : https://github.com/clj-commons/aleph/pull/425, where we put two mechanisms in place.
Description
This removes the usage of
bound-fn*
which has been introduced by the following PR. [1] It was added for safety while there might be no open issue regarding it. For some workloads, this operation can take up to 40% of CPU.Here is the reason why we used to push the bindings [2], but I suppose we fixed it by using
ensure-dynamic-classloader
instead.Here is the gist of the discussion we had with @KingMob :
[1] : https://github.com/clj-commons/aleph/pull/604 [2] : https://github.com/clj-commons/aleph/pull/425