clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License
2.54k stars 241 forks source link

Ensure DynamicClassLoader is the Context Class Loader #604

Closed arnaudgeiser closed 2 years ago

arnaudgeiser commented 2 years ago

Description

This PR is a proposal to solve the https://github.com/clj-commons/aleph/issues/603, https://github.com/clj-commons/aleph/issues/365 issues.

The following commit adds a unit test regarding the problem we are currently facing when some code is running outside of a DynamicClassLoader : https://github.com/clj-commons/aleph/commit/404ebcca64ef1e2e414c3f4e08fb2fdef3a33b3f Then, the following one is basically the fix that landed on Riemann (https://github.com/riemann/riemann/pull/951) : https://github.com/clj-commons/aleph/commit/3f9fdfb92b9d45ca2b8c6c13c7795e708fbe9895.

All the credits go to @KingMob obviously who did the all work.

arnaudgeiser commented 2 years ago

I applied the requested changes. The fact I'm not using a macro anymore is making the tests really simpler to write.

However what you propose here is not solving our issue:

(let [thread (Thread/currentThread)]
  (when-not (instance? DynamicClassLoader (.getContextClassLoader thread))
    (.setContextClassLoader thread (RT/makeClassLoader))))

I suppose this is because we are ending here : https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L2179-L2180

KingMob commented 2 years ago

From Slack:

OK, I think I finally understand what was going wrong in the original code. First, it has nothing to do with the netty side at all. The stacktrace for the operation-complete callback, the thread context classloaders, etc, is identical.

It hinges entirely on the fact that wrap-future is called from different threads, one signal, one normal. This means the class-loader variable being set for the callback gets different thread context classloaders. My error was in not understanding the hierarchical nature of classloaders correctly.

I got confused because the signal thread sets a Clojure DynamicClassLoader, and the normal thread only sets the AppLoader/PlatformLoader chain. This seems backwards, right? How can Clojure code load without the CLojure DynamicClassLoader?

The thing is, the class not being found is clojure.lang.PersistentArrayMap, which is a Java class, not Clojure code. The reason it fails in the signal case is because the DynamicClassLoader’s parent is null, which is the bootstrap loader. Without the AppLoader in its parent chain, it can’t find normal java classes.

(Incidentally, the normal version has the AppLoader, but not the DynamicClassLoader in its context loader, but that’s probably not a problem because Clojure adds it to the stack frames for its own use on demand).

It’s not too different really, from what Riemann’s maintainers found, just confusing because it seemed like it should have been working.

KingMob commented 2 years ago

@arnaudgeiser Merge whenever you're ready.

Fixes #603