clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License
2.54k stars 241 forks source link

Add classifier to netty-transport-native-epoll dependency #613

Closed DerGuteMoritz closed 2 years ago

DerGuteMoritz commented 2 years ago

Fixes #475.

This is a cherry-pick of 5f5e73b256027d3d887582118af86b580eb08b0c from the 1.0.0 branch minus the kqueue-related changes.

arnaudgeiser commented 2 years ago

We ran exactly into this issue few months/years ago. I just double-checked and it seems we are not using :classifier "linux-x86_64" while Epoll/isAvailable returns true.

Do you have some hints about when using the classifier is necessary?

Otherwise, this change is fine for me. I included many times the :classifier "linux-x86_64" on personal projects exactly because of this.

A small aside: I'm not sure it should be Aleph's responsibility to include this dependency... but what's done is done.

KingMob commented 2 years ago

I'll defer to you two on this. I have no experience messing with epoll.

However, I wasn't a big fan of the code to append the netty version number, and I like this addition to handle classifiers even less. It's fancy for so little benefit.

Can you switch to boring lein deps? (The netty version var can be kept)

DerGuteMoritz commented 2 years ago

We ran exactly into this issue few months/years ago. I just double-checked and it seems we are not using :classifier "linux-x86_64" while Epoll/isAvailable returns true.

@arnaudgeiser Interesting :thinking: For me it returned false in this situation. Just tried it out again by starting a REPL in the aleph repo without this patch and sure enough:

user> (io.netty.channel.epoll.Epoll/isAvailable)
false

While with the patch:

user> (io.netty.channel.epoll.Epoll/isAvailable)
true

Could you inspect your dependency tree whether something somewhere perhaps pulls in this dependency for you?

Do you have some hints about when using the classifier is necessary?

AFAIUI this is always necessary. I got that info from https://netty.io/wiki/native-transports.html#using-the-native-transports

A small aside: I'm not sure it should be Aleph's responsibility to include this dependency... but what's done is done.

Agreed, I also feel it's a bit off here because this also pulls in the dependency for non-Linux OSes. Also, not all Linux is x86_64. I just did it this way eventually because there was this prior art patch :sweat_smile: An alternative approach could be to require users to bring their own dependency and then log a warning (or throw an exception?) when they pass :epoll? true but the dependency is missing... @KingMob @arnaudgeiser WDYT?

However, I wasn't a big fan of the code to append the netty version number, and I like this addition to handle classifiers even less. It's fancy for so little benefit.

Can you switch to boring lein deps? (The netty version var can be kept)

@KingMob Gladly!

arnaudgeiser commented 2 years ago

On the project I'm currently working on:

❯ lein sub deps :tree | grep epoll          
Tried to load org.slf4j/slf4j-api version 1.7.5 but 1.7.25 was already loaded.
Tried to load org.apache.httpcomponents/httpcore version 4.4.9 but 4.4.13 was already loaded.
Tried to load org.apache.httpcomponents/httpclient version 4.5.5 but 4.5.13 was already loaded.
Tried to load commons-codec version 1.10 but 1.15 was already loaded.
       [io.netty/netty-transport-native-epoll "4.1.63.Final"]
Possibly confusing dependencies found:
[nrepl "0.8.3" :exclusions [org.clojure/clojure]]
 overrides
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor]] -> [nrepl "0.9.0"]

Consider using these exclusions:
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor nrepl]]

       [io.netty/netty-transport-native-epoll "4.1.63.Final"]
 [io.netty/netty-transport-native-epoll "4.1.63.Final"]
         [io.netty/netty-transport-native-epoll "4.1.63.Final"]
Possibly confusing dependencies found:
[nrepl "0.8.3" :exclusions [org.clojure/clojure]]
 overrides
[com.exoscale/sos-worker "0.9.371-SNAPSHOT" :exclusions [com.google.guava/guava com.google.j2objc/j2objc-annotations]] -> [com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor]] -> [nrepl "0.9.0"]
 and
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor]] -> [nrepl "0.9.0"]

Consider using these exclusions:
[com.exoscale/sos-worker "0.9.371-SNAPSHOT" :exclusions [com.google.j2objc/j2objc-annotations com.google.guava/guava nrepl]]
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor nrepl]]

 [io.netty/netty-transport-native-epoll "4.1.63.Final"]
(io.netty.channel.epoll.Epoll/isAvailable)  => true
(io.netty.channel.epoll.Epoll/unavailabilityCause) => nil

Sounds like we got the native libraries from [io.grpc/grpc-netty-shaded "1.36.0"] (they are located on META-INF).

Agreed, I also feel it's a bit off here because this also pulls in the dependency for non-Linux OSes. Also, not all Linux is x86_64. I just did it this way eventually because there was this prior art patch sweat_smile An alternative approach could be to require users to bring their own dependency and then log a warning (or throw an exception?) when they pass :epoll? true but the dependency is missing... @KingMob @arnaudgeiser WDYT?

If it's up to me, I would throw an Exception as soon as the transport is not available and provides additional informations about how to get the necessary dependencies.

Eventually, I think we should switch from :epoll? true to :transport :epoll. That way, we can easily provide additional transports (:kqueue, :io-uring)

That's unlikely anyone expect the system to work when specifying :epoll? true while not available. We ran into this issue and it was annoying.

So for me we should throw an Exception with the context and the cause (io.netty.channel.epoll.Epoll/unavailabilityCause). In a sense, it's a breaking change, but an excepted one.

WDYT?

EDIT: And yes, I would not provide the dependency and marks it as :provided (Aleph needs to compile)

KingMob commented 2 years ago

Re :epoll? true to :transport :epoll; it's better, but let's keep the old option working, and just remove it from the docstrings.

Forcing people to bring their own transport dep and throwing if unavailable might be acceptable, as long as the error message appears fast and has complete instructions on how to fix, but it's still a breaking change. On the other hand...the epoll jar file is 37Kb, so does it really matter? The Kqueue jar is 108kb. Maybe we should be adding MORE transport jars, not removing them. Are there any other reasons to remove it?

DerGuteMoritz commented 2 years ago

Re :epoll? true to :transport :epoll; it's better, but let's keep the old option working, and just remove it from the docstrings.

Generally agree but I suggest to defer this until at least one more transport option is added (e.g. if kqueue is ever included).

Forcing people to bring their own transport dep and throwing if unavailable might be acceptable, as long as the error message appears fast and has complete instructions on how to fix, but it's still a breaking change.

This would happen synchronously when calling start-server right here instead of silently falling back to the NIO transport.

On the other hand...the epoll jar file is 37Kb, so does it really matter? The Kqueue jar is 108kb. Maybe we should be adding MORE transport jars, not removing them. Are there any other reasons to remove it?

Sure, that would also be a viable approach :+1: Of course, it could still happen that somebody tries to enable epoll on a platform which doesn't support it so blowing up in this case (as discussed above) would still make sense.

So if you agree, I'll make it throw when epoll is requested but not available and add all known epoll providers as dependencies (linux-aarch_64 appears to be the only other one).

arnaudgeiser commented 2 years ago

I'm fine with both approaches. The only thing is that you could end up with multiple JARs providing you the same shared objects. On our case, both Aleph/Netty and GRPC provide us those objects and we actually don't know which ones are in use.

Anyway, the absence of those JARs are not the only reason why Epoll might be unavailable. Sometimes, it's just due to dependencies mismatches between Netty artifacts.

Let's include more JARs then, but I'm strongly in favor of throwing an Exception when the transport is not available.

DerGuteMoritz commented 2 years ago

The only thing is that you could end up with multiple JARs providing you the same shared objects.

@arnaudgeiser This is outside of Aleph's scope though, right?

So if you agree, I'll make it throw when epoll is requested but not available and add all known epoll providers as dependencies

Pushed two commits to that efect.

arnaudgeiser commented 2 years ago

@arnaudgeiser This is outside of Aleph's scope though, right?

I can't make up my mind about it. When some people will bump Aleph, they might start loading the objects from the Aleph's dependencies instead of what they used previously (in our case GRPC..). Throwing an Exception in case of conflicting shared objects (when Epoll is not available) is enough for me to be comfortable with the changes.

KingMob commented 2 years ago

@DerGuteMoritz @arnaudgeiser This broke a test on my mac. aleph.udp-test/test-echo sets :epoll? true, which is now an error.

is there any reason you can think of that the test should mandate epoll? If not, I'll remove it.

arnaudgeiser commented 2 years ago

@DerGuteMoritz @arnaudgeiser This broke a test on my mac. aleph.udp-test/test-echo sets :epoll? true, which is now an error.

I don't think so, you can remove it. However, it would be nice to run the tests over other platforms. If we ever want to support kqueue transport, would be great to validate it can be used. (and epoll is not available when kqueue is available => Mac)

DerGuteMoritz commented 2 years ago

is there any reason you can think of that the test should mandate epoll? If not, I'll remove it.

I don't think there is a good reason. Apparently it was added with 8f3e5b00a793d78a36ddb4b24c3098848dfdebf1 but it seems more like an accidentally added hunk during a late night hacking session or something, pretty clearly unrelated to the commit message.

An alternative to removing it could be to set it to (netty/epoll-available?) instead. Then again, a more principled approach to testing epoll support would be more desirable.

KingMob commented 2 years ago

Unless we're testing transports themselves, tests shouldn't depend on any particular transport