Closed DerGuteMoritz closed 2 years ago
We ran exactly into this issue few months/years ago.
I just double-checked and it seems we are not using :classifier "linux-x86_64"
while Epoll/isAvailable
returns true.
Do you have some hints about when using the classifier
is necessary?
Otherwise, this change is fine for me. I included many times the :classifier "linux-x86_64"
on personal projects exactly because of this.
A small aside: I'm not sure it should be Aleph's responsibility to include this dependency... but what's done is done.
I'll defer to you two on this. I have no experience messing with epoll.
However, I wasn't a big fan of the code to append the netty version number, and I like this addition to handle classifiers even less. It's fancy for so little benefit.
Can you switch to boring lein deps? (The netty version var can be kept)
We ran exactly into this issue few months/years ago. I just double-checked and it seems we are not using
:classifier "linux-x86_64"
whileEpoll/isAvailable
returns true.
@arnaudgeiser Interesting :thinking: For me it returned false
in this situation. Just tried it out again by starting a REPL in the aleph repo without this patch and sure enough:
user> (io.netty.channel.epoll.Epoll/isAvailable)
false
While with the patch:
user> (io.netty.channel.epoll.Epoll/isAvailable)
true
Could you inspect your dependency tree whether something somewhere perhaps pulls in this dependency for you?
Do you have some hints about when using the
classifier
is necessary?
AFAIUI this is always necessary. I got that info from https://netty.io/wiki/native-transports.html#using-the-native-transports
A small aside: I'm not sure it should be Aleph's responsibility to include this dependency... but what's done is done.
Agreed, I also feel it's a bit off here because this also pulls in the dependency for non-Linux OSes. Also, not all Linux is x86_64
. I just did it this way eventually because there was this prior art patch :sweat_smile: An alternative approach could be to require users to bring their own dependency and then log a warning (or throw an exception?) when they pass :epoll? true
but the dependency is missing... @KingMob @arnaudgeiser WDYT?
However, I wasn't a big fan of the code to append the netty version number, and I like this addition to handle classifiers even less. It's fancy for so little benefit.
Can you switch to boring lein deps? (The netty version var can be kept)
@KingMob Gladly!
On the project I'm currently working on:
❯ lein sub deps :tree | grep epoll
Tried to load org.slf4j/slf4j-api version 1.7.5 but 1.7.25 was already loaded.
Tried to load org.apache.httpcomponents/httpcore version 4.4.9 but 4.4.13 was already loaded.
Tried to load org.apache.httpcomponents/httpclient version 4.5.5 but 4.5.13 was already loaded.
Tried to load commons-codec version 1.10 but 1.15 was already loaded.
[io.netty/netty-transport-native-epoll "4.1.63.Final"]
Possibly confusing dependencies found:
[nrepl "0.8.3" :exclusions [org.clojure/clojure]]
overrides
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor]] -> [nrepl "0.9.0"]
Consider using these exclusions:
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor nrepl]]
[io.netty/netty-transport-native-epoll "4.1.63.Final"]
[io.netty/netty-transport-native-epoll "4.1.63.Final"]
[io.netty/netty-transport-native-epoll "4.1.63.Final"]
Possibly confusing dependencies found:
[nrepl "0.8.3" :exclusions [org.clojure/clojure]]
overrides
[com.exoscale/sos-worker "0.9.371-SNAPSHOT" :exclusions [com.google.guava/guava com.google.j2objc/j2objc-annotations]] -> [com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor]] -> [nrepl "0.9.0"]
and
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor]] -> [nrepl "0.9.0"]
Consider using these exclusions:
[com.exoscale/sos-worker "0.9.371-SNAPSHOT" :exclusions [com.google.j2objc/j2objc-annotations com.google.guava/guava nrepl]]
[com.exoscale/mania "1.0.1" :exclusions [com.fasterxml.jackson.dataformat/jackson-dataformat-smile com.fasterxml.jackson.dataformat/jackson-dataformat-cbor nrepl]]
[io.netty/netty-transport-native-epoll "4.1.63.Final"]
(io.netty.channel.epoll.Epoll/isAvailable) => true
(io.netty.channel.epoll.Epoll/unavailabilityCause) => nil
Sounds like we got the native libraries from [io.grpc/grpc-netty-shaded "1.36.0"]
(they are located on META-INF).
Agreed, I also feel it's a bit off here because this also pulls in the dependency for non-Linux OSes. Also, not all Linux is x86_64. I just did it this way eventually because there was this prior art patch sweat_smile An alternative approach could be to require users to bring their own dependency and then log a warning (or throw an exception?) when they pass :epoll? true but the dependency is missing... @KingMob @arnaudgeiser WDYT?
If it's up to me, I would throw an Exception as soon as the transport is not available and provides additional informations about how to get the necessary dependencies.
Eventually, I think we should switch from :epoll? true
to :transport :epoll
.
That way, we can easily provide additional transports (:kqueue, :io-uring)
That's unlikely anyone expect the system to work when specifying :epoll? true
while not available.
We ran into this issue and it was annoying.
So for me we should throw an Exception with the context and the cause (io.netty.channel.epoll.Epoll/unavailabilityCause)
.
In a sense, it's a breaking change, but an excepted one.
WDYT?
EDIT: And yes, I would not provide the dependency and marks it as :provided
(Aleph needs to compile)
Re :epoll? true
to :transport :epoll
; it's better, but let's keep the old option working, and just remove it from the docstrings.
Forcing people to bring their own transport dep and throwing if unavailable might be acceptable, as long as the error message appears fast and has complete instructions on how to fix, but it's still a breaking change. On the other hand...the epoll jar file is 37Kb, so does it really matter? The Kqueue jar is 108kb. Maybe we should be adding MORE transport jars, not removing them. Are there any other reasons to remove it?
Re
:epoll? true
to:transport :epoll
; it's better, but let's keep the old option working, and just remove it from the docstrings.
Generally agree but I suggest to defer this until at least one more transport option is added (e.g. if kqueue is ever included).
Forcing people to bring their own transport dep and throwing if unavailable might be acceptable, as long as the error message appears fast and has complete instructions on how to fix, but it's still a breaking change.
This would happen synchronously when calling start-server
right here instead of silently falling back to the NIO transport.
On the other hand...the epoll jar file is 37Kb, so does it really matter? The Kqueue jar is 108kb. Maybe we should be adding MORE transport jars, not removing them. Are there any other reasons to remove it?
Sure, that would also be a viable approach :+1: Of course, it could still happen that somebody tries to enable epoll on a platform which doesn't support it so blowing up in this case (as discussed above) would still make sense.
So if you agree, I'll make it throw when epoll is requested but not available and add all known epoll providers as dependencies (linux-aarch_64
appears to be the only other one).
I'm fine with both approaches.
The only thing is that you could end up with multiple JARs providing you the same shared objects
.
On our case, both Aleph/Netty
and GRPC
provide us those objects and we actually don't know which ones are in use.
Anyway, the absence of those JARs are not the only reason why Epoll
might be unavailable. Sometimes, it's just due to dependencies mismatches between Netty artifacts.
Let's include more JARs then, but I'm strongly in favor of throwing an Exception when the transport is not available.
The only thing is that you could end up with multiple JARs providing you the same shared objects.
@arnaudgeiser This is outside of Aleph's scope though, right?
So if you agree, I'll make it throw when epoll is requested but not available and add all known epoll providers as dependencies
Pushed two commits to that efect.
@arnaudgeiser This is outside of Aleph's scope though, right?
I can't make up my mind about it. When some people will bump Aleph, they might start loading the objects from the Aleph's dependencies instead of what they used previously (in our case GRPC..). Throwing an Exception in case of conflicting shared objects (when Epoll is not available) is enough for me to be comfortable with the changes.
@DerGuteMoritz @arnaudgeiser This broke a test on my mac. aleph.udp-test/test-echo
sets :epoll? true
, which is now an error.
is there any reason you can think of that the test should mandate epoll? If not, I'll remove it.
@DerGuteMoritz @arnaudgeiser This broke a test on my mac. aleph.udp-test/test-echo sets :epoll? true, which is now an error.
I don't think so, you can remove it.
However, it would be nice to run the tests over other platforms.
If we ever want to support kqueue
transport, would be great to validate
it can be used. (and epoll
is not available when kqueue
is available => Mac)
is there any reason you can think of that the test should mandate epoll? If not, I'll remove it.
I don't think there is a good reason. Apparently it was added with 8f3e5b00a793d78a36ddb4b24c3098848dfdebf1 but it seems more like an accidentally added hunk during a late night hacking session or something, pretty clearly unrelated to the commit message.
An alternative to removing it could be to set it to (netty/epoll-available?)
instead. Then again, a more principled approach to testing epoll support would be more desirable.
Unless we're testing transports themselves, tests shouldn't depend on any particular transport
Fixes #475.
This is a cherry-pick of 5f5e73b256027d3d887582118af86b580eb08b0c from the
1.0.0
branch minus thekqueue
-related changes.