clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License
2.54k stars 241 forks source link

Using InputStream as :ssl-context :trust-store is buggy #728

Open DerGuteMoritz opened 3 weeks ago

DerGuteMoritz commented 3 weeks ago

Quoting the original report by @David-Ongaro from the addendum of https://github.com/clj-commons/aleph/issues/727:

As per the docs, instead of java.io.File instances, java.io.InputStream instances are also supported as keys for the :ssl-context map. But I can't figure out how this is supposed to be used, as I regularly get java.lang.IllegalArgumentException: Input stream does not contain valid certificates. exceptions. I.e., preparing a single request just works fine, but preparing them in quick succession may fail:

(def ssl-context {:trust-store (io/input-stream client-ca)})

(def pool (http/connection-pool {:connection-options {:ssl-context ssl-context}}))

(http/get "https://example.com" {:pool pool}) => #<Deferred@6fb6283a: :not-delivered>

[(http/get "https://example.com" {:pool pool}) (http/get "https://example.com" {:pool pool})] =>
[#<Deferred@58ac8429: Error printing return value (CertificateException) at io.netty.handler.ssl.PemReader/readCertificates (PemReader.java:114).
Error printing return value (CertificateException) at io.netty.handler.ssl.PemReader/readCertificates (PemReader.java:114).
found no certificates in input stream
            PemReader.java:  114  io.netty.handler.ssl.PemReader/readCertificates
           SslContext.java: 1263  io.netty.handler.ssl.SslContext/toX509Certificates
    SslContextBuilder.java:  276  io.netty.handler.ssl.SslContextBuilder/trustManager
                 netty.clj:  917  aleph.netty/eval23500/add-ssl-trust-manager!
                 netty.clj: 1026  aleph.netty/eval23500/ssl-client-context
                 netty.clj: 1193  aleph.netty/coerce-ssl-context
                 netty.clj: 1179  aleph.netty/coerce-ssl-context
                  core.clj: 2641  clojure.core/partial/fn
                client.clj:  699  aleph.http.client/client-ssl-context
                client.clj:  690  aleph.http.client/client-ssl-context
                client.clj:  799  aleph.http.client/http-connection
                client.clj:  752  aleph.http.client/http-connection
                  http.clj:  104  aleph.http/create-connection
                  http.clj:   97  aleph.http/create-connection
                  http.clj:  239  aleph.http/connection-pool/fn
                  flow.clj:   47  aleph.flow/instrumented-pool/reify
                 Pool.java:  273  io.aleph.dirigiste.Pool/addObject
                 Pool.java:  466  io.aleph.dirigiste.Pool/acquire
                  flow.clj:   74  aleph.flow/acquire/fn
                  flow.clj:   73  aleph.flow/acquire
                  flow.clj:   68  aleph.flow/acquire
                  http.clj:  377  aleph.http/eval30155/request/fn/fn
                  http.clj:  371  aleph.http/eval30155/request/fn
                  http.clj:  370  aleph.http/eval30155/request
                  http.clj:  481  aleph.http/req
                  http.clj:  477  aleph.http/req
                  core.clj: 2642  clojure.core/partial/fn

I didn't look into the implementation, but I suspect what's happening here is that when the first thread of the thread pool is initialized, it's exhausting the input-stream instance and this instance is reused during the initialization of a second thread. (At least that's what I hope is happening, since the alternative would be that each thread tries to reread the certificate on each request.)

So the question is, if this doesn't work, why is it even supported? But if this is indeed an issue, it probably should be handled in a separate ticket, since this behavior already applies to Aleph 0.6.4 and therefore can't be considered a regression.

KingMob commented 3 weeks ago

I don't recall ever using an InputStream, but as you can see here, Aleph doesn't do much with it by default.

It's interesting that they fail "in quick succession". Maybe there's some delayed init that's not triggered until needed, and then if two conns both try to setup the sslcontext/trust store, they end up racing on the InputStream, and one or both fail.

It looks like that happens on the client-side. It makes a new client context for each conn, which is necessary in case the context is actually just a map of options. And if you make a call slowly, the same conn will get reused, so it's not an issue there. But too fast, and it'll spawn multiple conns, corresponding multiple sslcontexts, and try to read from an exhausted stream or in the middle of an earlier conn.

Solution is to either (1) force the stream into another format ASAP, or (2) disallow streams.

DerGuteMoritz commented 2 weeks ago

@KingMob My reading of the code agrees with your analysis :+1: I'll come up with a test case to reproduce it.

It looks like that happens on the client-side. It makes a new client context for each conn, which is necessary in case the context is actually just a map of options

I think this might actually not be necessary: It should be possible to lift the construction of the context to the level of the pool instead which would solve this bug as well as reduce allocation. Will give this a try!

KingMob commented 2 weeks ago

Yeah, even when SslContext construction is idempotent, why do it multiple times?

bitti commented 2 weeks ago

Yeah, even when SslContext construction is idempotent, why do it multiple times?

If these instances are not thread-safe, that could be a reason. But since they are immutable, I suppose they also should be thread safe. Furthermore, I think the netty SslContext instances are based on the JDK SslContext implementations and if these weren't thread-safe it would be a widespread common problem (even though I find it hard to find explicit documentation about this).

KingMob commented 2 weeks ago

@bitti Good point. Though I've never really considered, should thread safety be implicit in the definition of "idempotent"? I assumed so, but we programmers are much looser about the definition than mathematicians.

bitti commented 2 weeks ago

@bitti Good point. Though I've never really considered, should thread safety be implicit in the definition of "idempotent"? I assumed so, but we programmers are much looser about the definition than mathematicians.

I think even in the mathematical sense, you can't 'define' it like that, since thread-unsafety implies undefined behavior. So no, neither idempotency nor immutability implies thread-safety.

But I think in this case we can safely assume the JDK/netty implementations are thread-safe since otherwise it would, it make it more difficult to share connections (at least that's what I gather from the SO discussions). I gather the reason why the JDK docs don't explicitly state this is because they can't make a guarantee for the millions of potential SslContext implementations out there.