clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License
2.54k stars 241 forks source link

Proxy functionality broken in Aleph 0.7.0 and later #731

Open David-Ongaro opened 2 weeks ago

David-Ongaro commented 2 weeks ago

Since updating from 0.6.4 to 0.8.0 I see strange exceptions for proxied calls.

io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 485454502f312e31203430302042616420526571756573740d0a636f6e74656e742d6c656e6774683a2031310d0a636f6e74656e742d747970653a20746578742f706c61696e0d0a646174653a205468752c203133204a756e20323032342032323a35353a323720474d540d0a7365727665723a20656e766f790d0a636f6e6e656374696f6e3a20636c6f73650d0a0d0a4261642052657175657374
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:500)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at manifold.executor$thread_factory$reify__1227$f__1228.invoke(executor.clj:71)
    at clojure.lang.AFn.run(AFn.java:22)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 485454502f312e31203430302042616420526571756573740d0a636f6e74656e742d6c656e6774683a2031310d0a636f6e74656e742d747970653a20746578742f706c61696e0d0a646174653a205468752c203133204a756e20323032342032323a35353a323720474d540d0a7365727665723a20656e766f790d0a636f6e6e656374696f6e3a20636c6f73650d0a0d0a4261642052657175657374
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1314)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
    ... 19 common frames omitted

Decoding the hex string yields:

$ xxd -r -p <<<485454502f312e31203430302042616420526571756573740d0a636f6e74656e742d6c656e6774683a2031310d0a636f6e74656e742d747970653a20746578742f706c61696e0d0a646174653a205468752c203133204a756e20323032342032323a35353a323720474d540d0a7365727665723a20656e766f790d0a636f6e6e656374696f6e3a20636c6f73650d0a0d0a4261642052657175657374
HTTP/1.1 400 Bad Request
content-length: 11
content-type: text/plain
date: Thu, 13 Jun 2024 22:55:27 GMT
server: envoy
connection: close

Bad Request

So it seems it tries to interpret an http response as https. (I try to reach a https endpoint via an http proxy, but maybe it's making an unproxied http call instead while still expecting https.)

It's difficult for me, though, to reproduce the proxy settings we use in prod locally. Therefore, I will use a local proxy for this bug report here:

$ docker run --rm -p 3128:3128/tcp ghcr.io/tarampampam/3proxy
/bin/3proxy: Starting 3proxy
{"time_unix":1718326196, "proxy":{"type:":"PROXY", "port":3128}, "error":{"code":"00000"}, "auth":{"user":"-"}, "client":{"ip":"0.0.0.0", "port":3128}, "server":{"ip":"0.0.0.0", "port":0}, "bytes":{"sent":0, "received":0}, "request":{"hostname":"[0.0.0.0]"}, "message":"Accepting connections [17/4281620160]"}
{"time_unix":1718326196, "proxy":{"type:":"SOCKS", "port":1080}, "error":{"code":"00000"}, "auth":{"user":"-"}, "client":{"ip":"0.0.0.0", "port":1080}, "server":{"ip":"0.0.0.0", "port":0}, "bytes":{"sent":0, "received":0}, "request":{"hostname":"[0.0.0.0]"}, "message":"Accepting connections [17/4281566912]"}

After this, we can confirm that Aleph 0.6.4 works as expected

(def pool (http/connection-pool {:connection-options {:proxy-options {:host "localhost" :port 3128}}}))

@(http/get "http://example.com" {:pool pool}) =>
{:request-time 52,
 :aleph/keep-alive? true,
 :headers
 {"x-cache" "HIT",
  "server" "ECAcc (sed/58AF)",
  "age" "245573",
  "content-type" "text/html; charset=UTF-8",
  "content-length" "1256",
  "accept-ranges" "bytes",
  "expires" "Fri, 21 Jun 2024 00:16:36 GMT",
  "etag" "\"3147526947+gzip\"",
  "date" "Fri, 14 Jun 2024 00:16:36 GMT",
  "vary" "Accept-Encoding",
  "last-modified" "Thu, 17 Oct 2019 07:18:26 GMT",
  "cache-control" "max-age=604800"},
 :status 200,
 :connection-time 96,
 :body
 #object[java.io.ByteArrayInputStream 0x36c1143f "java.io.ByteArrayInputStream@36c1143f"]}

@(http/get "https://example.com" {:pool pool}) =>
{:request-time 21,
 :aleph/keep-alive? true,
 :headers
 {"x-cache" "HIT",
  "server" "ECAcc (sed/5885)",
  "age" "271652",
  "content-type" "text/html; charset=UTF-8",
  "content-length" "1256",
  "expires" "Fri, 21 Jun 2024 00:16:40 GMT",
  "etag" "\"3147526947+gzip+ident\"",
  "date" "Fri, 14 Jun 2024 00:16:40 GMT",
  "vary" "Accept-Encoding",
  "last-modified" "Thu, 17 Oct 2019 07:18:26 GMT",
  "cache-control" "max-age=604800"},
 :status 200,
 :connection-time 194,
 :body
 #object[java.io.ByteArrayInputStream 0x2f625fd5 "java.io.ByteArrayInputStream@2f625fd5"]}

Trying the same in 0.7.0 or higher yields

@(http/get "https://example.com" {:pool pool}) =>
Execution error (ExceptionInfo) at aleph.http.client/http-connection$setup-client$http-req-fn (client.clj:913).
Channel is inactive/closed.
   {:req
    {:path "/",
     :user-info nil,
     :pool
     #object[io.aleph.dirigiste.Pool 0x124e951f "io.aleph.dirigiste.Pool@124e951f"],
     :request-url "https://example.com",
     :server-port nil,
     :uri "/",
     :server-name "example.com",
     :query-string nil,
     :scheme :https,
     :request-method :get},
    :ch NioSocketChannel [8efb471b]():
    Handlers:
   ,
    :open? false,
    :active? false}
                client.clj:  913  aleph.http.client/http-connection/setup-client/http-req-fn
     client_middleware.clj:  263  aleph.http.client-middleware/wrap-exceptions/fn/fn
                  AFn.java:  154  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  667  clojure.core/apply
                  core.clj: 1990  clojure.core/with-bindings*
                  core.clj: 1990  clojure.core/with-bindings*
               RestFn.java:  142  clojure.lang.RestFn/applyTo
                  core.clj:  671  clojure.core/apply
                  core.clj: 2020  clojure.core/bound-fn*/fn
               RestFn.java:  408  clojure.lang.RestFn/invoke
              deferred.clj:  882  manifold.deferred/eval17318/chain'-
     client_middleware.clj:  263  aleph.http.client-middleware/wrap-exceptions/fn
     client_middleware.clj:  598  aleph.http.client-middleware/wrap-request-timing/fn
     client_middleware.clj:  976  aleph.http.client-middleware/wrap-request/fn
                  http.clj:  403  aleph.http/eval28275/request/fn/fn/fn
              deferred.clj:  900  manifold.deferred/eval17318/chain'-
              deferred.clj:  854  manifold.deferred/eval17318/subscribe/fn
              deferred.clj:  344  manifold.deferred.Listener/onSuccess
              deferred.clj:  523  manifold.deferred.Deferred/fn/fn
                  AFn.java:   22  clojure.lang.AFn/run
             Executor.java:  320  io.aleph.dirigiste.Executor$3/run
             Executor.java:   62  io.aleph.dirigiste.Executor$Worker$1/run
              executor.clj:   70  manifold.executor/thread-factory/reify/f

Or alternatively, after a timeout of a minute or so:

@(http/get "https://example.com" {:pool pool}) =>

Unhandled io.netty.handler.proxy.ProxyConnectException
   http, none, localhost/127.0.0.1:3128 => null, timeout
         ProxyHandler.java:  201  io.netty.handler.proxy.ProxyHandler$2/run
          PromiseTask.java:   98  io.netty.util.concurrent.PromiseTask/runTask
  ScheduledFutureTask.java:  153  io.netty.util.concurrent.ScheduledFutureTask/run
AbstractEventExecutor.java:  173  io.netty.util.concurrent.AbstractEventExecutor/runTask
AbstractEventExecutor.java:  166  io.netty.util.concurrent.AbstractEventExecutor/safeExecute
SingleThreadEventExecutor.java:  469  io.netty.util.concurrent.SingleThreadEventExecutor/runAllTasks
         NioEventLoop.java:  569  io.netty.channel.nio.NioEventLoop/run
SingleThreadEventExecutor.java:  994  io.netty.util.concurrent.SingleThreadEventExecutor$4/run
    ThreadExecutorMap.java:   74  io.netty.util.internal.ThreadExecutorMap$2/run
              executor.clj:   70  manifold.executor/thread-factory/reify/f
                  AFn.java:   22  clojure.lang.AFn/run
FastThreadLocalRunnable.java:   30  io.netty.util.concurrent.FastThreadLocalRunnable/run
               Thread.java: 1583  java.lang.Thread/run

Trying http yields a strange 503:

@(http/get "http://example.com" {:pool pool})
Execution error (ExceptionInfo) at aleph.http.client-middleware/wrap-exceptions$fn$fn$fn (client_middleware.clj:275).
status: 503
   {:aleph/keep-alive? false,
    :aleph/destroy-conn? #<SuccessDeferred@5a793212: false>,
    :headers
    {"content-type" "text/html; charset=UTF-8",
     "content-length" "1284",
     "connection" "close",
     "pragma" "no-cache",
     "expires" "Thu, 01 Jan 1970 00:00:00 GMT",
     "p3p" "CP=\"CAO PSA OUR\"",
     "cache-control"
     "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"},
    :status 503,
    :body
    #object[java.io.ByteArrayInputStream 0x77490577 "java.io.ByteArrayInputStream@77490577"]}
     client_middleware.clj:  275  aleph.http.client-middleware/wrap-exceptions/fn/fn/fn
              deferred.clj:  882  manifold.deferred/eval17318/chain'-
              deferred.clj:  848  manifold.deferred/eval17318/subscribe/fn
              deferred.clj:  344  manifold.deferred.Listener/onSuccess
              deferred.clj:  523  manifold.deferred.Deferred/fn
              deferred.clj:  523  manifold.deferred.Deferred/success
              deferred.clj:  368  manifold.deferred/success!
              deferred.clj:  365  manifold.deferred/success!
     client_middleware.clj:  272  aleph.http.client-middleware/wrap-exceptions/fn/fn/f--auto--
                  AFn.java:   22  clojure.lang.AFn/run
             Executor.java:  320  io.aleph.dirigiste.Executor$3/run
             Executor.java:   62  io.aleph.dirigiste.Executor$Worker$1/run
              executor.clj:   70  manifold.executor/thread-factory/reify/f

If I try https afterwards it seems it still tries to intepret the result as http:

@(http/get "https://example.com" {:pool pool}) =>
Execution error (IllegalArgumentException) at io.netty.handler.codec.http.HttpVersion/<init> (HttpVersion.java:123).
invalid version format: Ú¶•&¿IÐQ¡_QIQ©ÎÜÕCÛDÁÎ6Ó§L³K‹·‘¹}§¾¿¶.¯}¦KSSǶ'}ÇÊÂ͈0I\…R“ÊJ:џڭ‚/CY̱Þ
          HttpVersion.java:  123  io.netty.handler.codec.http.HttpVersion/<init>
          HttpVersion.java:   85  io.netty.handler.codec.http.HttpVersion/valueOf
  HttpResponseDecoder.java:  194  io.netty.handler.codec.http.HttpResponseDecoder/createMessage
    HttpObjectDecoder.java:  354  io.netty.handler.codec.http.HttpObjectDecoder/decode
      HttpClientCodec.java:  320  io.netty.handler.codec.http.HttpClientCodec$Decoder/decode
 ByteToMessageDecoder.java:  530  io.netty.handler.codec.ByteToMessageDecoder/decodeRemovalReentryProtection
 ByteToMessageDecoder.java:  469  io.netty.handler.codec.ByteToMessageDecoder/callDecode
 ByteToMessageDecoder.java:  290  io.netty.handler.codec.ByteToMessageDecoder/channelRead
CombinedChannelDuplexHandler.java:  251  io.netty.channel.CombinedChannelDuplexHandler/channelRead
     HttpProxyHandler.java:  284  io.netty.handler.proxy.HttpProxyHandler$HttpClientCodecWrapper/channelRead
AbstractChannelHandlerContext.java:  444  io.netty.channel.AbstractChannelHandlerContext/invokeChannelRead
AbstractChannelHandlerContext.java:  420  io.netty.channel.AbstractChannelHandlerContext/invokeChannelRead
AbstractChannelHandlerContext.java:  412  io.netty.channel.AbstractChannelHandlerContext/fireChannelRead
       LoggingHandler.java:  280  io.netty.handler.logging.LoggingHandler/channelRead
AbstractChannelHandlerContext.java:  442  io.netty.channel.AbstractChannelHandlerContext/invokeChannelRead
AbstractChannelHandlerContext.java:  420  io.netty.channel.AbstractChannelHandlerContext/invokeChannelRead
AbstractChannelHandlerContext.java:  412  io.netty.channel.AbstractChannelHandlerContext/fireChannelRead
DefaultChannelPipeline.java: 1407  io.netty.channel.DefaultChannelPipeline$HeadContext/channelRead
AbstractChannelHandlerContext.java:  440  io.netty.channel.AbstractChannelHandlerContext/invokeChannelRead
AbstractChannelHandlerContext.java:  420  io.netty.channel.AbstractChannelHandlerContext/invokeChannelRead
DefaultChannelPipeline.java:  918  io.netty.channel.DefaultChannelPipeline/fireChannelRead
AbstractNioByteChannel.java:  166  io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe/read
         NioEventLoop.java:  788  io.netty.channel.nio.NioEventLoop/processSelectedKey
         NioEventLoop.java:  724  io.netty.channel.nio.NioEventLoop/processSelectedKeysOptimized
         NioEventLoop.java:  650  io.netty.channel.nio.NioEventLoop/processSelectedKeys
         NioEventLoop.java:  562  io.netty.channel.nio.NioEventLoop/run
SingleThreadEventExecutor.java:  994  io.netty.util.concurrent.SingleThreadEventExecutor$4/run
    ThreadExecutorMap.java:   74  io.netty.util.internal.ThreadExecutorMap$2/run
              executor.clj:   70  manifold.executor/thread-factory/reify/f
                  AFn.java:   22  clojure.lang.AFn/run
FastThreadLocalRunnable.java:   30  io.netty.util.concurrent.FastThreadLocalRunnable/run
               Thread.java: 1583  java.lang.Thread/run

although, I sometimes can get this error right away, without trying http first, so it doesn't seem to be a matter of misusing a pool for different protocols (which we don't do in prod anyway).

So even though I can not reproduce the exact error I see in prod locally, I think we can conclude that there is a regression in 0.7.0 and later compared to 0.6.4.

KingMob commented 2 weeks ago

Hmmm. Do you see an IllegalArgumentException anywhere in the logs that says "Proxying HTTP/2 messages not supported yet"?

I wonder if it's negotiating HTTP/2 and burying the exception somehow. I didn't think I changed anything with the HTTP/1 proxying; if so, that's an error. But I never used proxying, and never did much with it. It hasn't been implemented for HTTP/2 yet.

David-Ongaro commented 2 weeks ago

Hmmm. Do you see an IllegalArgumentException anywhere in the logs that says "Proxying HTTP/2 messages not supported yet"?

No, I don't see such a message. I also don't think it's using HTTP/2, but to be sure: is there a pool config to force HTTP/1.1? Then I can retry with that.

But I never used proxying, and never did much with it.

Still, you can try to reproduce the errors with a local proxy, like I showed above.

KingMob commented 2 weeks ago

HTTP2 is explicitly opt-in, so unless you're setting it as a desired http version, it's http1 by default.

But I never used proxying, and never did much with it.

Still, you can try to reproduce the errors with a local proxy, like I showed above.

Oh, that sentence is more a note for Moritz, who's taken over as maintainer, that I changed virtually nothing with the proxy code itself, so the error probably lies elsewhere, like in the refactoring to support HTTP2, or the code path into the proxy code.