dakrone / clj-http

An idiomatic Clojure HTTP client wrapping the Apache client. Officially supported version.
http://clojars.org/clj-http
MIT License
1.76k stars 406 forks

Default synchronous connection pool blocks forever? #407

Open ejschoen opened 6 years ago

ejschoen commented 6 years ago

We've been using clj-http version 3.5 for a while, and are very happy with it. But today, we ran into an issue we haven't seen before. A specific pattern of requests--I can't figure out yet how to isolate this into a unit test--hangs, in what appears to be a wait for a connection to become available from the PoolingHttpClientConnectionManager.

The thread's stack looks like this (with some elision--I am using default middleware):

        java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f0ce9c70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:377)

       ...

        at clj_http.core$request.invoke(core.clj:412)
        at clj_http.core$request.invoke(core.clj:347)
        at clojure.lang.Var.invoke(Var.java:379)
        at clj_http.client$wrap_request_timing$fn__3085.invoke(client.clj:1030)
        at clj_http.client$wrap_async_pooling$fn__3093.invoke(client.clj:1063)

      ...

        at clj_http.client$request_STAR_.invoke(client.clj:1165)
        at clj_http.client$get.doInvoke(client.clj:1171)

There are no other threads actively requesting network content.

I enabled pool logging, and here's the end of the console log from the pool:

DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 3 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://www.gov.mb.ca:80][total kept alive: 3; route allocated: 0 of 2; total allocated: 3 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 3][route: {}->http://www.gov.mb.ca:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (DefaultHttpClientConnectionOperator.java:139)  Connecting to www.gov.mb.ca/198.163.15.25:80 
DEBUG: (DefaultHttpClientConnectionOperator.java:146)  Connection established 192.168.0.200:48939<->198.163.15.25:80 
DEBUG: (LoggingManagedHttpClientConnection.java:88)  http-outgoing-3: set socket timeout to 5000 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 3][route: {}->http://www.gov.mb.ca:80] can be kept alive for 5.0 seconds 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 3][route: {}->http://www.gov.mb.ca:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 0 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 4][route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (DefaultHttpClientConnectionOperator.java:139)  Connecting to bitly.com/67.199.248.15:443 
DEBUG: (DefaultHttpClientConnectionOperator.java:146)  Connection established 192.168.0.200:57998<->67.199.248.15:443 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 5][route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 2 of 2; total allocated: 6 of 10] 
DEBUG: (DefaultHttpClientConnectionOperator.java:139)  Connecting to bitly.com/67.199.248.15:443 
DEBUG: (DefaultHttpClientConnectionOperator.java:146)  Connection established 192.168.0.200:57999<->67.199.248.15:443 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 6 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 6 of 10] 
DEBUG: (LoggingManagedHttpClientConnection.java:88)  http-outgoing-2: set socket timeout to 5000 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 6 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 2 of 2; total allocated: 6 of 10] 

I was able to work around the problem by wrapping the code related to this scenario in a (with-connection-pool ...) form. I just wonder why this was necessary; I've never seen this before. It isn't obvious that there's more than one outgoing request in flight when the thread blocks indefinitely.
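For reference, the workaround has roughly this shape (a sketch, not our actual code; with-connection-pool and its option map are clj-http's documented API, and the URLs are placeholders):

```clojure
(require '[clj-http.client :as client])

;; Giving this scenario its own pool: with-connection-pool builds a fresh
;; PoolingHttpClientConnectionManager, binds it for the dynamic scope of
;; the body, and shuts it down on exit, so a starved shared pool can no
;; longer block these requests.
(client/with-connection-pool {:timeout 5 :threads 4 :default-per-route 2}
  (client/get "http://example.com/page-1")   ; placeholder URLs
  (client/get "http://example.com/page-2"))
```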

I do wonder about the "Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely" messages. Are these connections not being released?

For what it's worth, there was no difference if I tried clj-http 3.7.0.

Thanks, Eric

dakrone commented 6 years ago

Interesting. How are you invoking the HTTP call? Are you passing in your own connection manager, then?

ejschoen commented 6 years ago

I wasn't using an explicit connection manager, and the workaround works if I simply use a unique connection pool in the context of the call that was hanging. I'm wondering naively if this could be a redirection chain that's just too deep for the default pool.

dakrone commented 6 years ago

I'm wondering naively if this could be a redirection chain that's just too deep for the default pool.

If you disable redirection on the clj-http level does the problem still occur?
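Disabling (or capping) redirect handling per request would look something like this (a sketch; :follow-redirects and :max-redirects are documented clj-http options, and the URL is a placeholder):

```clojure
(require '[clj-http.client :as client])

;; Don't follow 3xx responses at all; if the hang disappears, the
;; redirect chain is implicated.
(client/get "http://bit.ly/some-link" {:follow-redirects false
                                       :throw-exceptions false})

;; Or keep redirects but cap the chain length:
(client/get "http://bit.ly/some-link" {:max-redirects 5
                                       :throw-exceptions false})
```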

ejschoen commented 6 years ago

Good question. I can give that a try. I can't reproduce the behavior in isolation, but I can implement my own redirect following to see if the problem recurs when the system is used in normal production.

dakrone commented 6 years ago

@ejschoen have you had a chance to try this again?

ejschoen commented 6 years ago

Apologies. I haven't had time to try implementing my own redirection here. I did pull the latest 3.8.0-SNAPSHOT this morning, and am testing in a development branch (primarily to confirm that #416 is fixed), without the explicit connection pool workaround I described above.

Do you think any work you have done between 3.7.0 and now would have addressed this issue?

dakrone commented 6 years ago

Do you think any work you have done between 3.7.0 and now would have addressed this issue?

That is possible but I can't say for sure. I did have another question, can you share the code you're invoking clj-http with? (or a reasonable replica). Also, how are you consuming the body of the response?

ejschoen commented 6 years ago

Not sure this'll help, but abstractly, the code is used to scrape content from pages.

(let [request-headers (merge headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)})
      page (assoc
            (client/get url (merge
                             {:socket-timeout 15000
                              :insecure? insecure?
                              :throw-exceptions false
                              :headers request-headers
                              :conn-timeout 15000
                              :cookie-store cookie-store}
                             (when as {:as as})))
            :request-headers request-headers)]
  ...)

The as parameter is normally nil, insecure? is normally false, and cookie-store comes from (clj-http.cookies/cookie-store). So the body comes back as a string, and we feed it into JSoup and snacktory when we get a good :status value.

dakrone commented 6 years ago

The as parameter is normally nil

When you say "normally", what is the other option? Something I was originally wondering about was whether the body could be returned as a stream that was never consumed, tying up connections until the pool was exhausted.

The other weird thing is that a pooling connection manager is being used at all. Looking at the code, if no connection pool is specified, a BasicHttpClientConnectionManager should be used instead:

https://github.com/dakrone/clj-http/blob/6574e0ac0143866c749ca31962e394a903b249ae/src/clj_http/conn_mgr.clj#L183-L192

Is it possible someone has bound *connection-manager* or is providing one to the request?
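For anyone checking their own code for this: a binding that routes every request in its dynamic scope through one shared pooling manager looks roughly like this (a sketch; make-reusable-conn-manager and *connection-manager* are clj-http's, and the URL is a placeholder):

```clojure
(require '[clj-http.client :as client]
         '[clj-http.conn-mgr :as conn-mgr])

;; Any request made inside this binding leases from the shared pool, so a
;; response stream that is never read or closed keeps its connection leased.
(binding [conn-mgr/*connection-manager*
          (conn-mgr/make-reusable-conn-manager {:timeout 10 :threads 4})]
  (client/get "http://example.com"))   ; placeholder URL
```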

RutledgePaulV commented 6 years ago

This sounds kind of similar to the issues in https://github.com/dakrone/clj-http/issues/251. Basically, you need to take special care to clean up connections after the reading of a response is interrupted; otherwise that response continues to be downloaded, consuming one of the available pool connections.

ejschoen commented 6 years ago

Apologies for not responding sooner. Two answers:

1. I looked back at our code from when I originally reported this, and I can confirm that none of our code was binding *connection-manager* at that time.
2. The :as parameter in the clj-http.client/get call is :stream when we are fetching an RSS feed. We pass the stream on to feedparser-clj's parse-feed function. So... if we get a bad feed (e.g., bad XML that can't be parsed) and feedparser-clj fails and throws an exception, does the cleanup that Paul mentioned above happen?

dakrone commented 6 years ago

So... if we get a bad feed (e.g., bad XML that can't parse) and feedparser-clj fails and throws an exception, does the cleanup that Paul mentioned above happen?

It does not. In that case, the stream would still be open, and the connection pool can't release the connection until the stream has been read or closed; see:

https://github.com/dakrone/clj-http/blob/master/src/clj_http/core.clj#L347-L358

Where the connection pool is only closed when the stream is closed.

Does this still occur if you use with-open to close the stream in the event an exception is thrown? Something like:

(let [request-headers (merge headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)})
      page (assoc
            (client/get url (merge
                             {:socket-timeout 15000
                              :insecure? insecure?
                              :throw-exceptions false
                              :headers request-headers
                              :conn-timeout 15000
                              :cookie-store cookie-store}
                             (when as {:as as})))
            :request-headers request-headers)]
  (with-open [stream (:body page)]
    (parse-feed stream)))

ejschoen commented 6 years ago

This all makes sense, but I'm still worried that I haven't found the case that was causing the pool starvation.

Here is some test code (warn, error, info, etc. are from com.taoensso/timbre):

(defn- show-cm
  [cm message]
  (let [stats (.getTotalStats cm)]
    (println "*****" message)
    (println "***** Available connections:" (.getAvailable stats))
    (println "***** Leased connections:" (.getLeased stats))
    (println "***** Max connections:" (.getMax stats))
    (println "***** Pending:" (.getPending stats))))

(defn test-rss-feed
  [feed-uri]
  (http-client/with-connection-pool {:timeout 5 :threads 4 :insecure? true :default-per-route 4}
    (try (let [cs (cookies/cookie-store)
               feed (try
                      (http-client/get feed-uri {:throw-exceptions false
                                                 :cookie-store cs
                                                 :insecure? true
                                                 :as :stream
                                                 :headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)}})
                      (catch Exception e
                        {:status 500 :body (.getMessage e)}))
               feed-stream (when (= (:status feed) 200) (:body feed))]
           (if (nil? feed-stream)
             (do (error "No content for RSS feed returned.")
                 (error (format "Status %d, body %s" (:status feed) (:body feed))))
             (with-open [stream feed-stream]
               (let [cm conn/*connection-manager*
                     _ (show-cm cm "Before feedparser")
                     entries (try (:entries (feedparser/parse-feed stream))
                                  (catch java.lang.IllegalArgumentException e
                                    (error (format "Unable to parse feed: %s" (.getMessage e)))
                                    (error feed-stream)
                                    []))]
                 (info (format "Found %d entries in feed" (count entries)))
                 (doseq [entry entries
                         ;; to repro with this code, define fix-entry-url to be identity
                         :let [url (fix-entry-url entry)]]
                   (try (let [result (http-client/get url {:throw-exceptions false
                                                           :cookie-store cs
                                                           :headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)}})]
                          (if (= 200 (:status result))
                            (info (format "%s: OK" url))
                            (warn (format "%s: Not OK, status %d" url (:status result)))))
                        (catch Exception e
                          (error (format "Unable to fetch entry at %s: %s" url (.getMessage e))))))))))
         (finally
           (show-cm conn/*connection-manager* "In cleanup")))))

I tried this with a feed that our system thinks is bad: http://feeds.feedburner.com/gigalaw. Here's what was printed to the console:

***** Before feedparser
***** Available connections: 0
***** Leased connections: 1
***** Max connections: 4
***** Pending: 0
18-Mar-29 14:16:09 debian-monster ERROR [i2ksource.rss] - Unable to parse feed: Invalid document
18-Mar-29 14:16:09 debian-monster ERROR [i2ksource.rss] - clj_http.core.proxy$java.io.FilterInputStream$ff19274a@34414ffc
18-Mar-29 14:16:09 debian-monster INFO [i2ksource.rss] - Found 0 entries in feed
***** In cleanup
***** Available connections: 1
***** Leased connections: 0
***** Max connections: 4
***** Pending: 0

This is what I'd expect: the with-open scope closes the stream, and what was the leased connection becomes available in the pool again. However, if I change the with-open to a plain let, I get exactly the same output, even though nothing closes the stream when feedparser throws the IllegalArgumentException. I did look into the feedparser source up through com.sun.syndication.io.WireFeedInput, and I can't see any code along that call chain that closes the stream in the case of an exception. (I don't think I can test the actual proxy FilterInputStream that clj-http creates for being open to confirm this, can I?)
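On the last question: there's no public "is open?" method on the proxied stream, but one crude probe (an assumption, not a clj-http facility) relies on many JDK filter streams throwing IOException from .available once closed:

```clojure
;; Heuristic only: JDK streams such as BufferedInputStream throw
;; java.io.IOException ("Stream closed") from .available after .close;
;; whether the Apache entity stream behind clj-http's proxy behaves the
;; same way is an assumption that would need checking.
(defn probably-closed? [^java.io.InputStream s]
  (try
    (.available s)
    false
    (catch java.io.IOException _ true)))
```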

dakrone commented 6 years ago

This all makes sense, but I'm still worried that I haven't found the case that was causing the pool starvation.

Okay, let's try to get some logging from Apache. I've documented how to hook up the Apache client's HTTP logging at https://github.com/dakrone/clj-http/#logging. Can you give that a try with DEBUG logging turned on? That should give us intimate details about connection reuse, closing, etc. for the connection pool and request objects.
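For completeness, one way to get that DEBUG output without touching code (assuming commons-logging's SimpleLog is on the classpath, as it is with HttpClient's commons-logging dependency; the README's logging section shows log4j-based setups too) is via JVM system properties:

```shell
# Apache HttpClient 4.x logging via commons-logging's SimpleLog.
# org.apache.http at DEBUG covers the connection pool lease/release
# messages; adding org.apache.http.wire at DEBUG would also dump the
# full request/response bytes (very verbose).
export JVM_OPTS="-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog \
  -Dorg.apache.commons.logging.simplelog.showdatetime=true \
  -Dorg.apache.commons.logging.simplelog.log.org.apache.http=DEBUG"
lein repl   # or however the app's JVM is launched, with these opts applied
```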