babashka / http-client

HTTP client for Clojure and Babashka built on java.net.http
MIT License
121 stars 13 forks source link

Fails to perform large number of concurrent asynchronous requests using Babashka v1.3.181 REPL. #37

Closed bren-do closed 1 year ago

bren-do commented 1 year ago

I've been playing with the http-client library to see if I can leverage babashka for scripting. I wanted to see if http-client could handle the same number of concurrent asynchronous requests as the org.httpkit.client library used by babashka, because it seems like http-client is the most likely to be supported long-term.

In my testing I've noticed that http-client cannot handle as many concurrent asynchronous requests as httpkit. In this test, I do 20000 concurrent http requests to httpstat.us where the server will sleep for 2000 msecs, and return status 200.

In httpkit's case, since the requests' IO is non-blocking, we can perform more concurrent requests than available threads on my machine (around 4096).

#!/usr/bin/env bb

(require '[org.httpkit.client :as http])

(defn api-call-async!
  ([x]
   (api-call-async! 2000 x))
  ([sleep x]
   [x (http/get (str "https://httpstat.us/200?sleep=" sleep))]))

(time (->> (range 20000)
           (mapv api-call-async!)
           (mapv (fn [[x p]]
                   (deref p)
                   x))))

This code ran in about 230 seconds when using the http kit library.

Functionally similar code when run with http-client never returns a value or throws an exception, and I need to restart my machine (killing the repl doesn't seem to work, even with kill -9). I feel like this code (below) should be able to run similarly to the example (above) that leverages httpkit, considering that CompletableFuture is meant to be non-blocking.

#!/usr/bin/env bb

(require '[babashka.http-client :as http])

(defn api-call-async!
  ([x]
   (api-call-async! 2000 x))
  ([sleep x]
   [x (http/get (str "https://httpstat.us/200?sleep=" sleep) {:async true})]))

(time (->> (range 20000)
           (mapv api-call-async!)
           (mapv (fn [[x p]]
                   (deref p)
                   x))))

If this use case isn't a priority for http-client, can we expect long term support for httpkit?

borkdude commented 1 year ago

Can you confirm that the behavior is the same or differs in a JVM with bb.http-client?

You can configure an :executor for your client, perhaps that this helps:

(require '[babashka.http-client :as http])

(def client (http/client (assoc http/default-client-opts :executor (java.util.concurrent.Executors/newVirtualThreadPerTaskExecutor))))

(defn api-call-async!
  ([x]
   (api-call-async! 2000 x))
  ([sleep x]
   [x (http/get (str "https://httpstat.us/200?sleep=" sleep) {:async true :client client})]))

(time (->> (range 20000)
           (mapv api-call-async!)
           (mapv (fn [[x p]]
                   (deref p)
                   x))))

I wouldn't know why bb.http-client would be slower since it just relies on the java.net.http classes but it could be that there is a bug somewhere.

bren-do commented 1 year ago

Can you confirm that the behavior is the same or differs in a JVM with bb.http-client?

Do you mean running the library on a JVM repl as opposed to a BB one?

Your above code ran on bb but there was no difference in outcome. I wasn't able to require the library in a JVM repl, compiling a namespace requiring it returned a CompilerException Caused by: java.lang.Exception: namespace 'babashka.http-client.internal.multipart' not found

borkdude commented 1 year ago

"Do you mean running the library on a JVM repl as opposed to a BB one?" yes

I wasn't able to require the library in a JVM repl,

Huh? Can you give a repro of that?

bren-do commented 1 year ago

Huh? Can you give a repro of that?

Oops! I was using a pre-1.11.0 clojure version and overzealously recompiling. Testing now.

bren-do commented 1 year ago

Behaviour is the same on the JVM with or without the newVirtualThreadPerTaskExecutor (Also I needed to pass in --enable-preview in the jvm-opts to include that executor). I once again needed to restart my machine to get my java process to die.

borkdude commented 1 year ago

As a last resort, you could try to do this using raw JVM interop and the java.net.http classes to see if there's a bug in bb.http-client. Or try hato for example, which is very similar.

bren-do commented 1 year ago

Hato seems to be running each request in a separate thread. Going up by powers of 2, hato threw a java.lang.OutOfMemoryError : unable to create new native Thread error at 2048 concurrent requests.

I also put together a minimal implementation using mostly interop with java.net.http (and cribbing from your internal.clj code). This was running on a JVM repl.

(def default-client (delay {:client (-> (HttpClient/newBuilder)
                                        (.executor (java.util.concurrent.Executors/newVirtualThreadPerTaskExecutor))
                                        .build)}))

(defn then [x f]
  (.thenApply ^java.util.concurrent.CompletableFuture x
              ^java.util.function.Function
              (reify java.util.function.Function
                (apply [_ args]
                  (f args)))))

(defn request
  [{:keys [client uri] :as req}]
  (let [client (or client @default-client)
        ^HttpClient client (or (:client client) client)
        ^HttpRequest req' (-> (HttpRequest/newBuilder)
                              (.uri (URI/create uri))
                              (.build))
        resp (-> client
                 (.sendAsync req' (HttpResponse$BodyHandlers/ofString))
                 (then #(.body ^HttpResponse %)))]

    resp))

(defn get
  [url & args]
  (request {:uri url}))

   (defn api-call-async!
    ([x]
     (api-call-async! 2000 x))
    ([sleep x]
     [x (get (str "https://httpstat.us/200?sleep=" sleep) {:async true})]))

  (time (->> (range 4096)
             (mapv api-call-async!)
             (mapv (fn [[x p]]
                     (deref p)
                     x))))

I tried with both the stock and newVirtualThreadPerTaskExecutor executors. Incrementing by powers of 2, it seems to top out at 4096, roughly the same number of system threads on my machine (macos). 3000 and 3500 seemed to work fine (and quite quickly as well, ~6 seconds for each). I also tried moving the then function to the second mapv to see if that made a difference (it did not). This must be some sort of issue with the underlying Java library. Perhaps, even when providing the virtualThreads executor, it is still attaching a thread to each request? Those Futures should each have Virtual Threads right? Very strange that I don't see any Exceptions being thrown, perhaps there's some sort of deadlock occuring. This is happening both on openjdk 11 and 20.

borkdude commented 1 year ago

If you have the same problem with the java.net.http library I'd try pasting that example on a Java forum or so, to check if this is expected or a JDK bug. Unfortunately I'm not in a position to fix JDK bugs.

borkdude commented 1 year ago

Closing since this library is built on java.net.http and issues with those should be filed with JVM vendors.