cognitect-labs / aws-api

AWS, data driven
Apache License 2.0

S3 GetObject fails with large files #209

Open x64-latacora opened 2 years ago

x64-latacora commented 2 years ago

Dependencies

{:dependencies [[org.clojure/clojure "1.10.2-alpha1"]
                 [com.cognitect.aws/api "0.8.539"]
                 [com.cognitect.aws/endpoints "1.1.11.692"]
                 [com.cognitect.aws/s3 "820.2.1083.0"]
                 [com.cognitect.aws/sns "811.2.834.0"]
                 [com.amazonaws/aws-lambda-java-core "1.2.1"]
                 [org.clojure/tools.cli "1.0.206"]
                 ...]}

Description with failing test case

Similar issue to https://github.com/cognitect-labs/aws-api/issues/97, but the call fails when making a GetObject API call.

For reference, the file being fetched is 3.6GB.

Stack traces

(:Body (awsi/invoke @s3 {:op :GetObject
                         :request {:Bucket bucket-name
                                   :Key    object-key}}))

2022-03-30 14:14:35.028:INFO::nRepl-session-014b242b-3674-43b2-ba09-c6e46b421b54: Logging initialized @52196ms to org.eclipse.jetty.util.log.StdErrLog
2022-03-30 14:15:38.907:INFO:oejc.ResponseNotifier:qtp1253317297-38: Exception while notifying listener org.eclipse.jetty.client.HttpRequest$10@2d6a60a1
java.lang.NegativeArraySizeException: -463848656
    at clojure.lang.Numbers.byte_array(Numbers.java:1394)
    at cognitect.http_client$empty_bbuf.invokeStatic(http_client.clj:37)
    at cognitect.http_client$empty_bbuf.invoke(http_client.clj:34)
    at cognitect.http_client$on_headers.invokeStatic(http_client.clj:132)
    at cognitect.http_client$on_headers.invoke(http_client.clj:111)
    at clojure.lang.Atom.swap(Atom.java:51)
    at clojure.core$swap_BANG_.invokeStatic(core.clj:2355)
    at clojure.core$swap_BANG_.invoke(core.clj:2347)
    at cognitect.http_client.Client$fn$reify__27175.onHeaders(http_client.clj:232)
    at org.eclipse.jetty.client.HttpRequest$10.onHeaders(HttpRequest.java:528)
    at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:100)
    at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:92)
    at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:296)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:310)
    at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1245)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1528)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:204)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:144)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:79)
    at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:131)
    at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:172)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.base/java.lang.Thread.run(Thread.java:829)

fogus commented 2 years ago

Hello! Thank you for the report. We are actively investigating and will update ASAP.

fogus commented 2 years ago

Hello again. Quick question: the HTTP client gets the size from the Content-Length header. Can you let us know what that value is?

x64-latacora commented 2 years ago

@fogus I'm unable to get the value because of the exception. Here's what I get when trying to get a file of 3831118640 bytes:

(awsi/invoke @s3 {:op :GetObject
                  :request {:Bucket "..."
                            :Key     "..."}})

2022-04-05 12:14:20.325:INFO:oejc.ResponseNotifier:qtp1881928864-52: Exception while notifying listener org.eclipse.jetty.client.HttpRequest$10@4df75b0d
java.lang.NegativeArraySizeException: -463087440
    at clojure.lang.Numbers.byte_array(Numbers.java:1394)
    at cognitect.http_client$empty_bbuf.invokeStatic(http_client.clj:37)
    at cognitect.http_client$empty_bbuf.invoke(http_client.clj:34)
    at cognitect.http_client$on_headers.invokeStatic(http_client.clj:132)
    at cognitect.http_client$on_headers.invoke(http_client.clj:111)
    at clojure.lang.Atom.swap(Atom.java:51)
    at clojure.core$swap_BANG_.invokeStatic(core.clj:2355)
    at clojure.core$swap_BANG_.invoke(core.clj:2347)
    at cognitect.http_client.Client$fn$reify__27187.onHeaders(http_client.clj:232)
    at org.eclipse.jetty.client.HttpRequest$10.onHeaders(HttpRequest.java:528)
    at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:100)
    at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:92)
    at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:296)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:310)
    at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1245)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1528)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:204)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:144)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:79)
    at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:131)
    at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:172)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.base/java.lang.Thread.run(Thread.java:829)

=>
{:cognitect.anomalies/category :cognitect.anomalies/fault,
 :cognitect.anomalies/message "Value out of range for int: 2354315264",
 :cognitect.http-client/throwable #error{:cause "Value out of range for int: 2354315264",
                                         :via [{:type java.lang.IllegalArgumentException,
                                                :message "Value out of range for int: 2354315264",
                                                :at [clojure.lang.RT intCast "RT.java" 1248]}],
                                         :trace [[clojure.lang.RT intCast "RT.java" 1248]
                                                 [cognitect.http_client$expand_buffer invokeStatic "http_client.clj" 57]
                                                 [cognitect.http_client$expand_buffer invokePrim "http_client.clj" -1]
                                                 [cognitect.http_client$append_buffer invokeStatic "http_client.clj" 65]
                                                 [cognitect.http_client$append_buffer invoke "http_client.clj" 61]
                                                 [cognitect.http_client$on_content$fn__27143
                                                  invoke
                                                  "http_client.clj"
                                                  139]
                                                 [clojure.core$update invokeStatic "core.clj" 6198]
                                                 [clojure.core$update invoke "core.clj" 6190]
                                                 [cognitect.http_client$on_content invokeStatic "http_client.clj" 139]
                                                 [cognitect.http_client$on_content invoke "http_client.clj" 136]
                                                 [clojure.lang.Atom swap "Atom.java" 51]
                                                 [clojure.core$swap_BANG_ invokeStatic "core.clj" 2355]
                                                 [clojure.core$swap_BANG_ invoke "core.clj" 2347]
                                                 [cognitect.http_client.Client$fn$reify__27189
                                                  onContent
                                                  "http_client.clj"
                                                  236]
                                                 [org.eclipse.jetty.client.HttpRequest$11
                                                  onContent
                                                  "HttpRequest.java"
                                                  542]
                                                 [org.eclipse.jetty.client.api.Response$ContentListener
                                                  onContent
                                                  "Response.java"
                                                  158]
                                                 [org.eclipse.jetty.client.api.Response$AsyncContentListener
                                                  onContent
                                                  "Response.java"
                                                  189]
                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                  notifyContent
                                                  "ResponseNotifier.java"
                                                  155]
                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                  notifyContent
                                                  "ResponseNotifier.java"
                                                  139]
                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                  notifyContent
                                                  "HttpReceiver.java"
                                                  693]
                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                  access$500
                                                  "HttpReceiver.java"
                                                  655]
                                                 [org.eclipse.jetty.client.HttpReceiver
                                                  plainResponseContent
                                                  "HttpReceiver.java"
                                                  369]
                                                 [org.eclipse.jetty.client.HttpReceiver
                                                  responseContent
                                                  "HttpReceiver.java"
                                                  352]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  content
                                                  "HttpReceiverOverHTTP.java"
                                                  323]
                                                 [org.eclipse.jetty.http.HttpParser parseContent "HttpParser.java" 1716]
                                                 [org.eclipse.jetty.http.HttpParser parseNext "HttpParser.java" 1551]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  parse
                                                  "HttpReceiverOverHTTP.java"
                                                  204]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  process
                                                  "HttpReceiverOverHTTP.java"
                                                  144]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  receive
                                                  "HttpReceiverOverHTTP.java"
                                                  79]
                                                 [org.eclipse.jetty.client.http.HttpChannelOverHTTP
                                                  receive
                                                  "HttpChannelOverHTTP.java"
                                                  131]
                                                 [org.eclipse.jetty.client.http.HttpConnectionOverHTTP
                                                  onFillable
                                                  "HttpConnectionOverHTTP.java"
                                                  172]
                                                 [org.eclipse.jetty.io.AbstractConnection$ReadCallback
                                                  succeeded
                                                  "AbstractConnection.java"
                                                  311]
                                                 [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
                                                 [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint
                                                  onFillable
                                                  "SslConnection.java"
                                                  555]
                                                 [org.eclipse.jetty.io.ssl.SslConnection
                                                  onFillable
                                                  "SslConnection.java"
                                                  410]
                                                 [org.eclipse.jetty.io.ssl.SslConnection$2
                                                  succeeded
                                                  "SslConnection.java"
                                                  164]
                                                 [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
                                                 [org.eclipse.jetty.io.ChannelEndPoint$1 run "ChannelEndPoint.java" 104]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  runTask
                                                  "EatWhatYouKill.java"
                                                  338]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  doProduce
                                                  "EatWhatYouKill.java"
                                                  315]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  tryProduce
                                                  "EatWhatYouKill.java"
                                                  173]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  run
                                                  "EatWhatYouKill.java"
                                                  131]
                                                 [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread
                                                  run
                                                  "ReservedThreadExecutor.java"
                                                  409]
                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool
                                                  runJob
                                                  "QueuedThreadPool.java"
                                                  883]
                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner
                                                  run
                                                  "QueuedThreadPool.java"
                                                  1034]
                                                 [java.lang.Thread run "Thread.java" 829]]}}

(meta *1)

=>
{:http-request {:request-method :get,
                :scheme :https,
                :server-port 443,
                :uri "...",
                :headers {...},
                :body nil,
                :server-name "s3.us-east-2.amazonaws.com"},
 :http-response {:cognitect.anomalies/category :cognitect.anomalies/fault,
                 :cognitect.anomalies/message "Value out of range for int: 2354315264",
                 :cognitect.http-client/throwable #error{:cause "Value out of range for int: 2354315264",
                                                         :via [{:type java.lang.IllegalArgumentException,
                                                                :message "Value out of range for int: 2354315264",
                                                                :at [clojure.lang.RT intCast "RT.java" 1248]}],
                                                         :trace [[clojure.lang.RT intCast "RT.java" 1248]
                                                                 [cognitect.http_client$expand_buffer
                                                                  invokeStatic
                                                                  "http_client.clj"
                                                                  57]
                                                                 [cognitect.http_client$expand_buffer
                                                                  invokePrim
                                                                  "http_client.clj"
                                                                  -1]
                                                                 [cognitect.http_client$append_buffer
                                                                  invokeStatic
                                                                  "http_client.clj"
                                                                  65]
                                                                 [cognitect.http_client$append_buffer
                                                                  invoke
                                                                  "http_client.clj"
                                                                  61]
                                                                 [cognitect.http_client$on_content$fn__27143
                                                                  invoke
                                                                  "http_client.clj"
                                                                  139]
                                                                 [clojure.core$update invokeStatic "core.clj" 6198]
                                                                 [clojure.core$update invoke "core.clj" 6190]
                                                                 [cognitect.http_client$on_content
                                                                  invokeStatic
                                                                  "http_client.clj"
                                                                  139]
                                                                 [cognitect.http_client$on_content
                                                                  invoke
                                                                  "http_client.clj"
                                                                  136]
                                                                 [clojure.lang.Atom swap "Atom.java" 51]
                                                                 [clojure.core$swap_BANG_ invokeStatic "core.clj" 2355]
                                                                 [clojure.core$swap_BANG_ invoke "core.clj" 2347]
                                                                 [cognitect.http_client.Client$fn$reify__27189
                                                                  onContent
                                                                  "http_client.clj"
                                                                  236]
                                                                 [org.eclipse.jetty.client.HttpRequest$11
                                                                  onContent
                                                                  "HttpRequest.java"
                                                                  542]
                                                                 [org.eclipse.jetty.client.api.Response$ContentListener
                                                                  onContent
                                                                  "Response.java"
                                                                  158]
                                                                 [org.eclipse.jetty.client.api.Response$AsyncContentListener
                                                                  onContent
                                                                  "Response.java"
                                                                  189]
                                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                                  notifyContent
                                                                  "ResponseNotifier.java"
                                                                  155]
                                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                                  notifyContent
                                                                  "ResponseNotifier.java"
                                                                  139]
                                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                                  notifyContent
                                                                  "HttpReceiver.java"
                                                                  693]
                                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                                  access$500
                                                                  "HttpReceiver.java"
                                                                  655]
                                                                 [org.eclipse.jetty.client.HttpReceiver
                                                                  plainResponseContent
                                                                  "HttpReceiver.java"
                                                                  369]
                                                                 [org.eclipse.jetty.client.HttpReceiver
                                                                  responseContent
                                                                  "HttpReceiver.java"
                                                                  352]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  content
                                                                  "HttpReceiverOverHTTP.java"
                                                                  323]
                                                                 [org.eclipse.jetty.http.HttpParser
                                                                  parseContent
                                                                  "HttpParser.java"
                                                                  1716]
                                                                 [org.eclipse.jetty.http.HttpParser
                                                                  parseNext
                                                                  "HttpParser.java"
                                                                  1551]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  parse
                                                                  "HttpReceiverOverHTTP.java"
                                                                  204]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  process
                                                                  "HttpReceiverOverHTTP.java"
                                                                  144]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  receive
                                                                  "HttpReceiverOverHTTP.java"
                                                                  79]
                                                                 [org.eclipse.jetty.client.http.HttpChannelOverHTTP
                                                                  receive
                                                                  "HttpChannelOverHTTP.java"
                                                                  131]
                                                                 [org.eclipse.jetty.client.http.HttpConnectionOverHTTP
                                                                  onFillable
                                                                  "HttpConnectionOverHTTP.java"
                                                                  172]
                                                                 [org.eclipse.jetty.io.AbstractConnection$ReadCallback
                                                                  succeeded
                                                                  "AbstractConnection.java"
                                                                  311]
                                                                 [org.eclipse.jetty.io.FillInterest
                                                                  fillable
                                                                  "FillInterest.java"
                                                                  105]
                                                                 [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint
                                                                  onFillable
                                                                  "SslConnection.java"
                                                                  555]
                                                                 [org.eclipse.jetty.io.ssl.SslConnection
                                                                  onFillable
                                                                  "SslConnection.java"
                                                                  410]
                                                                 [org.eclipse.jetty.io.ssl.SslConnection$2
                                                                  succeeded
                                                                  "SslConnection.java"
                                                                  164]
                                                                 [org.eclipse.jetty.io.FillInterest
                                                                  fillable
                                                                  "FillInterest.java"
                                                                  105]
                                                                 [org.eclipse.jetty.io.ChannelEndPoint$1
                                                                  run
                                                                  "ChannelEndPoint.java"
                                                                  104]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  runTask
                                                                  "EatWhatYouKill.java"
                                                                  338]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  doProduce
                                                                  "EatWhatYouKill.java"
                                                                  315]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  tryProduce
                                                                  "EatWhatYouKill.java"
                                                                  173]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  run
                                                                  "EatWhatYouKill.java"
                                                                  131]
                                                                 [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread
                                                                  run
                                                                  "ReservedThreadExecutor.java"
                                                                  409]
                                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool
                                                                  runJob
                                                                  "QueuedThreadPool.java"
                                                                  883]
                                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner
                                                                  run
                                                                  "QueuedThreadPool.java"
                                                                  1034]
                                                                 [java.lang.Thread run "Thread.java" 829]]},
                 :body nil}}

If there's another way I can get the content-length please do tell.

chance-latacora commented 2 years ago

Took a look at this and I'm pretty sure the issue is:

You can reproduce this just by calling (clojure.lang.Numbers/byte_array 3831118640)
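
The negative size in the stack trace is what a content length of this magnitude becomes once it is narrowed to a 32-bit int. A minimal sketch using only core Clojure and the value quoted above (`content-length` is just an illustrative name):

```clojure
;; Java arrays are sized with a 32-bit int, so a long content length is
;; narrowed before allocation and wraps around past Integer/MAX_VALUE.
(def content-length 3831118640)

(> content-length Integer/MAX_VALUE)   ;; => true
(unchecked-int content-length)         ;; => -463848656, the size in the exception
```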

chance-latacora commented 2 years ago

Of course, Java indexes arrays with ints, so I'm not sure what the actual fix here would be :)

x64-latacora commented 2 years ago

I'm not sure what the actual fix here would be

Grab large files in chunks?

x64-latacora commented 2 years ago

@fogus thanks to some clojure wizardry by @cr-latacora:

Content-Length: 3831191098

fogus commented 2 years ago

I suspect that there may be a way to address this in user space: use the iteration function with increasing :Range slices, then concatenate all of the slices at the end. If that works, it would be the preferred approach, since it's unlikely that we'll have a client fix in hand in a usable timeframe.
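
For illustration, the slicing arithmetic behind that suggestion could look roughly like this. It is only a sketch of how the :Range values might be generated, not something aws-api provides; `range-headers` and its parameters are hypothetical names.

```clojure
(defn range-headers
  "Returns a lazy seq of HTTP Range header values covering `object-size`
  bytes in slices of at most `chunk-size` bytes. Hypothetical helper."
  [object-size chunk-size]
  (for [start (range 0 object-size chunk-size)]
    ;; Range end positions are inclusive, hence the dec
    (str "bytes=" start "-" (dec (min object-size (+ start chunk-size))))))

(range-headers 10 4)
;; => ("bytes=0-3" "bytes=4-7" "bytes=8-9")
```

Each value would go into the :Range key of a GetObject request. The arithmetic stays in longs, so object sizes above the int limit are not a problem here.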

dchelimsky commented 1 year ago

@x64-latacora were you able to resolve this with iteration?

x64-latacora commented 1 year ago

We did not. We switched to using amazonica to get objects instead of cognitect's s3-api.

lowecg commented 1 year ago

@dchelimsky I have this working with Range slices.

@fogus thanks for the tip on using :Range

The following was tested with an object size of 2243897556 bytes (larger than the maximum int value). With VisualVM attached to the REPL, memory use stayed stable during the download, remaining between 70 and 100 MiB.

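;; note: iteration requires Clojure 1.11 or later
(require '[clojure.java.io :as io]
         '[cognitect.aws.client.api :as aws]) ;; provides aws/client and aws/invoke
(import java.io.SequenceInputStream)

;; assumed: the `s3` used below is an S3 client created along these lines
(def s3 (aws/client {:api :s3}))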

(defn parse-content-range
  "Extract the object size from the ContentRange response attribute and convert to Long type
  e.g. \"bytes 0-5242879/2243897556\"

  Returns 2243897556"
  [content-range]
  (when-let [object-size (re-find #"[0-9]+$" content-range)]
    (Long/parseLong object-size)))

(def chunk-size-bytes (* 1024 1024 5)) ;; chosen arbitrarily

(defn get-object-chunks [{:keys [bucket, key]}]
  (iteration (fn [range-byte-pos]
               (let [to-byte-pos (+ range-byte-pos chunk-size-bytes)
                     range (str "bytes=" range-byte-pos "-" (dec to-byte-pos))
                     op-map {:op :GetObject :request {:Bucket bucket :Key key :Range range}}
                     {:keys [ContentRange] :as response} (aws/invoke s3 op-map)]
                 (println :range range :response response)

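                 ;; fail fast on an anomaly rather than passing an error map downstream
                 (when (:cognitect.anomalies/category response)
                   (throw (ex-info "GetObject range request failed"
                                   {:range range :response response})))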

                 (assoc response :range-byte-pos to-byte-pos
                                 :object-size (parse-content-range ContentRange))))
             :initk 0
             :kf (fn [{:keys [range-byte-pos, object-size]}]
                   (when (< range-byte-pos object-size)
                     range-byte-pos))
             :vf :Body))

(defn seq-enumeration
  "Returns a java.util.Enumeration on a seq"
  {:static true}
  [coll]
  (clojure.lang.SeqEnumeration. coll))

(time
  (let [s3-address {:bucket "your bucket"
                    :key    "your key"}
        target-file "/path/to/some-file"]
    (with-open [target (io/output-stream (io/file target-file))]
      (io/copy (SequenceInputStream. (seq-enumeration (sequence (get-object-chunks s3-address))))
               target))))
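
To see how the concatenation step behaves without touching S3, here is a small sketch that swaps the :Body streams for in-memory ones. `concat-streams`, `chunks` and `joined` are illustrative names, not part of aws-api.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream SequenceInputStream))

(defn concat-streams
  "Returns one InputStream that reads each stream in `streams` in order."
  [streams]
  ;; SequenceInputStream moves on to the next stream only after the
  ;; current one is exhausted.
  (SequenceInputStream. (clojure.lang.SeqEnumeration. (seq streams))))

(def chunks
  ;; stand-ins for the :Body streams of three ranged GetObject responses
  (map #(ByteArrayInputStream. (.getBytes ^String % "UTF-8")) ["abc" "def" "gh"]))

(def joined
  (with-open [in  (concat-streams chunks)
              out (ByteArrayOutputStream.)]
    (io/copy in out)
    (.toString out "UTF-8")))
;; joined => "abcdefgh"
```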

lowecg commented 1 year ago

There is a caveat to the previous example: range requests don't play well with an object that has a content encoding set (gzip, in my case).

Each chunk will be treated as self-contained gzip content, which fails because only the first chunk starts with a gzip header.

I'm not sure if there's a way to get at the Jetty client from aws-api and disable the GZip decoder.
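
The caveat can be reproduced with the JDK's own gzip classes, with no S3 or Jetty involved. This is only a sketch of the underlying problem, not of Jetty's decoder; `gzip-bytes`, `gunzips?` and the slice names are made up for the example.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream)
        '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn gzip-bytes
  "Gzip-compresses byte array `bs` in memory."
  ^bytes [^bytes bs]
  (let [out (ByteArrayOutputStream.)]
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz bs))
    (.toByteArray out)))

(defn gunzips?
  "True if `bs` can be opened and fully read as a gzip stream."
  [^bytes bs]
  (try
    (with-open [in (GZIPInputStream. (ByteArrayInputStream. bs))]
      (while (not= -1 (.read in)))
      true)
    (catch java.io.IOException _ false)))

(def compressed
  (gzip-bytes (.getBytes ^String (apply str (repeat 1000 "large object ")) "UTF-8")))
(def mid (quot (alength ^bytes compressed) 2))
(def first-slice  (java.util.Arrays/copyOfRange ^bytes compressed 0 (int mid)))
(def second-slice (java.util.Arrays/copyOfRange ^bytes compressed (int mid) (alength ^bytes compressed)))

(gunzips? compressed)    ;; => true
(gunzips? first-slice)   ;; => false, the data ends before the gzip trailer
(gunzips? second-slice)  ;; => false, the slice does not start with a gzip header
```

The slices only decode correctly once they are joined back together in order, which is why the decoding has to happen after concatenation rather than per response.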


:Range "bytes=5242880-10485759"}}, :response {:cognitect.anomalies/category :cognitect.anomalies/fault, :cognitect.anomalies/message "java.util.zip.ZipException: Invalid gzip bytes", :cognitect.http-client/throwable #error {
 :cause "Invalid gzip bytes"
 :via
 [{:type java.lang.RuntimeException
   :message "java.util.zip.ZipException: Invalid gzip bytes"
   :at [org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 402]}
  {:type java.util.zip.ZipException
   :message "Invalid gzip bytes"
   :at [org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 272]}]
 :trace
 [[org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 272]
  [org.eclipse.jetty.http.GZIPContentDecoder decode "GZIPContentDecoder.java" 90]
  [org.eclipse.jetty.client.HttpReceiver$Decoder decodeChunk "HttpReceiver.java" 819]
  [org.eclipse.jetty.client.HttpReceiver$Decoder decode "HttpReceiver.java" 788]
  [org.eclipse.jetty.client.HttpReceiver$Decoder decode "HttpReceiver.java" 768]
  [org.eclipse.jetty.client.HttpReceiver$Decoder access$600 "HttpReceiver.java" 744]
  [org.eclipse.jetty.client.HttpReceiver decodeResponseContent "HttpReceiver.java" 386]
  [org.eclipse.jetty.client.HttpReceiver responseContent "HttpReceiver.java" 354]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP content "HttpReceiverOverHTTP.java" 332]
  [org.eclipse.jetty.http.HttpParser parseContent "HttpParser.java" 1716]
  [org.eclipse.jetty.http.HttpParser parseNext "HttpParser.java" 1551]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP parse "HttpReceiverOverHTTP.java" 208]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP process "HttpReceiverOverHTTP.java" 148]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP receive "HttpReceiverOverHTTP.java" 80]
  [org.eclipse.jetty.client.http.HttpChannelOverHTTP receive "HttpChannelOverHTTP.java" 131]
  [org.eclipse.jetty.client.http.HttpConnectionOverHTTP onFillable "HttpConnectionOverHTTP.java" 172]
  [org.eclipse.jetty.io.AbstractConnection$ReadCallback succeeded "AbstractConnection.java" 311]
  [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
  [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint onFillable "SslConnection.java" 555]
  [org.eclipse.jetty.io.ssl.SslConnection onFillable "SslConnection.java" 410]
  [org.eclipse.jetty.io.ssl.SslConnection$2 succeeded "SslConnection.java" 164]
  [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
  [org.eclipse.jetty.io.ChannelEndPoint$1 run "ChannelEndPoint.java" 104]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill runTask "EatWhatYouKill.java" 338]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill doProduce "EatWhatYouKill.java" 315]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill tryProduce "EatWhatYouKill.java" 173]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill run "EatWhatYouKill.java" 131]
  [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread run "ReservedThreadExecutor.java" 409]
  [org.eclipse.jetty.util.thread.QueuedThreadPool runJob "QueuedThreadPool.java" 883]
  [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner run "QueuedThreadPool.java" 1034]
  [java.lang.Thread run "Thread.java" 829]]}}, :line 76}

lowecg commented 1 month ago

With respect to the above situation (reading gzip-compressed content in chunks), I used the following hack to disable the Jetty GZip decoder for my S3 client.

CAUTION: since AWS API uses a shared http-client for all services, this will affect ALL AWS calls. It is best to create a dedicated client/http-client for input-stream-large.

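(import java.lang.reflect.Field
        org.eclipse.jetty.client.HttpClient)
;; assumed aliases (not shown in the original post): client.protocol for
;; cognitect.aws.client.protocol, and log for whichever logging library is in use

(defn private-field [^Object obj ^String field-name]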
  (when obj
    (when-let [^Field f (some
                          (fn [^Class c]
                            (try (.getDeclaredField c field-name)
                                 (catch NoSuchFieldException _ nil)))
                          (take-while some? (iterate (fn [^Class c] (.getSuperclass c)) (.getClass obj))))]
      (. f (setAccessible true))
      (. f (get obj)))))

(defn disable-jetty-content-decoders
  "Prevent Jetty client from attempting to decode GZip compressed content.
   When downloading gzip encoded content as range slices, each chunk will be treated as self-contained gzip content, which doesn't work for obvious reasons.

   CAUTION: since AWS API uses a shared http-client for all services, this will affect ALL AWS calls. It is best to create a dedicated client/http-client for input-stream-large."
  ([]
   (disable-jetty-content-decoders @s3))
  ([client]
   (if-let [^HttpClient jetty-client (-> (client.protocol/-get-info client)
                                         :http-client
                                         (private-field "c")
                                         (private-field "jetty_client"))]
     (do
       (log/debug :in 'disable-decoder-factories :message "Disabling Jetty content decoders for client" :client client :jetty-client jetty-client)
       (.clear (.getContentDecoderFactories jetty-client)))
     (log/error :in 'disable-decoder-factories :message "Failed to disable Jetty client decoders. Subsequent get-object operations may fail"))))