Open x64-latacora opened 2 years ago
Hello! Thank you for the report. We are actively investigating and will update ASAP.
Hello again. Quick question. The http client gets the size from the content-length
header. Can you let us know what that value is?
@fogus I'm unable to get the value because of the exception. Here's what I get when trying to get a file of 3831118640
bytes:
(awsi/invoke @s3 {:op :GetObject
:request {:Bucket "..."
:Key "..."}})
2022-04-05 12:14:20.325:INFO:oejc.ResponseNotifier:qtp1881928864-52: Exception while notifying listener org.eclipse.jetty.client.HttpRequest$10@4df75b0d
java.lang.NegativeArraySizeException: -463087440
at clojure.lang.Numbers.byte_array(Numbers.java:1394)
at cognitect.http_client$empty_bbuf.invokeStatic(http_client.clj:37)
at cognitect.http_client$empty_bbuf.invoke(http_client.clj:34)
at cognitect.http_client$on_headers.invokeStatic(http_client.clj:132)
at cognitect.http_client$on_headers.invoke(http_client.clj:111)
at clojure.lang.Atom.swap(Atom.java:51)
at clojure.core$swap_BANG_.invokeStatic(core.clj:2355)
at clojure.core$swap_BANG_.invoke(core.clj:2347)
at cognitect.http_client.Client$fn$reify__27187.onHeaders(http_client.clj:232)
at org.eclipse.jetty.client.HttpRequest$10.onHeaders(HttpRequest.java:528)
at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:100)
at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:92)
at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:296)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:310)
at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1245)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1528)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:204)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:144)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:79)
at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:131)
at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:172)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.base/java.lang.Thread.run(Thread.java:829)
=>
{:cognitect.anomalies/category :cognitect.anomalies/fault,
:cognitect.anomalies/message "Value out of range for int: 2354315264",
:cognitect.http-client/throwable #error{:cause "Value out of range for int: 2354315264",
:via [{:type java.lang.IllegalArgumentException,
:message "Value out of range for int: 2354315264",
:at [clojure.lang.RT intCast "RT.java" 1248]}],
:trace [[clojure.lang.RT intCast "RT.java" 1248]
[cognitect.http_client$expand_buffer invokeStatic "http_client.clj" 57]
[cognitect.http_client$expand_buffer invokePrim "http_client.clj" -1]
[cognitect.http_client$append_buffer invokeStatic "http_client.clj" 65]
[cognitect.http_client$append_buffer invoke "http_client.clj" 61]
[cognitect.http_client$on_content$fn__27143
invoke
"http_client.clj"
139]
[clojure.core$update invokeStatic "core.clj" 6198]
[clojure.core$update invoke "core.clj" 6190]
[cognitect.http_client$on_content invokeStatic "http_client.clj" 139]
[cognitect.http_client$on_content invoke "http_client.clj" 136]
[clojure.lang.Atom swap "Atom.java" 51]
[clojure.core$swap_BANG_ invokeStatic "core.clj" 2355]
[clojure.core$swap_BANG_ invoke "core.clj" 2347]
[cognitect.http_client.Client$fn$reify__27189
onContent
"http_client.clj"
236]
[org.eclipse.jetty.client.HttpRequest$11
onContent
"HttpRequest.java"
542]
[org.eclipse.jetty.client.api.Response$ContentListener
onContent
"Response.java"
158]
[org.eclipse.jetty.client.api.Response$AsyncContentListener
onContent
"Response.java"
189]
[org.eclipse.jetty.client.ResponseNotifier
notifyContent
"ResponseNotifier.java"
155]
[org.eclipse.jetty.client.ResponseNotifier
notifyContent
"ResponseNotifier.java"
139]
[org.eclipse.jetty.client.HttpReceiver$ContentListeners
notifyContent
"HttpReceiver.java"
693]
[org.eclipse.jetty.client.HttpReceiver$ContentListeners
access$500
"HttpReceiver.java"
655]
[org.eclipse.jetty.client.HttpReceiver
plainResponseContent
"HttpReceiver.java"
369]
[org.eclipse.jetty.client.HttpReceiver
responseContent
"HttpReceiver.java"
352]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
content
"HttpReceiverOverHTTP.java"
323]
[org.eclipse.jetty.http.HttpParser parseContent "HttpParser.java" 1716]
[org.eclipse.jetty.http.HttpParser parseNext "HttpParser.java" 1551]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
parse
"HttpReceiverOverHTTP.java"
204]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
process
"HttpReceiverOverHTTP.java"
144]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
receive
"HttpReceiverOverHTTP.java"
79]
[org.eclipse.jetty.client.http.HttpChannelOverHTTP
receive
"HttpChannelOverHTTP.java"
131]
[org.eclipse.jetty.client.http.HttpConnectionOverHTTP
onFillable
"HttpConnectionOverHTTP.java"
172]
[org.eclipse.jetty.io.AbstractConnection$ReadCallback
succeeded
"AbstractConnection.java"
311]
[org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
[org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint
onFillable
"SslConnection.java"
555]
[org.eclipse.jetty.io.ssl.SslConnection
onFillable
"SslConnection.java"
410]
[org.eclipse.jetty.io.ssl.SslConnection$2
succeeded
"SslConnection.java"
164]
[org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
[org.eclipse.jetty.io.ChannelEndPoint$1 run "ChannelEndPoint.java" 104]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
runTask
"EatWhatYouKill.java"
338]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
doProduce
"EatWhatYouKill.java"
315]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
tryProduce
"EatWhatYouKill.java"
173]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
run
"EatWhatYouKill.java"
131]
[org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread
run
"ReservedThreadExecutor.java"
409]
[org.eclipse.jetty.util.thread.QueuedThreadPool
runJob
"QueuedThreadPool.java"
883]
[org.eclipse.jetty.util.thread.QueuedThreadPool$Runner
run
"QueuedThreadPool.java"
1034]
[java.lang.Thread run "Thread.java" 829]]}}
(meta *1)
=>
{:http-request {:request-method :get,
:scheme :https,
:server-port 443,
:uri "...",
:headers {...},
:body nil,
:server-name "s3.us-east-2.amazonaws.com"},
:http-response {:cognitect.anomalies/category :cognitect.anomalies/fault,
:cognitect.anomalies/message "Value out of range for int: 2354315264",
:cognitect.http-client/throwable #error{:cause "Value out of range for int: 2354315264",
:via [{:type java.lang.IllegalArgumentException,
:message "Value out of range for int: 2354315264",
:at [clojure.lang.RT intCast "RT.java" 1248]}],
:trace [[clojure.lang.RT intCast "RT.java" 1248]
[cognitect.http_client$expand_buffer
invokeStatic
"http_client.clj"
57]
[cognitect.http_client$expand_buffer
invokePrim
"http_client.clj"
-1]
[cognitect.http_client$append_buffer
invokeStatic
"http_client.clj"
65]
[cognitect.http_client$append_buffer
invoke
"http_client.clj"
61]
[cognitect.http_client$on_content$fn__27143
invoke
"http_client.clj"
139]
[clojure.core$update invokeStatic "core.clj" 6198]
[clojure.core$update invoke "core.clj" 6190]
[cognitect.http_client$on_content
invokeStatic
"http_client.clj"
139]
[cognitect.http_client$on_content
invoke
"http_client.clj"
136]
[clojure.lang.Atom swap "Atom.java" 51]
[clojure.core$swap_BANG_ invokeStatic "core.clj" 2355]
[clojure.core$swap_BANG_ invoke "core.clj" 2347]
[cognitect.http_client.Client$fn$reify__27189
onContent
"http_client.clj"
236]
[org.eclipse.jetty.client.HttpRequest$11
onContent
"HttpRequest.java"
542]
[org.eclipse.jetty.client.api.Response$ContentListener
onContent
"Response.java"
158]
[org.eclipse.jetty.client.api.Response$AsyncContentListener
onContent
"Response.java"
189]
[org.eclipse.jetty.client.ResponseNotifier
notifyContent
"ResponseNotifier.java"
155]
[org.eclipse.jetty.client.ResponseNotifier
notifyContent
"ResponseNotifier.java"
139]
[org.eclipse.jetty.client.HttpReceiver$ContentListeners
notifyContent
"HttpReceiver.java"
693]
[org.eclipse.jetty.client.HttpReceiver$ContentListeners
access$500
"HttpReceiver.java"
655]
[org.eclipse.jetty.client.HttpReceiver
plainResponseContent
"HttpReceiver.java"
369]
[org.eclipse.jetty.client.HttpReceiver
responseContent
"HttpReceiver.java"
352]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
content
"HttpReceiverOverHTTP.java"
323]
[org.eclipse.jetty.http.HttpParser
parseContent
"HttpParser.java"
1716]
[org.eclipse.jetty.http.HttpParser
parseNext
"HttpParser.java"
1551]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
parse
"HttpReceiverOverHTTP.java"
204]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
process
"HttpReceiverOverHTTP.java"
144]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP
receive
"HttpReceiverOverHTTP.java"
79]
[org.eclipse.jetty.client.http.HttpChannelOverHTTP
receive
"HttpChannelOverHTTP.java"
131]
[org.eclipse.jetty.client.http.HttpConnectionOverHTTP
onFillable
"HttpConnectionOverHTTP.java"
172]
[org.eclipse.jetty.io.AbstractConnection$ReadCallback
succeeded
"AbstractConnection.java"
311]
[org.eclipse.jetty.io.FillInterest
fillable
"FillInterest.java"
105]
[org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint
onFillable
"SslConnection.java"
555]
[org.eclipse.jetty.io.ssl.SslConnection
onFillable
"SslConnection.java"
410]
[org.eclipse.jetty.io.ssl.SslConnection$2
succeeded
"SslConnection.java"
164]
[org.eclipse.jetty.io.FillInterest
fillable
"FillInterest.java"
105]
[org.eclipse.jetty.io.ChannelEndPoint$1
run
"ChannelEndPoint.java"
104]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
runTask
"EatWhatYouKill.java"
338]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
doProduce
"EatWhatYouKill.java"
315]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
tryProduce
"EatWhatYouKill.java"
173]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
run
"EatWhatYouKill.java"
131]
[org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread
run
"ReservedThreadExecutor.java"
409]
[org.eclipse.jetty.util.thread.QueuedThreadPool
runJob
"QueuedThreadPool.java"
883]
[org.eclipse.jetty.util.thread.QueuedThreadPool$Runner
run
"QueuedThreadPool.java"
1034]
[java.lang.Thread run "Thread.java" 829]]},
:body nil}}
If there's another way I can get the content-length, please do tell.
Took a look at this and I'm pretty sure the issue is:
1. cognitect.http-client/on-headers parses the content length as a long.
2. cognitect.http-client/empty-bbuf is called with that long to create a body buffer.
3. empty-bbuf calls clojure.core/byte-array, which calls clojure.lang.Numbers/byte_array, which narrows the long to an int.
You can reproduce this by just calling (clojure.lang.Numbers/byte_array 3831118640).
Of course, Java indexes arrays with ints, so I'm not sure what the actual fix here would be :)
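The failure mode in the steps above (a long size reaching an int-indexed array) can be seen in isolation. This is a minimal sketch using only clojure.core: `unchecked-int` keeps the low 32 bits of a long, much like Java's `intValue`, while `int` range-checks and throws. The size is the file size quoted earlier in this thread; the actual Content-Length of the failing request may differ.

```clojure
;; File size quoted earlier in the thread (larger than Integer/MAX_VALUE).
(def object-size 3831118640)

;; Narrowing a long to an int keeps only the low 32 bits, so the value wraps negative.
(def narrowed (unchecked-int object-size))

;; A checked cast refuses instead of wrapping; capture the message rather than failing.
(def checked-cast-error
  (try
    (int object-size)
    nil
    (catch IllegalArgumentException e
      (.getMessage e))))

(println (> object-size Integer/MAX_VALUE))
(println narrowed)
(println checked-cast-error)
```

The negative value is what ends up as the array size in the NegativeArraySizeException, and the checked cast is the same kind of failure as the "Value out of range for int" anomaly.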
> I'm not sure what the actual fix here would be
Grab large files in chunks?
@fogus thanks to some clojure wizardry by @cr-latacora:
Content-Length: 3831191098
I suspect that there may be a way to address this in user space using the iteration function and specifying increasing :Range slices, then concatenating all of the slices at the end. If so, this would be the preferred way, since it's unlikely that we'll have a client fix in hand in a usable timeframe.
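To make the slicing idea concrete, here is a small sketch of how increasing Range values could tile an object. `byte-ranges` is a hypothetical helper name, not part of aws-api; it only builds the header strings and makes no requests. HTTP byte ranges are inclusive on both ends, so each slice ends one byte before the next one starts.

```clojure
(defn byte-ranges
  "Hypothetical helper: returns the Range header values that cover an object of
  object-size bytes in slices of at most chunk-size bytes."
  [object-size chunk-size]
  (for [start (range 0 object-size chunk-size)]
    ;; Clamp the end so the last slice stops at the final byte of the object.
    (str "bytes=" start "-" (dec (min object-size (+ start chunk-size))))))

(println (byte-ranges 12 5))
```

All arithmetic here is on longs, so offsets above the int range are not a problem.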
@x64-latacora were you able to resolve this with iteration?
We did not. We switched to using amazonica to get objects instead of cognitect's s3-api.
@dchelimsky I have this working with Range slices.
@fogus thanks for the tip on using :Range.
The following was tested with an object size of 2243897556 bytes (larger than max int). With VisualVM attached to the REPL, memory stayed stable during the download, remaining between 70 and 100 MiB.
(require '[clojure.java.io :as io]
         '[cognitect.aws.client.api :as aws])
(import java.io.SequenceInputStream)

;; Assumed setup (not shown in the original snippet): an S3 client created with aws-api.
(def s3 (aws/client {:api :s3}))

(defn parse-content-range
  "Extract the object size from the ContentRange response attribute and convert it to a Long,
  e.g. \"bytes 0-5242879/2243897556\" returns 2243897556."
  [content-range]
  (when-let [object-size (re-find #"[0-9]+$" content-range)]
    (Long/parseLong object-size)))

(def chunk-size-bytes (* 1024 1024 5)) ;; chosen arbitrarily

;; iteration requires Clojure 1.11 or later.
(defn get-object-chunks [{:keys [bucket key]}]
  (iteration (fn [range-byte-pos]
               (let [to-byte-pos (+ range-byte-pos chunk-size-bytes)
                     ;; Range is inclusive on both ends, hence the dec.
                     range (str "bytes=" range-byte-pos "-" (dec to-byte-pos))
                     op-map {:op :GetObject :request {:Bucket bucket :Key key :Range range}}
                     {:keys [ContentRange] :as response} (aws/invoke s3 op-map)]
                 (println :range range :response response)
                 ;; todo: check the response for errors
                 (assoc response :range-byte-pos to-byte-pos
                                 :object-size (parse-content-range ContentRange))))
             :initk 0
             ;; Continue from the next offset until the whole object has been requested.
             :kf (fn [{:keys [range-byte-pos object-size]}]
                   (when (< range-byte-pos object-size)
                     range-byte-pos))
             :vf :Body))

(defn seq-enumeration
  "Returns a java.util.Enumeration on a seq"
  {:static true}
  [coll]
  (clojure.lang.SeqEnumeration. coll))

(time
  (let [s3-address {:bucket "your bucket"
                    :key "your key"}
        target-file "/path/to/some-file"]
    (with-open [target (io/output-stream (io/file target-file))]
      ;; Each chunk's :Body is an InputStream; SequenceInputStream reads them back to back.
      (io/copy (SequenceInputStream. (seq-enumeration (sequence (get-object-chunks s3-address))))
               target))))
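The `todo` about checking the response for errors could be handled with a small guard. This is a sketch rather than part of the tested snippet: `throw-on-anomaly` is a hypothetical helper name, and it relies only on the `:cognitect.anomalies/category` key that appears in the failing responses earlier in this thread.

```clojure
(defn throw-on-anomaly
  "Hypothetical helper: returns response unchanged, or throws ex-info when it carries an anomaly."
  [response]
  (if (:cognitect.anomalies/category response)
    (throw (ex-info (str "GetObject failed: "
                         (:cognitect.anomalies/message response "no message"))
                    ;; Keep only the anomaly keys so the ex-data stays small and printable.
                    (select-keys response [:cognitect.anomalies/category
                                           :cognitect.anomalies/message])))
    response))
```

One way to use it would be to wrap the invoke call, for example `(throw-on-anomaly (aws/invoke s3 op-map))`, so a failed slice stops the download instead of reaching `parse-content-range` with a missing ContentRange.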
There is a caveat to the previous example: ranged requests don't play well with objects that have a content encoding set (gzip here).
Each chunk is treated as self-contained gzip content, which fails because a gzip stream can only be decoded from its beginning.
I'm not sure if there's a way to get into the Jetty client and disable the GZip decoder from the aws-api. The second range request fails with:
:Range "bytes=5242880-10485759"}}, :response {:cognitect.anomalies/category :cognitect.anomalies/fault, :cognitect.anomalies/message "java.util.zip.ZipException: Invalid gzip bytes", :cognitect.http-client/throwable #error {
:cause "Invalid gzip bytes"
:via
[{:type java.lang.RuntimeException
:message "java.util.zip.ZipException: Invalid gzip bytes"
:at [org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 402]}
{:type java.util.zip.ZipException
:message "Invalid gzip bytes"
:at [org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 272]}]
:trace
[[org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 272]
[org.eclipse.jetty.http.GZIPContentDecoder decode "GZIPContentDecoder.java" 90]
[org.eclipse.jetty.client.HttpReceiver$Decoder decodeChunk "HttpReceiver.java" 819]
[org.eclipse.jetty.client.HttpReceiver$Decoder decode "HttpReceiver.java" 788]
[org.eclipse.jetty.client.HttpReceiver$Decoder decode "HttpReceiver.java" 768]
[org.eclipse.jetty.client.HttpReceiver$Decoder access$600 "HttpReceiver.java" 744]
[org.eclipse.jetty.client.HttpReceiver decodeResponseContent "HttpReceiver.java" 386]
[org.eclipse.jetty.client.HttpReceiver responseContent "HttpReceiver.java" 354]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP content "HttpReceiverOverHTTP.java" 332]
[org.eclipse.jetty.http.HttpParser parseContent "HttpParser.java" 1716]
[org.eclipse.jetty.http.HttpParser parseNext "HttpParser.java" 1551]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP parse "HttpReceiverOverHTTP.java" 208]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP process "HttpReceiverOverHTTP.java" 148]
[org.eclipse.jetty.client.http.HttpReceiverOverHTTP receive "HttpReceiverOverHTTP.java" 80]
[org.eclipse.jetty.client.http.HttpChannelOverHTTP receive "HttpChannelOverHTTP.java" 131]
[org.eclipse.jetty.client.http.HttpConnectionOverHTTP onFillable "HttpConnectionOverHTTP.java" 172]
[org.eclipse.jetty.io.AbstractConnection$ReadCallback succeeded "AbstractConnection.java" 311]
[org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
[org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint onFillable "SslConnection.java" 555]
[org.eclipse.jetty.io.ssl.SslConnection onFillable "SslConnection.java" 410]
[org.eclipse.jetty.io.ssl.SslConnection$2 succeeded "SslConnection.java" 164]
[org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
[org.eclipse.jetty.io.ChannelEndPoint$1 run "ChannelEndPoint.java" 104]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill runTask "EatWhatYouKill.java" 338]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill doProduce "EatWhatYouKill.java" 315]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill tryProduce "EatWhatYouKill.java" 173]
[org.eclipse.jetty.util.thread.strategy.EatWhatYouKill run "EatWhatYouKill.java" 131]
[org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread run "ReservedThreadExecutor.java" 409]
[org.eclipse.jetty.util.thread.QueuedThreadPool runJob "QueuedThreadPool.java" 883]
[org.eclipse.jetty.util.thread.QueuedThreadPool$Runner run "QueuedThreadPool.java" 1034]
[java.lang.Thread run "Thread.java" 829]]}}, :line 76}
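The gzip problem can be reproduced without S3 or Jetty. The sketch below uses only JDK classes; `gzip-bytes` and `try-gunzip` are hypothetical helper names. It compresses a string into one gzip stream, then tries to decode a slice taken from the middle of the compressed bytes, which is roughly what a decoder sees for every range after the first.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream IOException)
        '(java.util Arrays)
        '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn gzip-bytes
  "Compress a byte array into a single gzip stream."
  ^bytes [^bytes data]
  (let [out (ByteArrayOutputStream.)]
    ;; Closing the GZIPOutputStream writes the gzip trailer.
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz data))
    (.toByteArray out)))

(defn try-gunzip
  "Return the decoded string, or :invalid-gzip if the bytes cannot be decoded on their own."
  [^bytes data]
  (try
    (with-open [gz (GZIPInputStream. (ByteArrayInputStream. data))]
      (slurp gz :encoding "UTF-8"))
    (catch IOException _
      :invalid-gzip)))

(def original (apply str (repeat 100 "hello range requests ")))
(def compressed (gzip-bytes (.getBytes ^String original "UTF-8")))

;; Second half of the compressed bytes, standing in for a later Range slice.
(def second-slice
  (let [n (alength ^bytes compressed)]
    (Arrays/copyOfRange ^bytes compressed (int (quot n 2)) (int n))))

(println (= original (try-gunzip compressed)))
(println (try-gunzip second-slice))
```

Concatenating the raw slices first and decompressing once afterwards avoids the problem, which is why the decoding has to be kept out of the per-slice path.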
With respect to the above situation (reading chunks of gzip-compressed content), I used the following hack to disable the Jetty GZip decoder for my S3 client.
CAUTION: since aws-api uses a shared http-client for all services, this will affect ALL AWS calls. It is best to create a dedicated client/http-client for input-stream-large.
(import 'java.lang.reflect.Field
        'org.eclipse.jetty.client.HttpClient)
;; Assumes the surrounding namespace aliases cognitect.aws.client.protocol as client.protocol
;; and a logging library as log; neither require is shown in the original snippet.

(defn private-field
  "Reads a private field by name via reflection, searching up the class hierarchy."
  [^Object obj ^String field-name]
  (when obj
    (when-let [^Field f (some
                          (fn [^Class c]
                            (try (.getDeclaredField c field-name)
                                 (catch NoSuchFieldException _ nil)))
                          (take-while some? (iterate (fn [^Class c] (.getSuperclass c)) (.getClass obj))))]
      (. f (setAccessible true))
      (. f (get obj)))))

(defn disable-jetty-content-decoders
  "Prevent the Jetty client from attempting to decode GZip compressed content.
  When downloading gzip encoded content as range slices, each chunk will be treated as self-contained gzip content, which fails because only a complete gzip stream can be decoded.
  CAUTION: since aws-api uses a shared http-client for all services, this will affect ALL AWS calls. It is best to create a dedicated client/http-client for input-stream-large."
  ([]
   ;; s3 is dereferenced here, so it is expected to be a delay (or similar) holding the client.
   (disable-jetty-content-decoders @s3))
  ([client]
   (if-let [^HttpClient jetty-client (-> (client.protocol/-get-info client)
                                         :http-client
                                         (private-field "c")
                                         (private-field "jetty_client"))]
     (do
       (log/debug :in 'disable-decoder-factories :message "Disabling Jetty content decoders for client" :client client :jetty-client jetty-client)
       (.clear (.getContentDecoderFactories jetty-client)))
     (log/error :in 'disable-decoder-factories :message "Failed to disable Jetty client decoders. Subsequent get-object operations may fail"))))
Dependencies
Description with failing test case
Similar issue to https://github.com/cognitect-labs/aws-api/issues/97, but the call fails when making a GetObject API call. For reference, the file being fetched is 3.6GB.
Stack traces