cognitect-labs / aws-api

AWS, data driven
Apache License 2.0
727 stars 100 forks source link

SHA256 signature verification fails for S3 multipart uploads #229

Closed kirked closed 1 year ago

kirked commented 1 year ago

Thanks for sharing Cognitect's aws-api!

Dependencies

{:deps
 {com.cognitect.aws/api       {:mvn/version "0.8.635"}
  com.cognitect.aws/endpoints {:mvn/version "1.1.12.373"}
  com.cognitect.aws/s3        {:mvn/version "825.2.1250.0"}}

Description with failing test case

S3 Multipart Upload, using a fixed byte-array per-thread, fails signature verification because of errant implementation in the cognitect.aws.util/sha-256.

The current sha-256 implementation uses (.array ^ByteBuffer bb) which is the entire backing array, without regard to the ByteBuffer's position or remaining values. Because of this, the S3 :UploadPart API fails signature verification on the last part of a multipart upload that doesn't completely fill the backing array.

(require
  '[clojure.java.io :as io]
  '[cognitect.aws.client.api :as aws])
(import [java.nio ByteBuffer])

;; we use a 5MB backing array per-thread (the AWS minimum size)
(def ary (byte-array (* 5 1024 1024)))

;; populate some random data into the array…
(with-open [in (io/input-stream "/dev/urandom")]
  (.read in ary))

(def base-req {:Bucket "mybucket" :Key "test-document.bin"})

(def s3 (aws/client {:api :s3}))
(def upload-req
  (assoc base-req :UploadId
    (:UploadId (aws/invoke s3 {:op :CreateMultipartUpload :request base-req}))))

;; upload 1st block of file
(aws/invoke s3 {:op :UploadPart :request (assoc upload-req :Body (ByteBuffer/wrap ary 0 (alength ary)))})

;; …other blocks of huge file fill the backing array…

;; now, last block of file is only a portion of the backing array, let's send it:
(aws/invoke s3 {:op :UploadPart :request (assoc upload-req :Body (ByteBuffer/wrap ary 0 (* 1024 1024)))})

;; => fails SHA256 signature verification on server side

‼️ it's important to run the following after the test above! ‼️

;; Ensure to do this to free your uploaded parts and have AWS stop charging you for storing it!!
(aws/invoke s3 {:op :AbortMultipartUpload :request upload-req})

The SHA256 signature fails because in the final :UploadPart request, we're using 1MB of the 5MB backing array. The Cognitect client uses the entire backing array to compute the hash, whereas the correct way is to only hash the first 1MB of it.

Sample fix

Here is a sample fix to the cognitect.aws.util/sha-256 function to fix the problem:

(defn sha-256
  "Returns the sha-256 digest (bytes) of data, which can be a
  byte-array, an input-stream, or nil, in which case returns the
  sha-256 of the empty string."
  ([data]
    (cond
      (string? data)
      (let [ba (.getBytes ^String data "UTF-8")]
        (sha-256 ba 0 (alength ba)))
      (instance? ByteBuffer data)
      (let [^ByteBuffer bb data]
        (sha-256 (.array bb) (.position bb) (.remaining bb)))
      :else
      (let [^bytes ba data]
        (sha-256 ba 0 (when ba (alength ba))))))
  ([^bytes ba off len]
    (let [digest (MessageDigest/getInstance "SHA-256")]
      (when ba
        (.update digest ba off len))
      (.digest digest))))
kirked commented 1 year ago

I realise you don't take PR's, but not until after I made one: https://github.com/kirked/aws-api/pull/1

That also fixes md5, and adds tests. It's my pleasure to donate those changes to the public domain, or any licensing scheme acceptable to Cognitect, free of charge, and without any claim of IP rights.

scottbale commented 1 year ago

Thanks for opening this issue and the accompanying PR, @kirked. Looks to me at first glance like a good change.

In the meantime, as a workaround, for the final block upload can you create a correctly-sized byte array copy of the original byte array? Something like:

;; now, last block of file is only a portion of the backing array, let's send it:
(let [final-size (* 1024 1024)
      ary-copy (java.util.Arrays/copyOf ary final-size)
  (aws/invoke s3 {:op :UploadPart :request (assoc upload-req :Body (ByteBuffer/wrap ary-copy 0 final-size))}))
dchelimsky commented 1 year ago

Thanks @kirked, for a detailed description, clear repro, clear solution, and your willingness to share your solution despite our no-PR policy! We really appreciate the time and effort.

The fact that aws-api uses ByteBuffer internally is an implementation detail, but aws-api only supports byte[] and InputStream for AWS blob types (the :Body in this example).

This is documented here (linked from the README), and in the clojure spec for the request (more on this below). So ByteBuffer happens to work today (when its the same size as its backing array), but may not in the future.

You can validate request payloads by calling (aws/validate-requests s3) before invoke. Given your example, the first thing I saw was

> (aws/invoke s3 {:op :UploadPart :request (assoc upload-req :Body (ByteBuffer/wrap ary 0 (alength ary)))})
{:clojure.spec.alpha/problems
 ({:path [],
   :pred (clojure.core/fn [%] (clojure.core/contains? % :PartNumber)),
,,,

Checking the doc for :UploadPart ...

> (aws/doc s3 :UploadPart)
-------------------------
UploadPart

...

http://docs.amazonwebservices.com/AmazonS3/latest/API/mpUploadUploadPart.html

-------------------------
Request

{...
 :UploadId string,
 :Bucket string,
 ...
 :Key string,
...
 :PartNumber integer,
 :Body blob,
 ...}

Required

[:Bucket :Key :PartNumber :UploadId]

... you can see that :Body is the blob type I mentioned above, and :PartNumber is required. Once I added the :PartNumber, I invoked :UploadPart with a ByteBuffer for :Body and got:

> (aws/invoke s3 {:op :UploadPart
                  :request (assoc upload-req
                                  :Body (ByteBuffer/wrap ary 0 (alength ary))
                                  :PartNumber 1)})
{:clojure.spec.alpha/problems
 ({:path [:Body :byte-array],
   :pred clojure.core/bytes?,
   :val #object[java.nio.HeapByteBuffer 0x25316b67 "java.nio.HeapByteBuffer[pos=0 lim=1048576 cap=1048576]"],
   :via [:cognitect.aws.s3/UploadPartRequest :cognitect.aws.s3/Body],
   :in [:Body]}
  {:path [:Body :input-stream],
   :pred (clojure.core/fn [%] (clojure.core/instance? java.io.InputStream %)),
   :val #object[java.nio.HeapByteBuffer 0x25316b67 "java.nio.HeapByteBuffer[pos=0 lim=1048576 cap=1048576]"],
   :via [:cognitect.aws.s3/UploadPartRequest :cognitect.aws.s3/Body],
   :in [:Body]}),
...

which tells us that the :Body has to be a byte-array or input-stream, either of which will work just fine.

I'm going to close this, but feel free to add comments if you have any. And thanks again for the clear repro.

kirked commented 1 year ago

Yeah, I accidentally omitted PartNumber while typing the repro code into the GitHub UI. It was actually in my testing request.

Thanks for the doc link. It's actually interesting that ByteBuffer isn't supported!