aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.

[BUG]: Using Upload (@aws-sdk/lib-storage) with ContentMD5 fails with `The XML you provided was not well-formed or did not validate against our published schema` #4321

Open · skypesky opened this issue 1 year ago

skypesky commented 1 year ago


Describe the bug

When I uploaded a file and asked S3 to verify its MD5 for me, I got an error: `MalformedXML: The XML you provided was not well-formed or did not validate against our published schema`

SDK version number

@aws-sdk/lib-storage@3.241.0, @aws-sdk/client-s3@3.241.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

node -v: v16.17.1

Reproduction Steps

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const client = new S3Client({ region: 'ap-northeast-1' });

const upload = new Upload({
  client,
  params: {
    Bucket: 'test',
    Key: 'demo.pdf',
    // note: data size > 30 MB, so a multipart upload is performed
    Body: data,
    // base64-encoded md5 of the whole file
    ContentMD5: 'wSunmxovn3F4x1+NV+/d1A==',
    Metadata: {
      'x-hash': options.hash,
    },
  },
});

await upload.done();

Observed Behavior

The upload failed with the following error:

2023-01-03T02:14:46: MalformedXML: The XML you provided was not well-formed or did not validate against our published schema
2023-01-03T02:14:46:     at throwDefaultError (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:8:22)
2023-01-03T02:14:46:     at deserializeAws_restXmlCompleteMultipartUploadCommandError (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/client-s3/dist-cjs/protocols/Aws_restXml.js:3086:43)
2023-01-03T02:14:46:     at processTicksAndRejections (node:internal/process/task_queues:96:5)
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:14:20
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/middleware-retry/dist-cjs/retryMiddleware.js:27:46
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:5:22
2023-01-03T02:14:46:     at async Upload.__doMultipartUpload (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:226:22)
2023-01-03T02:14:46:     at async Upload.done (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:39:16)

Expected Behavior

I expect the upload to succeed.

Possible Solution

No response

Additional Information/Context

S3_REGION=ap-northeast-1

related: https://github.com/aws/aws-sdk-js-v3/issues/2673

yenfryherrerafeliz commented 1 year ago

Hi @skypesky, thanks for opening this issue. I can confirm this is a bug. The exception appears to be caused by the provided checksum: it is sent along with each part, but it was calculated over the whole file content, whereas it needs to be calculated over just the chunk of data sent in that specific part. I can also confirm that the workaround proposed here works fine, provided you remove the md5 parameter from your code. I will mark this issue for review so we can address it further.
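For reference, a minimal sketch of that workaround, assuming the goal is simply to get the multipart upload through: drop ContentMD5 and keep the whole-file hash only in Metadata for later verification. The bucket name and the getFileContents() helper below are placeholders, not part of the thread:

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import * as crypto from "crypto";

const client = new S3Client({ region: "ap-northeast-1" });

const body = getFileContents(); // placeholder for the > 30 MB payload
const md5 = crypto.createHash("md5").update(body).digest("base64");

const upload = new Upload({
    client,
    params: {
        Bucket: "example-bucket", // placeholder
        Key: "demo.pdf",
        Body: body,
        // no ContentMD5: the whole-file MD5 is not valid for individual parts
        Metadata: {
            "x-hash": md5, // keep the whole-file hash as metadata instead
        },
    },
});

await upload.done();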

Repro steps: Installed the following packages:

yarn add @aws-sdk/client-s3
yarn add @aws-sdk/lib-storage

I used the following code:

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import * as crypto from "crypto";

const client = new S3Client({
    region: 'us-east-2'
});
// 31 MB body, large enough to trigger a multipart upload
const body = '#'.repeat(1024 * 1024 * 31);
// base64-encoded MD5 of the whole body
const md5 = crypto.createHash("MD5").update(body).digest("base64");
const upload = new Upload({
    client: client,
    params: {
        Bucket: process.env.TEST_BUCKET,
        Key: process.env.TEST_KEY,
        Body: body,
        ContentMD5: md5,
        Metadata: {
            'x-hash': md5,
        },
    },
});
const response = await upload.done();

console.log(response);

Thanks!

skypesky commented 1 year ago

@yenfryherrerafeliz

Thank you very much for your reply. One question: after this bug is fixed, will ContentMD5 be expected to contain the MD5 of the entire file?

yenfryherrerafeliz commented 1 year ago

@skypesky, I do not have a final picture of how it will work, but according to the documentation, each UploadPart request needs to send a checksum computed from the data sent in that specific part.

Thanks!
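To illustrate what that means in practice, here is a rough sketch (not from the thread) of a manual multipart upload in which each part carries its own ContentMD5. The bucket name, sample payload, and part size are placeholder assumptions:

import {
    S3Client,
    CreateMultipartUploadCommand,
    UploadPartCommand,
    CompleteMultipartUploadCommand,
} from "@aws-sdk/client-s3";
import * as crypto from "crypto";

const client = new S3Client({ region: "ap-northeast-1" });
const Bucket = "example-bucket"; // placeholder
const Key = "demo.pdf";
const body = Buffer.from("#".repeat(1024 * 1024 * 11)); // 11 MB sample payload
const partSize = 5 * 1024 * 1024; // 5 MB parts (the S3 minimum)

const { UploadId } = await client.send(
    new CreateMultipartUploadCommand({ Bucket, Key })
);

const parts = [];
for (let i = 0, partNumber = 1; i < body.length; i += partSize, partNumber++) {
    const chunk = body.subarray(i, i + partSize);
    // MD5 of this chunk only, not of the whole file
    const partMd5 = crypto.createHash("md5").update(chunk).digest("base64");
    const { ETag } = await client.send(
        new UploadPartCommand({
            Bucket,
            Key,
            UploadId,
            PartNumber: partNumber,
            Body: chunk,
            ContentMD5: partMd5,
        })
    );
    parts.push({ ETag, PartNumber: partNumber });
}

await client.send(
    new CompleteMultipartUploadCommand({
        Bucket,
        Key,
        UploadId,
        MultipartUpload: { Parts: parts },
    })
);

Under that reading, lib-storage would need to compute a per-part ContentMD5 itself rather than forwarding the whole-file value to every UploadPart request.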

andyslack commented 10 months ago

I can confirm we are experiencing the same issue here. It works perfectly on smaller files (1-2 MB), but as soon as you send a larger file it spits out the XML error. Watching for the final solution so we can update our code.
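A plausible explanation for that size threshold (an assumption, not confirmed in this thread): bodies smaller than lib-storage's part size are sent as a single PutObject, where a whole-file ContentMD5 is valid, while larger bodies are split into a multipart upload, where it is not. The part size is configurable on the Upload constructor; someLargeBody below is a placeholder:

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const client = new S3Client({ region: "us-east-2" });

const upload = new Upload({
    client,
    params: {
        Bucket: process.env.TEST_BUCKET,
        Key: process.env.TEST_KEY,
        Body: someLargeBody, // placeholder payload variable
        // ContentMD5 omitted: it only survives the single-part (PutObject) path
    },
    partSize: 5 * 1024 * 1024, // 5 MB, the minimum part size
    queueSize: 4,              // number of parts uploaded concurrently
});

await upload.done();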

itzcull commented 8 months ago

@andyslack what are you doing in the meantime to circumvent this issue?