aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

Assertion parser->current_buffer_.IsEmpty() failed in Node when using Upload in @aws-sdk/lib-storage in cloud environment #2843

Closed. mingchuno closed this issue 6 days ago

mingchuno commented 2 years ago

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug

This is actually an issue in https://github.com/nodejs/node/issues/39671

When using @aws-sdk/lib-storage to perform a multipart upload to S3, Node.js throws the following exception:

node[1]: ../src/node_http_parser.cc:510:static void node::{anonymous}::Parser::Execute(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `parser->current_buffer_.IsEmpty()' failed.

This comes from Node's http module. Interestingly, it only happens in a cloud environment; the local dev environment works fine (further investigation is needed to verify). It runs fine almost every time on my local machine (MacBook Pro 16" Intel running Node v14.17.5), and I have seen the exception there only once. But in the production environment, EKS running on EC2, it fails consistently at the same point regardless of the data size.

This MAY NOT be an issue in @aws-sdk/lib-storage itself, but it only seems to happen when using the AWS SDK in a cloud environment.

I have tried changing my application logic to perform only one upload at a time, and it did not crash even in the cloud environment (AWS EKS running on EC2). So I suspect it is a concurrency issue of some sort at either the aws-sdk or the Node.js level. It would be great if we (users, aws-sdk, Node.js) could work together to find a reproducer first.

Is the issue in the browser/Node.js? Node.js, and it possibly happens only in cloud (AWS?) environments.

If on Node.js, are you running this on AWS Lambda? No

Details of the browser/Node.js version

resources:
  limits:
    cpu: 1000m
    memory: 1024Mi
  requests:
    cpu: 1000m
    memory: 1024Mi

SDK version number

@aws-sdk/lib-storage@3.33.0 for me; I am not sure about others.

To Reproduce (observed behavior)

No reproducer at this moment. Looking for someone to reproduce it.

Expected behavior

No exception thrown

Screenshots

No

Additional context

Code snippet for reference

import { Upload } from '@aws-sdk/lib-storage'
import { PassThrough } from 'stream'

// s3Client (an S3Client instance) and inputStream (a Readable) are created elsewhere in the application
const multipartUpload = new Upload({
  client: s3Client,
  params: {
    Bucket: 'some-s3-bucket',
    Key: 'path/to/some.json',
    Body: inputStream.pipe(new PassThrough()),
    ContentType: 'application/octet-stream',
  },
})
await multipartUpload.done()
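
For reference, a minimal sketch of the "one upload at a time" workaround mentioned above. The input list, key names, and bucket here are made up for illustration:

import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'
import { PassThrough, Readable } from 'stream'

const s3Client = new S3Client({})

// Run uploads strictly one at a time instead of firing them off concurrently.
// `inputs` is a hypothetical list of { key, stream } pairs.
async function uploadSequentially(inputs: Array<{ key: string; stream: Readable }>) {
  for (const { key, stream } of inputs) {
    await new Upload({
      client: s3Client,
      params: {
        Bucket: 'some-s3-bucket',
        Key: key,
        Body: stream.pipe(new PassThrough()),
        ContentType: 'application/octet-stream',
      },
    }).done() // wait for each upload to finish before starting the next
  }
}
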
rcausey commented 2 years ago

We've also encountered this crash on AWS Lambda, even when queueSize is set to 1. We've found it's much easier to reproduce with larger values of queueSize, though.

mingchuno commented 2 years ago

@rcausey Can you provide more details on your use case? Do you use Lambda to do multipart uploads to S3 too? What does queueSize mean? Are you connecting SQS / Amazon MQ to Lambda?

rcausey commented 2 years ago

@rcausey Can you provide more details on your use case? Do you use Lambda to do multipart uploads to S3 too? What does queueSize mean? Are you connecting SQS / Amazon MQ to Lambda?

Yes, we use @aws-sdk/lib-storage to do multipart uploads to S3 on Lambda.

queueSize is a parameter that controls concurrency in @aws-sdk/lib-storage uploads; i.e.

const parallelUploads3 = new Upload({
  client: new S3({}) || new S3Client({}), // either client type works
  tags: [...], // optional tags
  queueSize: 4, // optional concurrency configuration
  partSize: 1024 * 1024 * 5, // optional size of each part (5 MB minimum)
  leavePartsOnError: false, // optional manually handle dropped parts
  params: target,
});

We first ran into this issue when we switched from the v2 to the v3 SDK. We couldn't reproduce it with queueSize set to 1, so that is how we "fixed" it, since it was sufficient for our needs. But we recently hit the issue in our production environment even with queueSize set to 1, so it can still happen (albeit much less frequently) without concurrent uploads.

mingchuno commented 2 years ago

@rcausey Can we isolate a simple reproducer in a cloud environment that reproduces the issue with high confidence?

evgeny-roubinchtein commented 2 years ago

We are seeing this also. In our case, a Lambda handles a message from an SQS queue, and sometimes the handler dies with the error the OP described. I am pasting a stack trace from the CloudWatch log, but it looks like the node binary our code runs under is stripped (not that it shouldn't be, but it doesn't help with debugging).

How can we assist you in getting to the bottom of this? For example, can you guide us in building a custom Node.js runtime in which the node binary isn't stripped but which is otherwise as close as possible to the runtime you are using? A better stack trace might be helpful, I am guessing.

Here's the stack, such as it is from a CloudWatch log for a run of Lambda where this issue hit:

/var/lang/bin/node[8]: ../src/node_http_parser.cc:510:static void node::{anonymous}::Parser::Execute(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `parser->current_buffer_.IsEmpty()' failed.
1: 0x55913ada7673 node::Abort() [/var/lang/bin/node]
2: 0x55913ada7702  [/var/lang/bin/node]
3: 0x55913adc4750  [/var/lang/bin/node]
4: 0x55913afe3619  [/var/lang/bin/node]
5: 0x55913afe53a1 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/var/lang/bin/node]
6: 0x55913b91a7d9  [/var/lang/bin/node]
[...]
RequestId: <id_elided> Error: Runtime exited with error: signal: aborted (core dumped)
Runtime.ExitError
upMKuhn commented 2 years ago

I encountered the error as well. I found a good hint that it can be avoided by setting the chunk size; reportedly the error happens when each upload part is exactly 5 MB.
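
A minimal sketch of that workaround; the bucket, key, and file name are placeholders, and the 5.5 MB value just illustrates "anything other than exactly 5 MB per part":

import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'
import { createReadStream } from 'fs'

const s3Client = new S3Client({})

const upload = new Upload({
  client: s3Client,
  partSize: Math.ceil(5.5 * 1024 * 1024), // avoid an exact 5 MB part size
  params: {
    Bucket: 'some-s3-bucket',   // placeholder
    Key: 'path/to/some.json',   // placeholder
    Body: createReadStream('myFile.json'),
  },
})
await upload.done()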

upMKuhn commented 2 years ago

https://github.com/nodejs/node/issues/39671#issuecomment-992693966

alexzuo2013 commented 2 years ago

Do we know the reason behind the chunksize hack?

On Tue, Jan 11, 2022 at 5:02 PM Martin Kuhn @.***> wrote:

nodejs/node#39671 (comment) https://github.com/nodejs/node/issues/39671#issuecomment-992693966

— Reply to this email directly, view it on GitHub https://github.com/aws/aws-sdk-js-v3/issues/2843#issuecomment-1010523684, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD6E3KTYPVFTJETY4Y7VYD3UVTHI3ANCNFSM5E6NCK3Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

upMKuhn commented 2 years ago

I think the assert is triggered when sending a multipart upload with at least 2 chunks and the body happens to be empty, for some unknown reason. The scenario below reproduces the error every time. Weirdly enough, if I change the chunk size to 5.5 * 1024 * 1024 there is no error, but the log messages suggest it is uploading the file twice: the file to upload is 7 MB and progress.loaded is 14 MB after it finishes.


// fs here is from 'fs/promises'; s3Client, bucketName, fullPath, waitForReadable
// and args come from the surrounding class.
const file = await fs.open("myFile.json", 'r'); // file is 7 MB
const stream = file.createReadStream({ encoding: null }); // get byte stream
await this.waitForReadable(stream);

await new Upload({
  client: this.s3Client,
  leavePartsOnError: false,
  tags: args.tags,
  params: {
    Key: this.fullPath(args.fileName),
    Bucket: this.bucketName,
    Body: stream,
  },
}).done();

/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node[76192]: ../src/node_http_parser.cc:517:static void node::(anonymous namespace)::Parser::Execute(const FunctionCallbackInfo<v8::Value> &): Assertion `parser->current_buffer_.IsEmpty()' failed.
 1: 0x107274815 node::Abort() (.cold.1) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 2: 0x105f73aa9 node::Abort() [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 3: 0x105f738e1 node::Assert(node::AssertionInfo const&) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 4: 0x105f901ff node::(anonymous namespace)::Parser::Execute(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 5: 0x10615d239 v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 6: 0x10615cd06 v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 7: 0x10615c47f v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 8: 0x1069cd399 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
^CWaiting for the debugger to disconnect...
Waiting for the debugger to disconnect...
upMKuhn commented 2 years ago

My bad, the error still happens; it still seems to be random :O What I noticed while debugging the library is that an empty PUT request is sent when the stream has ended, although it should never be triggered. I will debug a bit more tonight and double-check.

https://github.com/aws/aws-sdk-js-v3/blob/main/lib/lib-storage/src/Upload.ts line 136

        // Use put instead of multi-part for one chunk uploads.
        if (dataPart.partNumber === 1 && dataPart.lastPart) {
          return await this.__uploadUsingPut(dataPart);
        }
upMKuhn commented 2 years ago

Maybe it's related. I had called the done function twice, as in `new Upload().done().done()`. This caused an empty stream to be sent, thus deleting the uploaded file again, and is perhaps a potential trigger for the assert error.

upMKuhn commented 2 years ago

Yep, I haven't encountered any error since I made sure .done() is never called twice for the same upload object.
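
For what it's worth, a small sketch of the guard we ended up with; the wrapper class is made up, not part of the SDK:

import { Upload } from '@aws-sdk/lib-storage'

// Memoize the done() promise so a second call returns the same in-flight
// upload instead of driving the (now empty) stream again.
class SingleShotUpload {
  private donePromise?: Promise<unknown>

  constructor(private readonly upload: Upload) {}

  done(): Promise<unknown> {
    if (!this.donePromise) {
      this.donePromise = this.upload.done()
    }
    return this.donePromise
  }
}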

libre-man commented 2 years ago

I'm still encountering this issue. We've worked around it for now by creating a custom HttpHandler that uses undici. Would AWS be interested in such a handler as a workaround for this issue, or is there still active research into why it is happening?
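
For reference, a rough sketch of the kind of handler I mean, not our exact code; query-string and error handling are omitted, and the imports reflect the SDK versions we were on:

import { HttpRequest, HttpResponse } from '@aws-sdk/protocol-http'
import { request } from 'undici'

// Minimal undici-backed request handler; wire it up with e.g.
// new S3Client({ requestHandler: new UndiciHttpHandler() }).
class UndiciHttpHandler {
  async handle(req: HttpRequest): Promise<{ response: HttpResponse }> {
    const port = req.port ? `:${req.port}` : ''
    const url = `${req.protocol}//${req.hostname}${port}${req.path}` // query string omitted here

    const { statusCode, headers, body } = await request(url, {
      method: req.method as any,
      headers: req.headers,
      body: req.body,
    })

    // Flatten undici's header values (string | string[]) into plain strings.
    const responseHeaders: Record<string, string> = {}
    for (const [name, value] of Object.entries(headers)) {
      if (value !== undefined) {
        responseHeaders[name] = Array.isArray(value) ? value.join(', ') : String(value)
      }
    }

    // undici returns the body as a Readable, which the SDK can consume directly.
    return { response: new HttpResponse({ statusCode, headers: responseHeaders, body }) }
  }

  destroy(): void {
    // nothing to clean up in this sketch
  }
}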

vsosevic commented 2 years ago

I can still reproduce this bug using the @aws-sdk/lib-storage library, but it is not reproducible with v2 of the AWS SDK. Steps to reproduce: create ~1500 zero-length files and upload them. In that case it crashes randomly, on the 50th file, the 250th, the 853rd, etc., so the bug is not stable.

If you put some data in each file (8 KB to 5 MB), the bug isn't reproducible.

Surprisingly enough, it works stably without any issues with v2 of the AWS SDK.
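
A rough sketch of that reproduction loop, using zero-length streams as a stand-in for the empty files; the bucket name and file count are placeholders:

import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'
import { Readable } from 'stream'

const s3Client = new S3Client({})

// Upload ~1500 zero-length bodies; on the affected SDK/Node versions the
// crash shows up at a random iteration somewhere in the loop.
async function reproduce(count = 1500) {
  for (let i = 0; i < count; i++) {
    await new Upload({
      client: s3Client,
      params: {
        Bucket: 'some-test-bucket',   // placeholder
        Key: `repro/empty-${i}.bin`,
        Body: Readable.from([]),      // zero-length stream
      },
    }).done()
    console.log(`uploaded file #${i + 1}`)
  }
}

reproduce().catch(console.error)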

pmiszczak-cksource commented 1 year ago

Node.js v18.6.0 has a fix for nodejs/node#39671 (nodejs/node@491c7619c4)

RanVaknin commented 3 weeks ago

Hi there,

I see this issue never got addressed by anyone on the SDK team. The last comment was made in 2022 and points to a fix on Node's side.

Is this still an issue?

Thanks, Ran~

github-actions[bot] commented 1 week ago

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.