Open riddhi123 opened 6 days ago
Also could you try v1.6.2 and see if that has the issue?
My guess is it is something to do with v1.7.0 that had a lot of dependency updates and the issue is somewhere in there.
@riddhi123 Two quick questions:
- Which version of the library were you on when it was working?
- Which library version are you using now?
v1.6.2
same error with v1.6.2
@riddhi123 Hmm... Lots of changes since v1.2.1. If it is easy, could you test v1.5.0? 1.6 did some major updates to support v3 of the AWS SDK. I suspect that's where the issue appeared.
@wilwade With v1.5.0 on Node 16 I get the same error. On Node 20 I get a different error, because v1.5.0 does not support the AWS SDK v3 changes:
client.getObject is not a function
at ParquetEnvelopeReader.readFn (/opt/nodejs/node_modules/@dsnp/parquetjs/dist/lib/reader.js:392:36)
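For context on that second error: AWS SDK v2 S3 clients expose operations as methods such as getObject(), while the v3 S3Client is driven through send() with command objects, so a reader written against the v2 style can fail this way when handed a v3 client. A minimal sketch of a check you could run on your own client before passing it to the reader (the helper name s3ClientStyle is hypothetical and not part of parquetjs or the AWS SDK):

```javascript
// Hypothetical helper: reports which calling convention an S3 client object offers.
// It only inspects the object's shape; it makes no network calls.
function s3ClientStyle(client) {
  if (client && typeof client.getObject === 'function') {
    return 'method-style'; // e.g. an AWS SDK v2 S3 instance
  }
  if (client && typeof client.send === 'function') {
    return 'command-style'; // e.g. an AWS SDK v3 S3Client
  }
  return 'unknown';
}

// Stand-in objects, used here instead of real SDK clients.
console.log(s3ClientStyle({ getObject() {} }));
console.log(s3ClientStyle({ send() {} }));
```

If this reports command-style, a library version that predates v3 support would be expected to fail on getObject.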
@riddhi123 I can't reproduce your error with our test files. Can you please post a link to the test file you have that's failing? Meanwhile I will attempt to create a test file like the one you've described. I've also modified your test script so it doesn't overwrite the parquet file with a gzipped CSV so I can keep rerunning the test.
@riddhi123 Have you tried increasing memory allocation with --max-old-space-size? I was able to create a snappy-compressed parquet file of over 40 MB and uploaded it to an S3 bucket. I then ran your download & convert code above, which succeeded. I downloaded the resulting gzipped file; it uncompressed fine and has the expected CSV content. So we will need your test file to try to reproduce the error you are seeing. Below is the end of the output of this script, and a screenshot showing the two files on an S3 bucket. The gz file is the output of your stringify/zip line.
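To see whether --max-old-space-size is actually taking effect, it can help to print the heap limit from inside the process. The GC log in the original report plateaus around 2048 MB, which suggests a limit of roughly 2 GB in that run. A minimal sketch using Node's built-in v8 module (the function name heapLimitMiB is only illustrative):

```javascript
const v8 = require('node:v8');

// Returns the V8 heap size limit of the current process, rounded to whole MiB.
function heapLimitMiB() {
  const bytes = v8.getHeapStatistics().heap_size_limit;
  return Math.round(bytes / (1024 * 1024));
}

console.log(`heap limit: ${heapLimitMiB()} MiB`);
```

Running it as, for example, node --max-old-space-size=4096 script.js (script.js being a placeholder for your own file) should report a noticeably larger limit than a run without the flag.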
...
{
  loaded: 32129742,
  total: undefined,
  part: 7,
  Key: 'testBig.snappy.parquet.gz',
  Bucket: 'my-bucket'
}
uploadResponse:: {
  '$metadata': {
    httpStatusCode: 200,
    requestId: '55ZAJFKW48QVA811',
    extendedRequestId: '8FqKwastV2ZoTRCAjU5kNseMlmlf+XZmvQ2XYuukVhOFzlSGZ9gifYcYkT0ppjGYpMOVflDB8iM=',
    cfId: undefined,
    attempts: 1,
    totalRetryDelay: 0
  },
  ServerSideEncryption: 'AES256',
  Bucket: 'my-bucket',
  ETag: '"92a4f2c0ff33b6eadeadbeefdbdbdbdbb-7"',
  Key: 'testBig.snappy.parquet.gz',
  Location: 'https://my-bucket.s3.cn-east-99.amazonaws.com/testBig.snappy.parquet.gz'
}
I am trying to read a snappy.parquet file using let reader = await ParquetReader.openS3(s3, params);
and then upload the same file as csv.gz to S3 using the code below:
Earlier (May 2023), the code worked fine with snappy.parquet files of 100 MB or more on Node 16. The code has since moved to Node 20, and it now gives the error below for a 34 MB file on both Node 16 and Node 20:
<--- Last few GCs --->
[21836:00000131C04D35E0] 169720 ms: Scavenge 2042.1 (2048.8) -> 2041.3 (2048.5) MB, 1.5 / 0.0 ms (average mu = 0.170, current mu = 0.131) allocation failure;
[21836:00000131C04D35E0] 169725 ms: Scavenge 2042.9 (2049.5) -> 2041.7 (2051.0) MB, 1.2 / 0.0 ms (average mu = 0.170, current mu = 0.131) allocation failure;
[21836:00000131C04D35E0] 169731 ms: Scavenge 2044.3 (2052.9) -> 2042.2 (2051.5) MB, 1.3 / 0.0 ms (average mu = 0.170, current mu = 0.131) allocation failure;
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 00007FF74150234F node_api_throw_syntax_error+179983
2: 00007FF741486986 v8::internal::MicrotaskQueue::GetMicrotasksScopeDepth+61942
3: 00007FF741488693 v8::internal::MicrotaskQueue::GetMicrotasksScopeDepth+69379
4: 00007FF741FC6411 v8::Isolate::ReportExternalAllocationLimitReached+65
5: 00007FF741FB1066 v8::internal::V8::FatalProcessOutOfMemory+662
6: 00007FF741E17770 v8::internal::EmbedderStackStateScope::ExplicitScopeForTesting+144
7: 00007FF741E24172 v8::internal::Heap::PublishPendingAllocations+1106
8: 00007FF741E21963 v8::internal::Heap::PageFlagsAreConsistent+3171
9: 00007FF741E13FA3 v8::internal::Heap::CollectGarbage+2723
10: 00007FF741E1C2AA v8::internal::Heap::GlobalSizeOfObjects+266
11: 00007FF741E6C80F v8::internal::StackGuard::HandleInterrupts+879
12: 00007FF741AF0F56 v8::internal::Runtime::SetObjectProperty+26918
13: 00007FF74206FA61 v8::internal::SetupIsolateDelegate::SetupHeap+606705
14: 00007FF6C2372D0A
What could be the issue?
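One thing worth ruling out on the conversion side is building the whole CSV in memory before gzipping it. Below is a minimal sketch of a row-at-a-time alternative, assuming a cursor object whose next() resolves to one record or null at the end (the shape returned by reader.getCursor() in parquetjs). The names csvField, csvLines and cursorToGzipCsv are hypothetical, and this is not the code from the report. It only bounds memory for the CSV and gzip steps; it does not change how much the parquet reader itself buffers, so it may or may not affect the error above.

```javascript
const { Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');
const zlib = require('node:zlib');

// Format one value as a CSV field, quoting it only when it contains
// a comma, a double quote, or a line break.
function csvField(value) {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Yield a header line, then one CSV line per record pulled from the cursor.
// Only the current record is held at a time.
async function* csvLines(cursor, columns) {
  yield columns.map(csvField).join(',') + '\n';
  let record;
  while ((record = await cursor.next())) {
    yield columns.map((c) => csvField(record[c])).join(',') + '\n';
  }
}

// Stream the CSV lines through gzip into any writable destination.
// pipeline() applies backpressure, so rows are read only as fast as
// the destination can accept compressed output.
async function cursorToGzipCsv(cursor, columns, destination) {
  await pipeline(Readable.from(csvLines(cursor, columns)), zlib.createGzip(), destination);
}
```

The destination can be any writable stream, for example fs.createWriteStream('out.csv.gz') for a local test. For S3, one option (an assumption about your setup, not something taken from this thread) is to pass a PassThrough stream as the upload body and use it as the destination here.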