amazon-ion / ion-js

A JavaScript implementation of Amazon Ion.
http://amzn.github.io/ion-docs/
Apache License 2.0
353 stars 48 forks source link

ion-js crashes when parsing very large string. #728

Open everett1992 opened 2 years ago

everett1992 commented 2 years ago

ion-js crashes when parsing a very large string

$ npm ls ion-js
test@1.0.0 /home/ANT.AMAZON.COM/calebev/test
└── ion-js@4.3.0
$ node -v
v18.12.0
ion.load(JSON.stringify(Buffer.alloc(50_000_000).fill('a').toString('base64')))

$ node -p "require('ion-js').load(JSON.stringify(Buffer.alloc(50_000_000).fill('a').toString('base64')))"

<--- Last few GCs --->

[641427:0x560ab00]    21984 ms: Scavenge 2028.7 (2064.9) -> 2028.8 (2069.9) MB, 5.6 / 0.0 ms  (average mu = 0.198, current mu = 0.143) allocation failure;
[641427:0x560ab00]    21994 ms: Scavenge 2031.6 (2069.9) -> 2031.8 (2070.9) MB, 7.8 / 0.0 ms  (average mu = 0.198, current mu = 0.143) allocation failure;
[641427:0x560ab00]    22003 ms: Scavenge 2032.6 (2070.9) -> 2032.4 (2081.6) MB, 8.7 / 0.0 ms  (average mu = 0.198, current mu = 0.143) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb6e500 node::Abort() [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 2: 0xa7e632  [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 3: 0xd47ea0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 4: 0xd48247 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 5: 0xf25605  [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 6: 0xf26508 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 7: 0xf36a13  [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 8: 0xf37888 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
 9: 0xf3aa55 v8::internal::Heap::HandleGCRequest() [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
10: 0xeb8baf v8::internal::StackGuard::HandleInterrupts() [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
11: 0x12b7645 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]
12: 0x16e9979  [/home/ANT.AMAZON.COM/calebev/.asdf/installs/nodejs/lts-hydrogen/bin/node]

JSON.parse parses the string fine.


I'm implementing a npm registry server and the npm publish api POST's a JSON object with the package tarball base64 encoded as a string. Parsing this body with regular JSON.parse causes multiple copies of data

const body = await buffer(request) // Buffer
const str = body.toString('utf8') // String
const  json = JSON.parse(str) // JSON
const tarball = Buffer.from(json._attachments["1.0.0"].data, 'base64') // tarball

I tested ion-js to see if the reader would save memory.

zslayton commented 1 year ago

Just checking my understanding of your example:

require('ion-js')
    .load(JSON.stringify(Buffer.alloc(50_000_000).fill('a').toString('base64')))

We're creating a 50MB buffer filled with the ASCII character a, then converting that binary buffer to a base64-encoded string which will be about 67MB long. The resulting string will not be valid Ion syntax.

For example:

{{ImhlbGxvIHdvcmxkIg==}} // Text Ion blob
"ImhlbGxvIHdvcmxkIg=="   // Text Ion string

ImhlbGxvIHdvcmxkIg==      // Free-standing base64 data, not a valid Ion value 

There is indeed an OOM occurring, but I wanted to see whether this correctness issue affects your use case before diving too deeply on the memory management underlying especially large scalar values.

everett1992 commented 1 year ago

The base64 string is passed to JSON.stringify so it will be a valid ion string. Ignore Buffer / base64, it can be reproduced with

ion.load(JSON.stringify("a".repeat(50_000_000)))

I can also reproduce by feeding the output of dumpText back to load

const value = "a".repeat(3125000)
const ionText = ion.dumpText(value)
ion.load(ionText) // crashes here

Tho with larger strings dumpText will also crash

const value = "a".repeat(25_000_000)
const ionText = ion.dumpText(value) // crashes here