Using a big integer as min or max in the length options of `faker.string.numeric` is not handled gracefully and causes a `JavaScript heap out of memory` crash in V8. It should at least validate the inputs and provide a graceful exception message. Even better; provide big int support in options.
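For context, a minimal reproduction along these lines (the values are illustrative, taken from the discussion further down) is enough to exhaust the heap:

```ts
import { faker } from '@faker-js/faker';

// Requesting a string with ~10^9 digits forces faker to build the whole
// string in memory, which exceeds V8's default heap and crashes the process.
const huge = faker.string.numeric({
  length: { min: 1_000_000_000, max: 2_000_000_000 },
});
console.log(huge.length);
```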
The inputs are valid for faker, but they hit constraints of the underlying engine.
Any suggestion on how to handle that?
What if the memory limit is increased? What if the memory is already spent on other strings or objects? What if you use a different engine?
Previously, we limited the size arbitrarily (2 ** 20), but this also results in freeze-like delays (https://github.com/faker-js/faker/commit/6f977f6b0122dc1761152d4eb7922fed63dc87c0).
I have ideas for making it 4-7 times faster without losing much randomness, but generating these long strings (or arrays) still takes a long time. If we used repeated sub-strings (see the sketch after the engine limits below), we could go faster still (maybe ~100-1000 times?), but we would still hit the limits of the runtime eventually. Unfortunately these vary between systems:
- JS spec: 2^53 - 1
- V8 (used by Chrome and Node): 2^29 - 24 (~1 GiB); on 32-bit systems: 2^28 - 16 (~512 MiB)
- Firefox: 2^30 - 2 (~2 GiB); before Firefox 65: 2^28 - 1 (~512 MiB)
- Safari: 2^31 - 1 (~4 GiB)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length
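A rough sketch of the repeated-substring idea mentioned above (not faker's implementation; the chunk size and helper name are made up for illustration):

```ts
// Sketch: generate one small random chunk of digits and repeat it to reach
// the requested length. Far fewer random draws and concatenations than
// drawing every digit individually, at the cost of a repeating pattern.
function repeatedNumericSketch(length: number, chunkSize = 1024): string {
  let chunk = '';
  for (let i = 0; i < Math.min(length, chunkSize); i++) {
    chunk += Math.floor(Math.random() * 10);
  }
  if (chunk.length === 0) return '';
  return chunk.repeat(Math.ceil(length / chunk.length)).slice(0, length);
}
```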
We could limit the size again to something like 2^25 to avoid the runtime limits: `min(length, limit)`. But this would secretly hide the limitation from the user. In both scenarios we would limit the user arbitrarily, even if their engine supports bigger values.
Just catching the out of memory error isn't an option either, because we would have to do that for every method then.
We could add a hint like "If you generate more than 2^20 elements, you will likely run out of memory", but that feels very arbitrary for documentation.
> Even better; provide big int support in options.
Could you explain why that would be better or for what purpose do you need bigInt support?
> Could you explain why that would be better or for what purpose do you need bigInt support?
Sorry, admittedly...I misunderstood the usage of these options and made an assumption that "length" + "min" and "max" would generate a string between those numbers. It's a little misleading at first glance, to be fair.
I actually came here to delete this report and submit another to tend to the fact that it causes you to run out of memory. I think that's the crux of the issue here - it should handle it better and have some sort of safe limit - or even a reasonable, safe default with a configurable limit to allow you to blow your foot off, should you choose.
My bad for not reading the doc closely enough and making an assumption - bad form. However, I was testing this within the context of the `aws-sdk` and it was frustrating to have it fail w/ zero feedback. I had to eliminate almost field-by-field to spot where it was happening, then test it in isolation to see it run out of memory and blow v8 up.
> Sorry, admittedly...I misunderstood the usage of these options and made an assumption that "length" + "min" and "max" would generate a string between those numbers. It's a little misleading at first glance, to be fair.
We had hoped that a link to the method generating a number would solve that issue.
Do you have a suggestion on how to improve this?
> I actually came here to delete this report and submit another to tend to the fact that it causes you to run out of memory. I think that's the crux of the issue here - it should handle it better and have some sort of safe limit - or even a reasonable, safe default with a configurable limit to allow you to blow your foot off, should you choose.
Even if we limit it to 2^20, you are still limited to ~1k of those strings on the heap; fewer if we limit it to 2^25, etc. Having a configurable length limit might be a solution against accidental misuse, but it would affect all methods returning strings, arrays and objects. We will discuss this in the next team meeting. It might take a while for it to get implemented, probably after #2667.
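Such a configurable limit could be shaped roughly like this (all names and the default value here are hypothetical, just to show where the check would sit):

```ts
// Hypothetical sketch of a configurable safety limit shared by all methods
// that generate strings, arrays or objects.
const DEFAULT_MAX_LENGTH = 2 ** 20; // placeholder default, actual value undecided

function resolveLength(requested: number, configuredMax = DEFAULT_MAX_LENGTH): number {
  if (requested > configuredMax) {
    throw new Error(
      `Requested length ${requested} exceeds the configured limit ${configuredMax}.`
    );
  }
  return requested;
}
```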
> My bad for not reading the doc closely enough and making an assumption - bad form. However, I was testing this within the context of the `aws-sdk` and it was frustrating to have it fail w/ zero feedback. I had to eliminate almost field-by-field to spot where it was happening, then test it in isolation to see it run out of memory and blow v8 up.
I understand how frustrating this can be. Sadly, Node does a very poor job of logging where the OOM error occurred; even a short stacktrace would make this easier.
For the future, I would recommend binary-searching for the error location, or some kind of debug logging, so it is easier to detect where the code halts.
The issue was first about "Crashes app suddenly without error".
Was that actually no error message or did you run into a timeout first?
I would like to understand whether we have to improve the implementation (for performance) or why there was no error message.
"The number or range of digits to generate." is perhaps a little hard to read
Maybe
"The length of the generated string. Should either be a fixed length, or a min and max length.
FFR: This needs a decision regarding introducing a configurable limit on data (string, array, ...) length.
I think the issue is that a user might pass `faker.string.numeric({ length: { min: 1000000000, max: 2000000000 } })` or `faker.string.numeric({ length: 1000000000 })`. Do we think it's more likely that they really want a string with a billion digits, or that they misunderstood the option and actually meant something much smaller, like `faker.string.numeric({ length: 9 })`?
How about, if a very large length is provided, we log a warning like:
> `faker.string.numeric({ length: { min: 1000000000, max: 2000000000 }})`
Warning: This will create a numeric string with length over 1 million which may be slow or cause out of memory errors. The length parameter controls the number of digits in the string, not the maximum value. To suppress this warning, pass `{allowLargeLength: true}`
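Sketched out, that warning could be implemented roughly as follows (the threshold and the `allowLargeLength` option come from the suggestion above; everything else is hypothetical):

```ts
// Hypothetical sketch of the proposed warning for very large lengths.
const LARGE_LENGTH_THRESHOLD = 1_000_000;

function warnIfLargeLength(length: number, allowLargeLength = false): void {
  if (length > LARGE_LENGTH_THRESHOLD && !allowLargeLength) {
    console.warn(
      'Warning: This will create a numeric string with length over 1 million, ' +
        'which may be slow or cause out of memory errors. The length parameter ' +
        'controls the number of digits in the string, not the maximum value. ' +
        'To suppress this warning, pass { allowLargeLength: true }.'
    );
  }
}
```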
Team Decision
I plan to tackle this soon.
What should be the limit for the number of elements? 10k? 100K?
Suggestions for the parameter name:
```ts
function methodForBigData(options: { length: NumberOrRange; bypassSafety?: true }) {}

faker.module.methodForBigData({ length: 1_000_000, bypassSafety: true });
// vs
faker.bypassSafety = true;
faker.module.methodForBigData({ length: 1_000_000 });
```
You are generating huge datasets, this is slow and might cause OutOfMemory issues. Requested: {}, Limit: {}. If you want to bypass this limit, please specify the ___ option.
or
Do you really want to generate datasets this huge? This is slow and might cause OutOfMemory issues. Requested: {}, Limit: {}. If you want to bypass this limit, please specify the ___ option.
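Combined with the `bypassSafety` option sketched above, the check could throw along these lines (the limit value and helper name are placeholders, not a decision):

```ts
// Hypothetical sketch of the safety check with a bypass option.
const SAFETY_LIMIT = 100_000; // actual limit (10k? 100k?) still to be decided

function assertSafeLength(requested: number, bypassSafety?: boolean): void {
  if (!bypassSafety && requested > SAFETY_LIMIT) {
    throw new Error(
      `You are generating huge datasets, this is slow and might cause OutOfMemory issues. ` +
        `Requested: ${requested}, Limit: ${SAFETY_LIMIT}. ` +
        `If you want to bypass this limit, please specify the bypassSafety option.`
    );
  }
}
```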
Partially blocked by #3216
Team Decision
We will not add an `allowHuge` or similar flag. JS's `string.repeat` and other similar methods don't have an `allowHuge` flag either.
The main performance impact comes from the repeated nested method calls, not from the huge list itself.
We will try to improve the performance of methods generating strings though.
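For illustration, the difference between a per-character helper call and drawing the digits in one pass looks roughly like this (simplified; not faker's internals):

```ts
// Slow pattern: one nested helper call per digit (simplified stand-in).
function slowNumeric(length: number, randomDigit: () => string): string {
  let result = '';
  for (let i = 0; i < length; i++) {
    result += randomDigit(); // repeated nested method calls dominate the cost
  }
  return result;
}

// Faster pattern: draw all digits in a tight loop and join once.
function fastNumeric(length: number): string {
  const digits = new Array<string>(length);
  for (let i = 0; i < length; i++) {
    digits[i] = String(Math.floor(Math.random() * 10));
  }
  return digits.join('');
}
```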