If it matters: the JSON file I'm trying to read contains JavaScript ASTs. It looks something like this, but without all the whitespace:
{
  "type": "Program",
  "start": 0,
  "end": 476,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 179,
      "end": 389,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 183,
          "end": 388,
          "id": {
            "type": "Identifier",
            "start": 183,
            "end": 187,
            "name": "tips"
          },
I'm not totally surprised. The data structure for a JSON object is fairly big. Notably, value strings are not cheap on a 64-bit machine, as they reside on the stack with two guard cells, so a string of up to 7 characters takes 24 bytes, and with the pointer to it, 32 bytes. Integers take 8 bytes. Keys are shared, so memory use mainly depends on the number of different keys.
More importantly though, the dict is (still) created after parsing into the classical Prolog representation, which is even more expensive, so the creation process takes even more memory. Finally, the system's choice between garbage collection and stack expansion can easily cause temporarily rather large stacks.
To know the real size, use term_size/2.
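For illustration, a minimal sketch of doing that; the file name ast.json and the predicate name report_json_size are assumptions, not from this issue. term_size/2 reports the size in cells, and a cell is 8 bytes on 64-bit builds.

```prolog
:- use_module(library(http/json)).

% Read a JSON file into a dict and report how much space the resulting
% term occupies.
report_json_size(File) :-
    setup_call_cleanup(
        open(File, read, In),
        json_read_dict(In, Dict),
        close(In)),
    term_size(Dict, Cells),          % size of the dict in cells
    Bytes is Cells * 8,              % 8 bytes per cell on 64-bit builds
    format('~D cells (~D bytes)~n', [Cells, Bytes]).
```

Note this measures only the final term, not the peak memory needed while parsing and converting.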
You can use the option value_string_as(atom) to represent values as atoms rather than strings. If there are many duplicates, this may save memory, but if most values are unique it will cost more. Also, the difference between true and "true" is lost.
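A sketch of passing that option to json_read_dict/3 (the file name is again an assumption):

```prolog
:- use_module(library(http/json)).

% Read a JSON file with string values represented as atoms, so that
% repeated values share a single atom.
read_with_atoms(File, Dict) :-
    setup_call_cleanup(
        open(File, read, In),
        json_read_dict(In, Dict, [value_string_as(atom)]),
        close(In)).
```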
P.S. Please use the forum for such questions.
Hi,
Is it normal for json_read_dict to use 300MB of RAM when parsing a 17MB JSON file? Or am I doing something wrong here?
Minimal test file: memtest.pl
Testing on Ubuntu 18.04 (note: /usr/bin/time is GNU time, not the Bash built-in time):
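For reference, a hypothetical sketch of what a minimal test along these lines could look like; the real memtest.pl is not reproduced here, and the file name ast.json is an assumption. Running it as /usr/bin/time -v swipl memtest.pl lets GNU time report the maximum resident set size.

```prolog
:- use_module(library(http/json)).

% Parse the JSON file into a dict, then print SWI-Prolog's own
% resource-usage summary.
main :-
    setup_call_cleanup(
        open('ast.json', read, In),
        json_read_dict(In, _Dict),
        close(In)),
    statistics.

:- initialization(main, main).
```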