Open · sunggook opened this issue 7 years ago
Do you have a repro test case? That would be helpful in understanding this report.
@sunggook, could you share a number for JSON.stringify(xxxx).length? In other words, what is the average length of a string that you create there?
I think it is less than 50 chars. It compares two objects via JSON.stringify, and the comparison function is usually called around 450 times per minute. During Xbox background music playback, fragmentation increased by around 20 MB in 1.5 hours, which crashed the app due to the Xbox background music memory limit. I don't believe JSON.stringify is the sole culprit of this fragmentation issue, but we heard (from our partner app) that when the JSON.stringify call was removed from the background path, playback could run for 4 hours before crashing.
Not my issue, but I thought I'd try to be helpful: two test cases that achieve the same thing in ch, one with JSON.stringify and one without.
Running these in ch, method 1 climbs to around 215 MB of memory once it's been running a few seconds (though once there it doesn't go higher); method 2 uses around 15 MB.
//method one
let foo = [
    {x: 5, y: 10, z: 20},
    {x: 5, y: 10, z: 20},
    {x: 6, y: 10, z: 25},
    {x: 6, y: 11, z: 20},
    {x: 7, y: 10, z: 20}
];
for (let i = 0; i < 10000000; ++i)
{
    let j = i % 5;
    let k = (i + i + 1) % 5;
    console.log(
        JSON.stringify(foo[j]) === JSON.stringify(foo[k])
    );
}
//method two
let foo = [
    {x: 5, y: 10, z: 20},
    {x: 5, y: 10, z: 20},
    {x: 6, y: 10, z: 25},
    {x: 6, y: 11, z: 20},
    {x: 7, y: 10, z: 20}
];
for (let i = 0; i < 10000000; ++i)
{
    let j = i % 5;
    let k = (i + i + 1) % 5;
    console.log(
        compareObj(foo[j], foo[k])
    );
}
function compareObj(a, b)
{
    let identical = true;
    for (let i in a)
    {
        if (a[i] !== b[i])
        {
            identical = false;
            break;
        }
    }
    return identical;
}
My guess would be that the GC doesn't start collecting all those unneeded strings until it reaches an internal threshold that may be too high for some purposes.
That said, if the only need is to compare some objects, method 2 here is a far less processor- and memory-intensive path. @sunggook thoughts?
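One caveat with method 2 worth noting: compareObj only walks the keys of `a`, so it would report `{x: 1}` and `{x: 1, y: 2}` as identical. A sketch of a shallow compare that also catches extra keys in `b` (the function name `shallowEqual` is mine, not from this thread; it assumes plain objects with primitive values, like the test data above):

```javascript
// Shallow comparison that also catches keys present only in b.
// Sketch only; assumes flat objects with primitive values (no nesting,
// no prototype-chain concerns), like the test data above.
function shallowEqual(a, b) {
    const aKeys = Object.keys(a);
    const bKeys = Object.keys(b);
    if (aKeys.length !== bKeys.length) {
        return false; // b has extra or missing keys
    }
    for (const key of aKeys) {
        if (a[key] !== b[key]) {
            return false;
        }
    }
    return true;
}
```

For example, `shallowEqual({x: 5, y: 10, z: 20}, {x: 5, y: 10, z: 20})` is true, while `shallowEqual({x: 1}, {x: 1, y: 2})` is false, where compareObj above would say true.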
@obastemur FYI: method 1 runs very slowly on Chakra compared to v8.
We have seen code like Method 1 used in some important applications and web sites that need to run in memory-constrained environments. Because it is not easy to ask every application or web site to be updated, we'd prefer for Chakra to spike memory much less than it does today when this type of code occurs.
Method 2 doesn't repro the problem.
v8
3.71 real 3.65 user 0.03 sys
23810048 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
4307 page reclaims
1553 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
6 voluntary context switches
240 involuntary context switches
ChakraCore Release/1.7
3.96 real 3.94 user 0.01 sys
26386432 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
6548 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
2 voluntary context switches
222 involuntary context switches
Method 1: Chakra runs 3 times slower and consumes 10 times more memory on release/1.7. Release/1.6 is even worse.
v8
8.49 real 8.35 user 0.05 sys
25620480 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
6265 page reclaims
35 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
3779 involuntary context switches
ChakraCore Release/1.7
24.62 real 25.75 user 0.14 sys
240955392 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
60212 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
2 voluntary context switches
1386 involuntary context switches
@MikeHolman I heard that you work on concat strings. Do you want to take this issue?
@obastemur Method 2 was not intended to be slow. It was meant to show a piece of code with the same real-world end effect that just didn't use JSON.stringify, both (a) as something to compare Method 1 against and (b) as a possible alternative that an end user such as @sunggook could use in simple cases if Method 1's performance or memory use was a problem.
That said, comparing Method 1's performance in CC vs. v8 is obviously a much better test of CC's JSON.stringify performance.
Yes, I am looking at this now. Basically I'm prototyping a 2 pass approach, where we will get the property values and string length in the first pass, and then in the second pass fill a buffer with the string, without needing to create so many intermediate objects.
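To illustrate the idea (the real change lives in the engine's C++ JSON code; this JavaScript sketch is only a simplified model of the two-pass scheme, restricted to flat objects with finite number values and identifier-like keys, with all names mine):

```javascript
// Two-pass serialization sketch: pass 1 measures the exact output length,
// pass 2 writes characters into a preallocated buffer, so no intermediate
// concat strings are created. This models the approach, not the engine code.

// Pass 1: compute the exact length of the serialized form.
function measure(obj) {
    let len = 2; // '{' and '}'
    const keys = Object.keys(obj);
    for (let i = 0; i < keys.length; i++) {
        if (i > 0) len += 1;                // ','
        len += keys[i].length + 3;          // '"key":'
        len += String(obj[keys[i]]).length; // value digits
    }
    return len;
}

// Pass 2: fill a buffer of exactly the measured size.
function fill(obj, buf) {
    let pos = 0;
    buf[pos++] = '{';
    const keys = Object.keys(obj);
    for (let i = 0; i < keys.length; i++) {
        if (i > 0) buf[pos++] = ',';
        buf[pos++] = '"';
        for (const ch of keys[i]) buf[pos++] = ch;
        buf[pos++] = '"';
        buf[pos++] = ':';
        for (const ch of String(obj[keys[i]])) buf[pos++] = ch;
    }
    buf[pos++] = '}';
    return pos;
}

function twoPassStringify(obj) {
    const buf = new Array(measure(obj)); // sized exactly once, up front
    fill(obj, buf);
    return buf.join('');
}
```

For the thread's test data, `twoPassStringify({x: 5, y: 10, z: 20})` produces the same string as `JSON.stringify({x: 5, y: 10, z: 20})`, but the allocation pattern is one exact-size buffer rather than a chain of intermediate strings.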
@MikeHolman IIRC, I had tried a very cheap memstream there (no need for the second pass), yet the performance wasn't that good.
Some experiments while I'm pretending to be OOF:
I replaced https://github.com/Microsoft/ChakraCore/blob/master/lib/Runtime/Library/JSON.cpp#L1002-L1016 with result->Append(scriptContext->GetLibrary()->GetEmptyString());
This basically tells us that less is better: it shows 25% better perf. However, no memory improvement!
Then I kept https://github.com/Microsoft/ChakraCore/blob/master/lib/Runtime/Library/JSON.cpp#L1002-L1016 intact and set https://github.com/Microsoft/ChakraCore/blob/master/lib/Runtime/Library/JSON.cpp#L1018 to isEmpty = true: similar or better perf gain, but again no memory improvement.
Finally, see #4002. It also gains about 25% perf, yet still no memory improvement.
So... are all these 25% gains related to flattening the string?
Curious: what if we cache the JSON.stringify result of a particular dynamicObject until its typeHandler is touched? (Plus some additional handling for objectArray.)
That could be beneficial for real-life cases too, i.e. when not every part of a big object is altered before JSON.stringify is run again.
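A userland model of that caching idea (the real hook would sit on the engine's type handler; here a Proxy bumps a version counter on every write, which is only an approximation, and all names are mine):

```javascript
// Cache the stringified form of an object and invalidate on mutation.
// Sketch only: a Proxy bumps a version counter on every property write or
// delete; the engine-level version would instead hook the dynamic object's
// type handler. Mutations must go through the returned proxy.
function makeCachedStringify(target) {
    let version = 0;
    let cachedVersion = -1;
    let cachedJson = null;
    const proxy = new Proxy(target, {
        set(obj, key, value) {
            version++;          // any write invalidates the cache
            obj[key] = value;
            return true;
        },
        deleteProperty(obj, key) {
            version++;
            delete obj[key];
            return true;
        }
    });
    function stringify() {
        if (cachedVersion !== version) {
            cachedJson = JSON.stringify(target); // re-serialize once
            cachedVersion = version;
        }
        return cachedJson;      // reused until the next mutation
    }
    return { proxy, stringify };
}
```

With this, repeated stringify() calls on an unmodified object return the cached string, and a write like proxy.x = 6 forces a single re-serialization on the next call, which matches the "not every part of a big object is altered" scenario above.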
@MikeHolman @obastemur My observation has also been that, in Node scenarios, there's generally a ton of buffer copying going on when Node calls JSON.stringify. @MikeHolman I think we could extend your approach to delay the second pass too, no? So that when you pass in a buffer that needs to be filled with the string, we could pass in either a UTF-8 or UTF-16 buffer and encode accordingly. That way, in the Node case, we could go from JSON to a UTF-8 string directly instead of the several hops it currently takes.
Yes, absolutely. I was planning on only creating the buffer when you call GetString() on it. Seems like this should be extensible for the node case.
I timed test 1 above (removing the console.log), and my prototype takes 12 seconds compared to 22 seconds for the existing implementation, so there is a pretty massive perf benefit.
However, commit size doesn't change very much. I dug in a bit, and it doesn't seem to be a fragmentation issue; it is almost all free pages. How many free pages are allowed is controlled by the RecyclerHeuristic() method here:
https://github.com/Microsoft/ChakraCore/blob/66b1da522168b2456ed7e76dd5221304de544b06/lib/Common/Memory/RecyclerHeuristic.cpp#L25
On devices with more than 1 GB of RAM we allow hundreds of megabytes of free pages. I tested changing the heuristics to those of a low-memory device, and I see usage vary from 20-50 MB (it builds to 50 MB and then pages get collected back down to 20 MB).
I'm not sure how Xbox reports its memory to us, but the engine probably isn't aware of the 128MB watchdog, and if it is reporting the full system memory we are in for a world of pain. We probably need to add detection for Xbox and consider it to be a low memory device.
Instead of detecting a specific device, I’d rather see the ability to tell the runtime the process limit (ulimit) and use that instead of total free system RAM for these heuristics.
If code calls APIs such as JSON.stringify, window.postMessage, or getBoundingClientRect often in the background on Xbox, fragmentation quickly increases until it reaches the Xbox memory limit (128 MB) and the app crashes.