chakra-core / ChakraCore

ChakraCore is an open source JavaScript engine with a C API.
MIT License

Large fragmentation by using JSON.stringify, window.postMessage. #3752

Open sunggook opened 7 years ago

sunggook commented 7 years ago

If the code frequently calls APIs such as JSON.stringify, window.postMessage, or getBoundingClientRect while running in the background on Xbox, fragmentation increases quickly until it reaches the Xbox background memory limit (128 MB) and the app crashes.

digitalinfinity commented 7 years ago

Do you have a repro test case? That would be helpful in understanding this report.

obastemur commented 7 years ago

@sunggook, could you share a number for JSON.stringify(xxxx).length ? In other words, what is the average length of a string that you create there?

sunggook commented 7 years ago

I think it is less than 50 chars. The app compares two objects via JSON.stringify, and the comparison function is usually called around 450 times per minute. During Xbox background music playback, fragmentation grew by about 20 MB in 1.5 hours, which crashed the app because of the Xbox background memory limit. I don't believe JSON.stringify is the sole culprit of this fragmentation issue, but we heard from our partner app that when they removed the JSON.stringify call in the background, playback could run about 4 hours before crashing.

rhuanjl commented 6 years ago

Not my issue, but I thought I'd try to be helpful: here are two test cases that achieve the same thing in ch, one with JSON.stringify and one without.

Running these in ch, method 1 climbs to around 215 MB of memory once it's been running a few seconds (though once there it doesn't go higher); method 2 uses around 15 MB.

//method one
let foo = [
    {x:5,y:10,z:20},
    {x:5,y:10,z:20},
    {x:6,y:10,z:25},
    {x:6,y:11,z:20},
    {x:7,y:10,z:20}
];

for (let i = 0; i < 10000000; ++i)
{   
    let j = i % 5;
    let k = (i + i + 1) % 5;
    console.log(
        JSON.stringify(foo[j]) === JSON.stringify(foo[k])
    );
}
//method two
let foo = [
    {x:5,y:10,z:20},
    {x:5,y:10,z:20},
    {x:6,y:10,z:25},
    {x:6,y:11,z:20},
    {x:7,y:10,z:20}
];

for (let i = 0; i < 10000000; ++i)
{   
    let j = i % 5;
    let k = (i + i + 1) % 5;
    console.log(
        compareObj(foo[j], foo[k])
    );
}

function compareObj(a, b)
{
    let identical = true;
    for(let i in a)
    {
        if(a[i] !== b[i])
        {
            identical = false;
            break;
        }
    }
    return identical;
}

My guess would be that the GC doesn't start collecting all those unneeded strings until it reaches an internal threshold that may be too high for some purposes.

That said, if the only need is to compare some objects, method 2 here is a far less processor- and memory-intensive path. @sunggook thoughts?
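For completeness, a stricter variant of compareObj (hypothetical, not part of the test cases above) would also catch keys that exist only in b, which the loop above misses since it only inspects a's keys:

```javascript
// Hypothetical stricter comparison: compareObj above only walks the keys
// of `a`, so {x: 1} would compare equal to {x: 1, y: 2}. Comparing the
// key counts first closes that gap for flat objects.
function compareObjStrict(a, b) {
    const aKeys = Object.keys(a);
    if (aKeys.length !== Object.keys(b).length) {
        return false;
    }
    for (const key of aKeys) {
        if (a[key] !== b[key]) {
            return false;
        }
    }
    return true;
}
```

For the flat {x, y, z} objects in these test cases the extra check makes no difference, which is why the simpler version suffices here.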

akroshg commented 6 years ago

@obastemur FYI: method 1 runs very slowly on Chakra compared to V8.

toddreifsteck commented 6 years ago

We have seen code like Method 1 used on some important applications and web sites that need to run in memory constrained environments. Because it is not easy to ask every application or web site to be updated, we'd prefer for Chakra to spike memory much less than it does today when this type of code occurs.

obastemur commented 6 years ago

method 2 doesn't repro.

v8

        3.71 real         3.65 user         0.03 sys
  23810048  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      4307  page reclaims
      1553  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         6  voluntary context switches
       240  involuntary context switches

ChakraCore Release/1.7

        3.96 real         3.94 user         0.01 sys
  26386432  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      6548  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         2  voluntary context switches
       222  involuntary context switches

Method 1: Chakra runs 3 times slower and consumes 10 times more memory on release/1.7. Release/1.6 is even worse.

v8

        8.49 real         8.35 user         0.05 sys
  25620480  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      6265  page reclaims
        35  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
      3779  involuntary context switches

ChakraCore Release/1.7

       24.62 real        25.75 user         0.14 sys
 240955392  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     60212  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         2  voluntary context switches
      1386  involuntary context switches

obastemur commented 6 years ago

@MikeHolman heard that you work on concat strings. You may want to take this issue?

rhuanjl commented 6 years ago

@obastemur Method 2 was not intended to be slow; it was meant to show a piece of code with the same real-world end effect that just doesn't use JSON.stringify, (a) as something to compare method 1 against, and (b) as a possible alternative that an end user such as @sunggook could use in simple cases if method 1's performance/memory use was a problem.

Though comparing Method 1 performance in CC vs v8 is obviously a much better test of CC JSON.stringify performance.

MikeHolman commented 6 years ago

Yes, I am looking at this now. Basically I'm prototyping a 2 pass approach, where we will get the property values and string length in the first pass, and then in the second pass fill a buffer with the string, without needing to create so many intermediate objects.
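The two-pass idea can be sketched in plain JavaScript (illustrative only: the actual prototype lives in the engine's C++ and handles the full JSON grammar, while this sketch assumes flat objects with finite-number values, and the helpers measure/stringifyTwoPass are hypothetical names):

```javascript
// Pass 1: compute the exact output length without keeping any
// intermediate strings around.
function measure(obj) {
    let len = 2; // '{' and '}'
    const keys = Object.keys(obj);
    keys.forEach((k, i) => {
        len += 1 + k.length + 1 + 1;       // '"key":'
        len += String(obj[k]).length;      // value digits
        if (i < keys.length - 1) len += 1; // ','
    });
    return len;
}

// Pass 2: fill one preallocated array of exactly the right size,
// then build the final string in a single join.
function stringifyTwoPass(obj) {
    const out = new Array(measure(obj));
    let p = 0;
    out[p++] = '{';
    const keys = Object.keys(obj);
    keys.forEach((k, i) => {
        out[p++] = '"';
        for (const c of k) out[p++] = c;
        out[p++] = '"';
        out[p++] = ':';
        for (const c of String(obj[k])) out[p++] = c;
        if (i < keys.length - 1) out[p++] = ',';
    });
    out[p++] = '}';
    return out.join('');
}
```

For the flat objects in the repro above, the output matches JSON.stringify exactly, and the only allocations are the sized buffer and the final string.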

obastemur commented 6 years ago

@MikeHolman IIRC, I had tried a very cheap memstream there (no need for the second pass), yet the performance wasn't that good.

Some experiments while I'm pretending to be OOF:

I replaced https://github.com/Microsoft/ChakraCore/blob/master/lib/Runtime/Library/JSON.cpp#L1002-L1016 with result->Append(scriptContext->GetLibrary()->GetEmptyString());

This basically tells us that less is better: it shows 25% better perf. However, no memory improvement!

Then I kept https://github.com/Microsoft/ChakraCore/blob/master/lib/Runtime/Library/JSON.cpp#L1002-L1016 intact and set https://github.com/Microsoft/ChakraCore/blob/master/lib/Runtime/Library/JSON.cpp#L1018 to isEmpty = true.

Similar or better perf gain, but still no memory improvement.

Finally, see #4002. It also gains about 25% perf, yet still no memory improvement.

So... are all these 25% gains related to flattening the string?

Curious: what if we cache the JSON.stringify result of a particular dynamicObject until its typeHandler is touched? (Plus some additional handling for objectArray...)

It could be beneficial for real-life cases too, i.e. not every part of a big object is altered prior to re-running JSON.stringify.
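A userland sketch of the caching idea (assumptions: cachedStringifyWrapper and its Proxy-based invalidation are hypothetical stand-ins; inside the engine the cache would be invalidated off the object's typeHandler, not a Proxy trap):

```javascript
// Memoize JSON.stringify per object and drop the cache on any property
// write. The Proxy's set trap simulates what the engine would get for
// free by watching the typeHandler.
function cachedStringifyWrapper(target) {
    let cache = null;
    const proxy = new Proxy(target, {
        set(obj, prop, value) {
            cache = null; // any write invalidates the cached string
            obj[prop] = value;
            return true;
        }
    });
    const stringify = () => {
        if (cache === null) cache = JSON.stringify(target);
        return cache;
    };
    return { proxy, stringify };
}
```

Repeated stringify calls between writes return the cached string; a write through the proxy forces one re-serialization on the next call.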

digitalinfinity commented 6 years ago

@MikeHolman @obastemur My observation has also been that, in general, in Node scenarios there's a ton of buffer copying going on when Node calls JSON stringify. @MikeHolman I think we could extend your approach to delay the second pass too, no? When the caller passes in a buffer to be filled with the string, we could accept either a UTF-8 or UTF-16 buffer and encode accordingly. That way, in the Node case, we could go from JSON => UTF-8 string directly instead of the several hops it currently takes.
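Roughly, the difference in hops (a sketch: engineStringifyInto below is a hypothetical API shape, not something ChakraCore exposes):

```javascript
// What a Node-style host does today (illustrative): the engine builds a
// UTF-16 JS string, then the host re-encodes it into a UTF-8 buffer,
// paying an extra full copy per stringify call.
function stringifyToUtf8Today(obj) {
    return Buffer.from(JSON.stringify(obj), 'utf8');
}

// The proposed shape: the host supplies a buffer plus a target encoding,
// and the deferred second pass writes into it directly.
// NOTE: engineStringifyInto is hypothetical; no such API exists today.
// function stringifyToUtf8Direct(obj, buf) {
//     return engineStringifyInto(obj, buf, 'utf8'); // bytes written
// }
```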

MikeHolman commented 6 years ago

Yes, absolutely. I was planning on only creating the buffer when you call GetString() on it. Seems like this should be extensible for the node case.

MikeHolman commented 6 years ago

I timed test 1 above (removing the console.log), and my prototype takes 12 seconds compared to 22 seconds for the existing implementation, so there is a pretty massive perf benefit.

However, commit size doesn't change very much. I dug in a bit, and it doesn't seem like a fragmentation issue: it is almost all free pages. How many free pages are allowed is controlled by the RecyclerHeuristic() method here: https://github.com/Microsoft/ChakraCore/blob/66b1da522168b2456ed7e76dd5221304de544b06/lib/Common/Memory/RecyclerHeuristic.cpp#L25

On devices with more than 1 GB of RAM we allow hundreds of megabytes of free pages. I tested changing the heuristics to those of a low-memory device, and I see usage vary from 20-50 MB (it builds to 50 and then pages get collected back down to 20).

I'm not sure how Xbox reports its memory to us, but the engine probably isn't aware of the 128 MB watchdog, and if it is reporting the full system memory we are in for a world of pain. We probably need to add detection for Xbox and treat it as a low-memory device.

matthargett commented 6 years ago

Instead of detecting a specific device, I'd rather see the ability to tell the runtime the process limit (ulimit) and have it use that instead of total free system RAM for these heuristics.