Closed al6x closed 1 year ago
The solution would be on every step (or every 100th step) to spider the interpreter's stateStack property to compute its size. This is a little tricky since there are circular references, so just add a `.mark_` property to each object as you see it, and ignore any object that already has a `.mark_` property. Here's some code:
```js
function size(node) {
  if (node && typeof node == 'object') {
    if (node.mark_) {
      return 0;
    }
    var bytes = 0;
    var names = Object.getOwnPropertyNames(node);
    node.mark_ = true;
    for (var i = 0, name; (name = names[i]); i++) {
      bytes += size(node[name]);
    }
    return bytes;
  } else if (node !== undefined && node !== null) {
    return node.toString().length;
  }
  return 0;
}

size(myInterpreter.stateStack);
```
Don't forget to delete all the `mark_` properties before running `size` again.
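A minimal sketch of that cleanup pass, mirroring the traversal of `size()` (the `unmark` name is my own, not part of the original suggestion):

```javascript
// Recursively delete the mark_ flags left behind by size().
// Deleting mark_ before recursing means any object already cleaned
// (or never marked) is skipped, so cycles terminate.
function unmark(node) {
  if (node && typeof node === 'object' && node.mark_) {
    delete node.mark_;
    var names = Object.getOwnPropertyNames(node);
    for (var i = 0; i < names.length; i++) {
      unmark(node[names[i]]);
    }
  }
}
```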
Huge thanks! :)
FYI, the easy way to do this now is to just serialize the stateStack:

```html
<script src="demos/serialize.js"></script>
```

```js
var json = serialize(myInterpreter);
var bytes = JSON.stringify(json).length;
```
Nice, thanks! :)
A quick follow-up to this post.
I encountered the same need, but `JSON.stringify` and `demos/serialize.js` are too slow, and a memory bomb may take fewer than 100 steps.
I ended up with the following memory probe (10x faster than `JSON.stringify`):
```js
function roughValueMemorySize(value) {
  const pool = new Set();
  const stack = [value];
  let size = 0;
  while (stack.length) {
    const val = stack.pop();
    const type = typeof val;
    size += 8; // rough
    if (type === 'string' && !pool.has(val)) {
      pool.add(val); // see V8 string constant pool
      size += val.length * 2;
    } else if (type === 'object' && val !== null && !pool.has(val)) {
      pool.add(val);
      stack.push(...Object.keys(val), ...Object.values(val));
    }
  }
  return size;
}
```
Use:

```js
roughValueMemorySize(interpreter.getStateStack());
```

The result is clearly inaccurate, but it gives an approximate overview of memory usage.
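As a sketch of how such a probe might be wired into a stepping loop to enforce a limit (the function and parameter names here are my own, not part of the JS-Interpreter API):

```javascript
// Hypothetical guard: repeatedly call step() until it returns false,
// probing memory via measure() every `interval` steps and aborting
// when the probe exceeds `limitBytes`. Returns the number of steps run.
function runWithMemoryLimit(step, measure, limitBytes, interval) {
  var steps = 0;
  while (step()) {
    steps++;
    if (steps % interval === 0 && measure() > limitBytes) {
      throw new RangeError('Memory limit exceeded after ' + steps + ' steps');
    }
  }
  return steps;
}
```

With JS-Interpreter this might be called as `runWithMemoryLimit(function() { return myInterpreter.step(); }, function() { return roughValueMemorySize(myInterpreter.getStateStack()); }, 10 * 1024 * 1024, 100);`.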
Please tell me if I have forgotten to take into consideration some points.
@FranckFreiburger: nice bit of code, and useful to know it is substantially faster than `JSON.stringify`.
Please tell me if I have forgotten to take into consideration some points.
For use with JS Interpreter I think this looks pretty good. I'm particularly pleased to see you are counting the sizes of the keys as well as the values; that is easy to overlook. If you were being really thorough, you might also want to measure:

- Objects reachable via `Object.getPrototypeOf()`—though you will probably want to exclude built-ins like `Object.prototype` from consideration.
- The [[StringValue]] slot of `String` objects. Unfortunately, it's hard to know how to measure the size of some internal slots: for example, while it's easy enough to measure the size of a `RegExp`'s [[OriginalSource]] and [[OriginalFlags]] slots (they can be accessed via methods on `RegExp.prototype`), the [[RegExpRecord]] and [[RegExpMatcher]] slots are basically opaque magic.

Not applicable to JS Interpreter, since it does not use any of these features, but if one were trying to write a more general implementation then one would additionally wish to measure:

- `Symbol`-keyed properties and the sizes of the objects accessible via them.
- `#private` properties—though alas this cannot be done without support from the class that defines them, and even if every class defined a standard method to access private properties for this purpose there would still be the difficulty of determining which classes to interrogate, since a class can create private properties on objects that are not instances of that class (so you don't only need to query the classes that are accessible via the object's prototype chain).

But we are getting awfully far into the weeds. The code you have written covers pretty much everything relevant to this use-case except non-builtin objects in the object's prototype chain.
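For the prototype-chain point, a hedged sketch of how one might collect the non-builtin prototypes so they could be fed into the same measuring stack (the `BUILTINS` set here is an assumption; a real implementation would need to enumerate every builtin prototype of the host environment):

```javascript
// Sketch: walk an object's prototype chain and return the prototypes
// that are not known builtins, stopping at the first builtin since
// everything above it is shared engine-provided state.
var BUILTINS = new Set([Object.prototype, Array.prototype, Function.prototype]);

function nonBuiltinPrototypes(obj) {
  var protos = [];
  for (var p = Object.getPrototypeOf(obj); p !== null; p = Object.getPrototypeOf(p)) {
    if (BUILTINS.has(p)) {
      break;
    }
    protos.push(p);
  }
  return protos;
}
```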
This is a good idea. I also agree that approximate is absolutely good enough for this purpose.
One minor issue is that using a variable named `stack` is not great in the context of the JS-Interpreter, since it makes one assume that this is a copy of the execution stack.
A more serious issue is that the second `stack.push` line can throw `Uncaught RangeError: too many function arguments` (Firefox) or `RangeError: Maximum call stack size exceeded` (Chrome) if there's an array in the Interpreter with more than a million entries. For some JS engines the argument limit is just 65536. This may be resolved with `stack = stack.concat(Object.keys(val), Object.values(val));`
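Another way around the argument limit, sketched here under the assumption that a 32768-element chunk stays below every engine's bound (`pushAll` is my own name, not from the thread):

```javascript
// Push every element of `source` onto `target` in bounded chunks,
// so no single apply() call exceeds the engine's argument limit.
function pushAll(target, source) {
  var LIMIT = 32768;
  for (var i = 0; i < source.length; i += LIMIT) {
    Array.prototype.push.apply(target, source.slice(i, i + LIMIT));
  }
  return target;
}
```

The inner loop could then read `pushAll(stack, Object.keys(val)); pushAll(stack, Object.values(val));`, which avoids the RangeError without `.concat`'s per-call array reallocation.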
The issue I'm continuing to work on is why this algorithm completely locks up when measuring the size on the final lines of this code:
```js
var bigString = '1 + 1;\n';
for (var i = 0; i < 20; i++) {
  bigString += bigString;
}
var f = new Function(bigString);
```
Reopening this issue. We deserve a faster approach than serialize.
It's irritating that between steps, the overwhelming majority of objects in the interpreter are exactly the same, but we have to re-measure them from the ground up.
We deserve a faster approach than serialize.
I am not sure how to achieve "good" performance (i.e., being able to keep track of the runtime size incrementally, obviating the need to spider from the roots every time) without having access to information about what the garbage collector is doing. Fortunately `WeakRef`—and more particularly `FinalizationRegistry`—exists and can provide the needed information about what the GC is GCing, but those features were added much more recently and so are not compatible with "run JS Interpreter in JS Interpreter".
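A minimal sketch of what such GC-driven accounting could look like (the `trackedBytes`/`track` names are my own, and since finalization callback timing is entirely up to the engine, the total is only eventually accurate):

```javascript
// Running total of bytes attributed to still-live tracked objects.
let trackedBytes = 0;

// When a registered object is garbage-collected, the registry invokes
// the callback with the held value (its recorded size) so we can
// subtract it from the running total.
const registry = new FinalizationRegistry(function (bytes) {
  trackedBytes -= bytes;
});

// Record an object's measured size and register it for GC notification.
function track(obj, bytes) {
  trackedBytes += bytes;
  registry.register(obj, bytes);
}
```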
Maybe some kind of reference counting could provide a sufficient approximation for many use-cases? Memory usage could be tracked incrementally, with the flaw that failing to break circular structures before letting an object become unreachable would prevent it from being removed from your measured memory usage (but not prevent it from being GCed).
I just created a demo page (on a branch) to test the size calculations: https://github.com/NeilFraser/JS-Interpreter/blob/fraser-size/demos/size.html
What's interesting is that the JSON calculator takes 8 seconds, whereas the rough calculator takes 49 seconds. This is not consistent with the claimed 10x speedup. @FranckFreiburger did I do something wrong?
`.concat(Object.keys(val), Object.values(val));` seems to be much slower than `.push` (maybe because a new array is created each time?)
Yup, that made the difference. I've updated the demo page to use .push when possible, and only fall back to .concat when needed. It clocks in at about 50x the speed of JSON.
One regrettable thing is that this code relies on `Set`. And a null-prototype object won't do as a substitute, since we are throwing all kinds of values into the set, not just strings. This function is firmly rooted in ES6; there's no way to backport it. So I'll leave in the const/let and the destructuring assignments. The main reason the serializer is so much slower is that it uses `indexOf` on an array rather than using a set—but in its defense, it needs the objects ordered so it can assign reference numbers to them.
As a result of this discussion, a Set is now being used (if it exists) to double the speed of JSON serialization: https://github.com/NeilFraser/JS-Interpreter/commit/a698f4fa82b3214068df7a5ab79e6338c247a405
@cpcallen You mentioned several blind spots. Are these just percentages on the existing count, or could a hacker throw gigabytes into one of those spots?
With respect to the [[StringValue]] slot of String objects, JSON serializing `new String('Peekaboo!')` reveals `"data":"Peekaboo!"`. And likewise with the rough count, I see the string 'Peekaboo!' getting scanned.
The main use case here is not making sure the program will fit within a 640 KB memory allocation. We're only really interested in stopping someone from bombing memory.
I've merged the current changes and pushed the new size demo live. Still want to follow up with @cpcallen to see if the blind spots are significant.
Is it protected against memory flooding? Like is it possible to set a memory limit?