Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
325 stars 205 forks source link

update to latest XS, with map-iterator GC fix (also native lockdown/harden/purify) #3889

Closed warner closed 2 years ago

warner commented 2 years ago

Our #3839 memory-growth problem appears to be caused by a bug in XS: https://github.com/Moddable-OpenSource/moddable/issues/701 . The Moddable crew fixed this in the recent https://github.com/Moddable-OpenSource/moddable/commit/e16330893e5cfd68799544b395496ef45a3191fd commit.

The task here is to update our XS dependency to this new commit.

This version might also include native support for lockdown, harden, and purify (the specific commit that claims this appears mislabled, so I'm not sure). I'm told that SES should tolerate these properties being already present, and will continue to build and install its own versions, but since we don't have XS/xsnap test coverage in the SES repo, any surprises will appear first in the agoric-sdk xsnap or SwingSet tests.

And indeed, once I update packages/xsnap/moddable to use the new version, I see the following xsnap unit test failures:

  boot-lockdown › bootstrap to SES lockdown

  Rejected promise returned by test. Reason:

  Error {
    message: 'Uncaught exception in SES lockdown worker: TypeError: visitProperties: delete return: no permission (strict mode)',
  }

  › runToIdle (file://src/xsnap.js:191:15)

  boot-lockdown › child compartment cannot access start powers

  Rejected promise returned by test. Reason:

  Error {
    message: 'Uncaught exception in child compartment cannot access start powers: TypeError: visitProperties: delete return: no permission (strict mode)',
  }

  › runToIdle (file://src/xsnap.js:191:15)

  boot-lockdown › TextDecoder under xsnap handles TypedArray and subarrays

  Rejected promise returned by test. Reason:

  Error {
    message: 'Uncaught exception in TextDecoder under xsnap handles TypedArray and subarrays: TypeError: visitProperties: delete return: no permission (strict mode)',
  }

  › runToIdle (file://src/xsnap.js:191:15)

  boot-lockdown › console - symbols

  Rejected promise returned by test. Reason:

  Error {
    message: 'Uncaught exception in console - symbols: TypeError: visitProperties: delete return: no permission (strict mode)',
  }

  › runToIdle (file://src/xsnap.js:191:15)

  boot-lockdown › SES deep stacks work on xsnap

  Rejected promise returned by test. Reason:

  Error {
    message: 'Uncaught exception in SES deep stacks work on xsnap: TypeError: visitProperties: delete return: no permission (strict mode)',
  }

  › runToIdle (file://src/xsnap.js:191:15)

  boot-lockdown › console - objects should include detail

  Rejected promise returned by test. Reason:

  Error {
    message: 'Uncaught exception in console - objects should include detail: TypeError: visitProperties: delete return: no permission (strict mode)',
  }

  › runToIdle (file://src/xsnap.js:191:15)

  ─

  6 tests failed

So once I qualify the memory-growth fix manually, we'll need to fix those problems, hopefully without resorting to a new version of SES.

cc @kriskowal

warner commented 2 years ago

I was able to temporarily work around the .return-deletion problem (which prevents all use of SES, not just the xsnap tests) by patching away the do-not-delete flag:

diff --git a/xs/sources/xsMapSet.c b/xs/sources/xsMapSet.c
index e0f8f70b..cf58cbbc 100644
--- a/xs/sources/xsMapSet.c
+++ b/xs/sources/xsMapSet.c
@@ -107,7 +107,7 @@ void fxBuildMapSet(txMachine* the)
    mxPush(mxIteratorPrototype);
    slot = fxLastProperty(the, fxNewObjectInstance(the));
    slot = fxNextHostFunctionProperty(the, slot, mxCallback(fx_MapIterator_prototype_next), 0, mxID(_next), XS_DONT_DELETE_FLAG | XS_DONT_ENUM_FLAG);
-   slot = fxNextHostFunctionProperty(the, slot, mxCallback(fx_MapIterator_prototype_return), 0, mxID(_return), XS_DONT_DELETE_FLAG | XS_DONT_ENUM_FLAG);
+   slot = fxNextHostFunctionProperty(the, slot, mxCallback(fx_MapIterator_prototype_return), 0, mxID(_return), XS_DONT_ENUM_FLAG);
    slot = fxNextStringXProperty(the, slot, "Map Iterator", mxID(_Symbol_toStringTag), XS_DONT_ENUM_FLAG | XS_DONT_SET_FLAG);
    mxPull(mxMapIteratorPrototype);

@@ -135,7 +135,7 @@ void fxBuildMapSet(txMachine* the)
    mxPush(mxIteratorPrototype);
    slot = fxLastProperty(the, fxNewObjectInstance(the));
    slot = fxNextHostFunctionProperty(the, slot, mxCallback(fx_SetIterator_prototype_next), 0, mxID(_next), XS_DONT_DELETE_FLAG | XS_DONT_ENUM_FLAG);
-   slot = fxNextHostFunctionProperty(the, slot, mxCallback(fx_SetIterator_prototype_return), 0, mxID(_return), XS_DONT_DELETE_FLAG | XS_DONT_ENUM_FLAG);
+   slot = fxNextHostFunctionProperty(the, slot, mxCallback(fx_SetIterator_prototype_return), 0, mxID(_return), XS_DONT_ENUM_FLAG);
    slot = fxNextStringXProperty(the, slot, "Set Iterator", mxID(_Symbol_toStringTag), XS_DONT_ENUM_FLAG | XS_DONT_SET_FLAG);
    mxPull(mxSetIteratorPrototype);

With that in place, the memory growth problem seems to be resolved by this version of XS.

warner commented 2 years ago

To characterize the previous problem, I instrumented XS to print the->currentHeapCount after each GC, and ran the following little benchmark:

const s = new Set();
let a;
for (let i=0; i<100; i++) {
  a = {};
  s.add(a);
  s.delete(a);
  gc();
}

and it showed currentHeapCount growing by one slot per iteration. Doing the same with a Map:

const m = new Map();
let a,b,c;
for (let i=0; i<100; i++) {
  a = {}; b = {}; c = { b };
  m.set(a, c);
  m.delete(a);
  gc();
}

grows by two slots per loop. As Peter explained to me, the bug is not that the keys or values were being retained (a/b/c are all dropped, and the growth would have been much larger if the actual objects were retained). Instead, each time something is deleted from the Map/Set, the key/value is replaced by a special slot which is marked as "not really here" and has the same value as an undefined would have. This is skipped when iterating with s.entries() or m.keys() or m.values(), but still takes up space. And it wasn't marked as unreachable, so GC doesn't free it. The heap just keeps growing and growing, unless the Map or Set is deleted entirely. The undefined placeholder was used to avoid problems during iteration, but iterators aren't necessary to provoke the memory leak.

The new code has a much more complete implementation and appears to drop the placeholder values as soon as the number of iterators on the Map/Set drops to zero. This reveals a resource attack (which, to be fair, has been around all the time), in which the attacker obtains an iterator on your long-lived high-traffic Map and then never lets it go. If they never give it up, you can say goodbye to your ability to keep your memory footprint low, which can hurt you. We should keep this in mind for exposing iterators through the marshaller and swingset.

erights commented 2 years ago

We do plan to eventually expose async iterators through marshal. But not JS Maps and Sets, and thus likely not their perpetual iterators. The iteration semantics for stores is much more modest and should not create weird storage obligations. More on this later.

warner commented 2 years ago

NB the code I described above has been replaced again: it no longer tracks how many iterators are present, and holding onto an iterator no longer inhibits collection of deleted entries. The placeholder values are dropped as soon as the entry is deleted, however this means Map.prototype.delete() and Set.prototype.delete() are now O(N) in the number of entries that were present, rather than a hash-map's usual O(1)-ish behavior.