endojs / endo

Endo is a distributed secure JavaScript sandbox, based on SES
Apache License 2.0
829 stars 72 forks source link

daemon: Formula graph must not be mutated concurrently #2086

Closed rekmarks closed 7 months ago

rekmarks commented 8 months ago

Value incarnations, removals, and cancellations (collectively, "formula graph mutations") must be synchronized per #2092. This will necessitate refactoring a number of methods in host.js and (to some extent) mail.js that use these methods. Wherever possible, we should replace existing incarnateFoo() methods with their incarnateNumberedFoo() equivalents (again see #2092 for details).

What is the Problem Being Solved?

In the course of implementing #2079, it was discovered that two commands delivered serially to the daemon were fulfilled out of order, leading to unexpected behavior. In particular, the final await in this test code hangs indefinitely:

await E(host).makeUnconfined(
  'worker',
  capletLocation,
  'NONE',
  'context-consumer',
);

// `result` is a promise that resolves with 'cancelled' once the
// caplet is cancelled.
const result = E(host).evaluate(
  'worker',
  'E(caplet).awaitCancellation()',
  ['caplet'],
  ['context-consumer'],
);
await E(host).cancel('context-consumer');
t.is(await result, 'cancelled');

The test's endo.log revealed the cause:

Making make-unconfined:1f440...
Making least-authority:abc51...
Cancelled:
* make-unconfined:1f440...
Making eval:c9ae8...
Making make-unconfined:1f440...

Despite receiving the evaluate() request first and the cancel() request second, the host processes these requests in reverse order. Only by adding a delay before the final await was the desired behavior achieved.

Further examination revealed the cause to be several awaits within the host's evaluate(), before the eval formula's number had been added to the formula graph. The async operations within evaluate() created a race with cancel() that the latter would always win. Therefore, the resulting eval formula was never cancelled.

Consumers should never have to use delays / timeouts to achieve the desired order of operations when commanding the daemon. The root cause of the problem is that the formula graph was mutated concurrently, i.e. individual mutations did not complete within a single turn of the event loop. We must ensure that this never happens.

Description of the Design

Note: This may be out of date. Refer to #2092 instead.

We must preserve the invariant that the formula graph is never mutated concurrently. In other words, formula graph mutations should be mutually exclusive with each other, and completed in the order received. For the purpose of serializing formula graph mutations, we only care about the in-memory graph; we do not need to wait for the results to be written to disk, nor do we need to wait for the formula values to be reified. The following operations are subject to the invariant:

The simplest way to preserve the invariant would be to make all relevant functions synchronous. However, this may not always be possible. For instance, randomHex512 uses the callback signature of crypto.randomBytes to avoid blocking the event loop as it waits for the underlying primitives to collect enough entropy to generate a securely random number. This function is used pretty frequently, and the performance costs may therefore be prohibitive. Consequently, there may be an inescapably asynchronous component of formula graph mutation, around which we will probably put a mutex with a queue.

Finally, while discussing this problem with @kriskowal, we speculated that it may be helpful for consumers to be able to await formula graph mutation separately from the reification of associated values. For instance, if we have successfully preserved the invariant, but formula graph mutation remains async, it may make sense to make e.g. host.evaluate() return Promise<{ value: Promise<unknown> }>. The outer promise would resolve when formula graph mutation is completed, while the value property would resolve with the value when it becomes available. By wrapping the inner promise in an object, awaiting the outer promise wouldn't implicitly flatten the wrapped promises. This is an optional goal for the completion of this issue.

kumavis commented 8 months ago

them daemon

teehee