Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
327 stars 206 forks source link

Add warning for non durably unhandled promises #9771

Open mhofman opened 3 months ago

mhofman commented 3 months ago

What is the Problem Being Solved?

One kind of resumability bug is when subscribing without watching a promise in one incarnation and getting upgraded before that promise resolves. These issues are particularly pernicious as they currently result in a silent disconnection of the handler, with a lack of forward progress on resolution of the promise.

To be resumable, the vat should use watchPromise on the subscribed promise. While most code should call watchPromise by the end of the crank in which the promise was subscribed to, in practice vats do not get upgraded in the middle of swingset runs, and as such it's ok for a promise to not be watched if it gets resolved "promptly".

Note that not all subscribed promises at the liveslots layer are actually promises observed by the program: liveslots proactively subscribes to all imported promises (#6074 & #8469), and creates a subscribed promise for every message send even if using E.sendOnly (#3894).

Description of the Design

To diagnose occurrences of this issue triggering, we should change liveslots to warn if it receives a promise resolution for a promise it doesn't have in its map. This is likely a sign that a previous incarnation subscribed to the promise but we didn't have a watch registration for it. I believe there are currently other cases where this situation arises (false positives), and we should minimize those if possible.

To prevent this issue from arising, we should provide diagnostics to the author akin to unhandled promise rejections:

Finally liveslots should implement support for E.sendOnly and contracts should adopt where appropriate. We can likely ignore the case of dropped imported promises for now.

Security Considerations

None

Scaling Considerations

The lists should be kept durably to avoid unbounded heap growth (and to facilitate offline diagnostics)

Test Plan

TBD

Upgrade Considerations

Requires a chain software upgrade to upgrade liveslots, and vat upgrades to use the new liveslots version with these diagnostics.

mhofman commented 1 month ago

A case that isn't a bug but would appear as one when following the design above is when using the retriable tool (see #9541): promises inside the retried async function would not be watched, but the overall result of the retried async function likely depends on their resolution, and would itself be watched. This is a case where we're indirectly watching the resolution and have a mechanism to handle the disconnection on upgrade. In that case we're also unnecessarily leaving these subscribed but unwatched promises in the clist (see #10101)

See https://github.com/Agoric/agoric-sdk/pull/9719#discussion_r1763727074 for an example.

mhofman commented 1 month ago

I am wondering if something like AsyncContext would allow us to at least track which E result promises are associated with a specific retried flow, in order to silence the suspected cases of subscribed promises that are indirectly watched.

warner commented 2 weeks ago

We want this to provide a warning early enough that developers don't accidentally deploy a non-upgradable contract. It will require a bunch of kernel changes (to be aware of run boundaries) as well as new syscalls (which introduces old-vat compatibility questions). So it's a bit of a science project.