Open mhofman opened 3 months ago
A case that isn't a bug but would appear as one when following the design above is when using the retriable
tool (see #9541): promises inside the retried async function would not be watched, but the overall result of the retried async function likely depends on their resolution, and would itself be watched. This is a case where we're indirectly watching the resolution and have a mechanism to handle the disconnection on upgrade. In that case we're also unnecessarily leaving these subscribed but unwatched promises in the clist (see #10101)
See https://github.com/Agoric/agoric-sdk/pull/9719#discussion_r1763727074 for an example.
I am wondering if something like AsyncContext would allow us to at least track which E result promises are associated with a specific retried flow, in order to silence the suspected cases of subscribed promises that are indirectly watched.
We want this to provide a warning early enough that developers don't accidentally deploy a non-upgradable contract. It will require a bunch of kernel changes (to be aware of run boundaries) as well as new syscalls (which introduces old-vat compatibility questions). So it's a bit of a science project.
What is the Problem Being Solved?
One kind of resumability bug is when subscribing without watching a promise in one incarnation and getting upgraded before that promise resolves. These issues are particularly pernicious as they currently result in a silent disconnection of the handler, with a lack of forward progress on resolution of the promise.
To be resumable, the vat should use
watchPromise
on the subscribed promise. While most code should callwatchPromise
by the end of the crank in which the promise was subscribed to, in practice vats do not get upgraded in the middle of swingset runs, and as such it's ok for a promise to not be watched if it gets resolved "promptly".Note that not all subscribed promises at the liveslots layer are actually promises observed by the program: liveslots proactively subscribes to all imported promises (#6074 & #8469), and creates a subscribed promise for every message send even if using
E.sendOnly
(#3894).Description of the Design
To diagnose occurrences of this issue triggering, we should change liveslots to warn if it receives a promise resolution for a promise it doesn't have in its map. This is likely a sign that a previous incarnation subscribed to the promise but we didn't have a watch registration for it. I believe there are currently other cases where this situation arises (false positives), and we should minimize those if possible.
To prevent this issue from arising, we should provide diagnostics to the author akin to unhandled promise rejections:
Finally liveslots should implement support for
E.sendOnly
and contracts should adopt where appropriate. We can likely ignore the case of dropped imported promises for now.Security Considerations
None
Scaling Considerations
The lists should be kept durably to avoid unbounded heap growth (and to facilitate offline diagnostics)
Test Plan
TBD
Upgrade Considerations
Requires a chain software upgrade to upgrade liveslots, and vat upgrades to use the new liveslots version with these diagnostics.