endojs / endo

Endo is a distributed secure JavaScript sandbox, based on SES
Apache License 2.0
804 stars 71 forks source link

fix(patterns): Tolerate old guard format #2038

Closed erights closed 7 months ago

erights commented 7 months ago

Staged on https://github.com/endojs/endo/pull/2039

closes: https://github.com/Agoric/agoric-sdk/issues/8826 refs: #2039 #1712 https://github.com/endojs/endo/pull/1745

Description

We currently believe the actual compat problem with https://github.com/Agoric/agoric-sdk/issues/8826 , or at least the problem that we actually need to fix, occurs when old contracts that were bundled with an endo preceding #1712 is used with a new liveslots, bundled with an endo including #1712 . The problem is that the old endo is producing old guards using the old klass format.

Fortunately, #1745 anticipated part of the compat issue that #1712 caused, by introducing get*GuardPayload functions, to encapsulate the actual guard representation so that other code using these guards could mostly consume them through these functions. It anticipated that we would switch from old to new guard formats, but not that they would need to co-exist across a version slippage boundary, like liveslots vs contracts.

This PR leverages the abstraction provided by #1745 to provide this version slippage tolerance only by modifying these getter functions to be tolerant of both formats on input, and to produce the corresponding modern payload as a result.

The nested guard adaptation provided by this PR is limited, for two reasons

Security Considerations

Technically, tolerating the old LegacyAwaitArgGuardShape is an exploitable bug, in that a record that matches this shape is also a valid parameter pattern that should allow an argument that matches that pattern, i.e., a copyRecord argument that at least contains a klass: 'awaitArgGuard' property. However, to be exploitable, the victim code would have had to accidentally write such a copyRecord pattern with a klass property and use it in as an arg guard. The chances of this happening accidentally are vanishingly small. Nevertheless, once we no longer need to support code preceding #1712 , we should remove all this extra complexity, and thus plug this in-theory exploitable bug.

Unlike LegacyAwaitArgGuardShape, tolerating the other legacy guard shapes does not seem like a currently exploitable bug, because there is

Scaling Considerations

I tried to code this so that the non-legacy success case was the fast path, and paid minimal cost for being adaptable to this legacy.

Documentation Considerations

Other than fixing a surprising vexing problem, I don't think developers need to be aware of this change at all. Since the current https://github.com/Agoric/agoric-sdk/issues/8826 problem is surprising, this PR relieves us of the need to explain that surprise.

Testing Considerations

This PR is staged on #2039 so that the changes this PR makes to test-legacy-guard-tolerance.js demonstrate the relevant differences in behavior caused by this PR.

Compatibility Considerations

The whole point of this PR is for new code that consumes guards via the get*GuardPayload functions to become compatible with both old and new code that produces what were valid guards at that time. In particular, so that old contracts that bundle old endo preceding #1712 can be linked with new liveslots bundling an endo that includes #1712 .

This PR does not attempt to tolerate version slippage in the other direction. That would be hard because by assumption it would not be able to change the behavior of guard consuming code.

Upgrade Considerations

The problem of https://github.com/Agoric/agoric-sdk/issues/8826 manifested during an upgrade. This PR attempts to fix the problem that we think we actually need to fix to enable such upgrades. However, this PR and its predecessor #2039 only contain unit tests for this tolerance. We won't know if it fixes https://github.com/Agoric/agoric-sdk/issues/8826 until new relevant integration tests which are separately in progress (attn @mhofman ).

@mhofman has since done this integration testing and verified that this PR does fix https://github.com/Agoric/agoric-sdk/issues/8826. The test upgrade succeeded.

At such a time that we decide we no longer need to support code preceding https://github.com/endojs/endo/pull/1712 or guard data generated by that code, all the adaptation complexity in getGuardPayloads.js should be deleted. It can stay a separate file, but a much simpler one.

mhofman commented 7 months ago

I have not yet reviewed, but I have verified that applied as a patch to agoric-sdk, the upgrade case which re-used an old zcf bundle no longer fails.