Agoric / agoric-sdk

monorepo for the Agoric JavaScript smart contract platform
Apache License 2.0

null-upgrade orch vats to get fixed liveslots #9978

Closed warner closed 2 months ago

warner commented 2 months ago

What is the Problem Being Solved?

I'm working on liveslots this week to fix some GC bugs (#9939, #9956, #8756, #7355), of which at least #9939 is critical to avoid vats making fatal syscalls and being killed by the kernel.

These bugs have been around for a year, but didn't really cause problems until we started using short-lived WeakMaps and weak Stores, because the vat-killing bug is only triggered when a weak collection is deleted. The @agoric/vows package uses short-lived weak Stores and WeakMaps extensively, and the new orchestration vats (deployed to mainnet in upgrade-16, 23-Jul-2024) use vows extensively. Nobody is using the orch vats yet, but if anyone did, the vats could be killed, which would be a big mess.

The Task

So we need to fix those orch vats by upgrading them, which will let them start using the new liveslots that includes the fixes for those bugs.

First, we need to get the PRs for those bugs landed (one is in review; I should submit the second today or tomorrow). We should make sure these changes make it into the "upgrade17" chain-halting upgrade, which will make the new version of liveslots available to all new vats and all vat upgrades.

Then, we should prepare a core-eval which does a null-upgrade ("null" means same ZCF and contract code as before, but any upgrade will pull in the latest liveslots) of the orch vats:

Those null-upgrades should be deployed as either:

So we're either doing everything as part of upgrade17, or we're doing it in two steps (upgrade17, then core-eval).
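
To illustrate why a null-upgrade is enough, here is a toy model of the idea (not agoric-sdk code). Every name in it (`makePlatform`, `installLiveslots`, `startVat`, `upgradeVat`, `nullUpgrade`, the bundle ID and version strings) is hypothetical and exists only for this sketch. It assumes, as described above, that a vat keeps the liveslots it started with until it is upgraded, and that any upgrade picks up whatever liveslots the platform currently provides.

```javascript
// Toy model only: none of these names are agoric-sdk APIs.
// A "vat" record tracks which contract bundle it runs and which liveslots
// version was linked in when its current incarnation started.
const makePlatform = initialLiveslots => {
  let currentLiveslots = initialLiveslots;
  return {
    // Models a chain-software upgrade (like "upgrade17"): it changes what
    // future incarnations receive, but does not touch running vats.
    installLiveslots: version => {
      currentLiveslots = version;
    },
    // A new vat gets the platform's current liveslots.
    startVat: bundleID => ({
      bundleID,
      liveslots: currentLiveslots,
      incarnation: 0,
    }),
    // Any upgrade starts a new incarnation, which also gets the
    // platform's current liveslots.
    upgradeVat: (vat, bundleID) => ({
      bundleID,
      liveslots: currentLiveslots,
      incarnation: vat.incarnation + 1,
    }),
  };
};

// A null-upgrade reuses the bundle the vat is already running.
const nullUpgrade = (platform, vat) => platform.upgradeVat(vat, vat.bundleID);

const platform = makePlatform('liveslots-old');
const orchVat = platform.startVat('hypothetical-orch-bundle');

// Step 1: the chain upgrade makes the fixed liveslots available.
platform.installLiveslots('liveslots-fixed');

// Step 2: the null-upgrade moves the existing vat onto it.
const upgradedOrchVat = nullUpgrade(platform, orchVat);
```

In this model the vat still reports the old liveslots after step 1 alone, which is why the null-upgrade is needed in addition to the chain upgrade; the two steps can happen together or separately.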

dckc commented 2 months ago

Fixed in #9868, I'm pretty sure:

https://github.com/Agoric/agoric-sdk/blob/35730335da5b0c468402aa15ebcf86a94d2f841a/packages/builders/scripts/vats/upgrade-orch-core.js#L10-L15

aj-agoric commented 2 months ago

Closing this issue pursuant to this conversation, which determined that further testing is not cost-effective at this time.