Closed roman-kashitsyn closed 1 year ago
Name | Link |
---|---|
Latest commit | 211c3f2de00bfb5e0e7570d1429a51b4bc8c8189 |
Latest deploy log | https://app.netlify.com/sites/ic-interface-spec/deploys/64c2487d23b06e0008c7fe09 |
Deploy Preview | https://deploy-preview-135--ic-interface-spec.netlify.app |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
This is currently still blocked by the Candid bug that prevents backward-compatible decoding of extended variants.
Testing the
pre_upgrade
hookTesting
pre_upgrade
hooks is a constant source of struggle. If this method contains a bug, the canister can become unupgradable.The upgrade method with
skip_pre_upgrade
provides a method to excercise thepre_upgrade
method of the new canister module without the fear of data loss using the following procedure:
- Stop the canister.
- Upgrade the canister to the new module.
- If upgrade succeeded, upgrade the canister again to the same module. This upgrade will excercise the
pre_upgrade
hook.- If the second upgrade succeeds, proceed with the next step. Otherwise, install the old canister module in upgrade mode with
skip_pre_upgrade
.- Start the canister.
@roman-kashitsyn that might work, but isn't it simpler to do a trial upgrade to a canister that deliberately traps in post_upgrade? Then you go inspect response to see if you got the deliberate trap, or some other trap, indicating that something went wrong.That doesn't address the issue of how to fix a broken pre_upgrade (which this proposal might) but does let you test it non-destructively.
Cut'n'pasting a recipe I wrote up some time ago:
A developer might want to test an upgrade will complete successfully without losing state before performing the actual upgrade. One way to do this is to arrange for the canister to do a dry-run of the upgrade, that is guaranteed to fail with a trap in post_upgrade. After verifying the dry-run could complete successfully, the developer can perform the proper upgrade, this time without trapping, to commit to the change.
One way to code such a dry-run is to ensure that the new canister traps, either explicitly as the last step of canister/actor initialization or in a dedicated postupgrade Motoko system function. For added assurance, the canister could verify the restored state before trapping. By employing library function Debug.trap(msg), the developer can return a diagnostic error message, msg, along with the trap, to, for example, report the condition of the state post upgrade, as visible from the end of actor initialization or, optionally, within system function postupgrade.
Hereβs a simple example of Counter class that can verify its own upgrade before committing to it:
https://m7sm4-2iaaa-aaaab-qabra-cai.ic0.app/?tag=2623570777
import Debug "mo:base/Debug";
actor class Counter(upgradeMode: {#verify; #commit}) {
stable var version = 0;
stable var state : Int = 0;
func validate() : Bool {
state >= 0;
};
public func inc() : async Int {
state += 1;
return state;
};
public query func read() : async {version : Nat; state: Int} {
{version; state};
};
system func postupgrade() {
version += 1;
if (upgradeMode == #verify) {
Debug.trap(debug_show
#verify {version; state; valid = validate()});
};
}
}
I think this is an improvement over the status quo so we should go through with this (if we were doing it from the scratch I'd advocate removing this whole state management business from the System API and pushing into a "user space" library, but that ship's long sailed). I don't have a strong opinion on evict vs. upgrade with flags. From a UX perspective I dislike skip_pre_upgrade
being an opt bool
instead of a bool
given that the arguments are already an opt
, but from what I understand from this choice decreases the chances of something going wrong with the NNS, so fine, let's keep it.
Before merging this we should reflect the changes in the abstract state transition part (unless we've decided to drop that part from the spec).
Before merging this we should reflect the changes in the abstract state transition part (unless we've decided to drop that part from the spec).
Good point! I've updated the formal part of the spec to not block this MR on the decision whether we keep it or not.
Note from the interface spec meeting: Do we have an actual reason for the opt bool
? @roman-kashitsyn wrote "just in case" but we actually weren't sure whether or not that was a joke.
Note from the interface spec meeting: Do we have an actual reason for the
opt bool
? @roman-kashitsyn wrote "just in case" but we actually weren't sure whether or not that was a joke.
Just in case you want to deprecate this field later.
Note from the interface spec meeting: Do we have an actual reason for the
opt bool
? @roman-kashitsyn wrote "just in case" but we actually weren't sure whether or not that was a joke.Just in case you want to deprecate this field later.
I could be wrong, but I think because it occurs in negative position, as an argument to install_code
, you could just promote to reserved
or remove the field from the record anyway. But I always get this wrong.
Note from the interface spec meeting: Do we have an actual reason for the
opt bool
? @roman-kashitsyn wrote "just in case" but we actually weren't sure whether or not that was a joke.Just in case you want to deprecate this field later.
I could be wrong, but I think because it occurs in negative position, as an argument to
install_code
, you could just promote toreserved
or remove the field from the record anyway. But I always get this wrong.
I'm fine either way. The more I use Candid, the more I prefer using optional fields everywhere...
Whatever we do about the argument, does type change_details
in ic.dic needs to evolve too, or is the omission of the upgrade
payload deliberate? I imagine change_details
occurs in a positive position, which might force the choice of refining the upgrade
payload versus adding a new variant (in favour of refining the upgrade payload.).
Whatever we do about the argument, does type
change_details
in ic.dic needs to evolve too, or is the omission of theupgrade
payload deliberate? I imaginechange_details
occurs in a positive position, which might force the choice of refining theupgrade
payload versus adding a new variant (in favour of refining the upgrade payload.).
Good point! We can either leave change_details
as they're defined now (meaning we don't expose the fact whether pre_upgrade
was run, effectively treating it as an implementation detail) or extend the type of the upgrade
variant in change_details
to opt record {...}
which is fine because null
(the current implicit type of the upgrade
variant in change_details
) is a subtype of opt record {...}
.
In interface spec meeting: moving from bool
to reserved
would also work, so it seems using bool
instead of opt bool
seems preferable. We could alternatively use opt null
, which may be more natural (then it would be null
or opt null
).
Name | Link |
---|---|
Latest commit | 7ae270e1c157dce08795e1dfb10de0f226d51f36 |
Latest deploy log | https://app.netlify.com/sites/ic-interface-spec/deploys/64dcdefb765eea0008af839b |
Deploy Preview | https://deploy-preview-135--ic-interface-spec.netlify.app |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
I could be wrong, but I think because it occurs in negative position, as an argument to install_code, you could just promote to reserved or remove the field from the record anyway. But I always get this wrong.
What we missed though is that we need both forwards- and backwards-compatibility when deprecating those fields and thus we need an opt ...
.
This PR describes a new variant of the upgrade mode for
install_code
that allows canister controllers to skip thepre_upgrade
hook during the upgrade.This upgrade method can serve two purposes.
Act as an escape hatch if the canister entered otherwise irrecoverable state due to a critical bug in the
post_upgrade
method of the new canister module. The stable memory must already contain the entire state for the "fix" to work.Canister developers can use it during canister deployment to excercise the
pre_upgrade
method of the new canister.Recovering from buggy
post_upgrade
hooksImagine that the
post_upgrade
method implementation has a bug that bricks the canister. Upgrading the canister in such a state is either impossible (pre_upgrade
method traps unconditionally) or can result in a data loss (thepost_upgrade
method erroneously discarded important data, and executingpre_upgrade
would corrupt the original). The stable memory, however, might still contain the correct state that the old module recorded.If developers ever find themselves in such situation (I did), the
upgrade
install method withskip_pre_upgrade
is the only salvation.Testing the
pre_upgrade
hookTesting
pre_upgrade
hooks is a constant source of struggle. If this method contains a bug, the canister can become unupgradable.The upgrade method with
skip_pre_upgrade
provides a method to excercise thepre_upgrade
method of the new canister module without the fear of data loss using the following procedure:pre_upgrade
hook.skip_pre_upgrade
.