feat: on low wasm memory hook

ielashi commented 8 months ago

This proposal is based on this forum post and has already been approved by motion proposal 106146.

Canister developers have to actively monitor their canisters to know if they are low on wasm memory. If detected too late, a canister can be completely stuck and forever un-upgradable.

To address this, we introduce a hook called on_low_wasm_memory. When defined, it is triggered whenever the canister's memory usage exceeds some developer-defined threshold.

github-actions[bot] commented 8 months ago

🤖 Here's your preview: https://oov64-qqaaa-aaaak-qcnsa-cai.icp0.io/docs

ielashi commented 8 months ago

@mraszyk I didn't make any changes to the formal model in this PR yet, as I first wanted to get your feedback that the proposal as-is looks good.

dsarlis commented 6 months ago

@mraszyk I didn't make any changes to the formal model in this PR yet, as I first wanted to get your feedback that the proposal as-is looks good.

@ielashi @mraszyk I assume that we're fine with the current status of this PR so can the formal model be updated so we can move this to "FINAL" and be ready to start implementation at our convenience?

ielashi commented 6 months ago

@mraszyk Is the formal model something that you'd update yourself or should we look into that on our side?

mraszyk commented 6 months ago

I assume that we're fine with the current status of this PR

How about the call_on_cleanup analogy that I raised in the unresolved comment above?

ielashi commented 6 months ago

I assume that we're fine with the current status of this PR

How about the call_on_cleanup analogy that I raised in the unresolved comment above?

@mraszyk Doesn't this answer your question?

mraszyk commented 6 months ago

@ielashi I replied

The reason for that is that call_on_cleanup, unlike this hook, cannot make outbound calls.

As long as it can schedule a one-off timer, I don't see a major issue with this limitation.

on that thread and you put thumbs up so I thought you saw that.

ielashi commented 6 months ago

@ielashi I replied

The reason for that is that call_on_cleanup, unlike this hook, cannot make outbound calls.

As long as it can schedule a one-off timer, I don't see a major issue with this limitation.

on that thread and you put thumbs up so I thought you saw that.

I did, and my impression was the issue is resolved, or did I misunderstand?

mraszyk commented 6 months ago

We discussed this PR in yesterday's spec meeting together with @dsarlis and we concluded that it'd make more sense to guarantee that enough cycles are reserved for the hook and that the hook runs as the first message after the wasm memory limit threshold is crossed and then it's fine to treat it as a system task (rather than, e.g., call_on_cleanup).

dragoljub-duric commented 5 months ago

@mraszyk @ielashi @dsarlis Question 1: WDYT when this hook should be executed:

In every round canister is scheduled - this approach seems non-optimal because when we do not have anything to execute there is no risk of lack of memory
In every round canister executes a task or message - this approach is the safest since we will ensure that canister has enough memory before every execution, but we will have a bigger overhead when we execute hook with for example every heartbeat
In every round canister executes a message - this approach seems like reasonable tradeoff, but I am not sure that decreasing overhead from previous point is worth the risk

Question 2: If canister executes multiple messages (and/or tasks) in a single round should the hook be invoked before every execution?

Dfinity-Bjoern commented 5 months ago

As described above, we suggest keeping the old semantics (specifying a limit on allocated heap memory rather than "remaining" memory), in which case the scheduling is also much clearer (just execute once when an execution of the canister exceeds the limit, immediately after the execution that causes the hook to run).

dragoljub-duric commented 4 months ago

@Dfinity-Bjoern @mraszyk If we start keeping information about the hook as {"Condition is not satisfied", "Executed", "Ready to be executed"}, and based on that we use to determine if we will run hook, do you think we should persist that information in snapshots?

mraszyk commented 1 month ago

The implementation only considers stable memory and ignores, e.g., the chunk store and snapshot size, when deriving the wasm memory capacity.

mraszyk commented 1 month ago

The hook should not run after an upgrade/reinstall/uninstall/install if the condition is not satisfied after the upgrade (although the condition was satisfied before the upgrade and the hook did not execute yet).

mraszyk commented 2 weeks ago

Superseded by https://github.com/dfinity/portal/pull/3761

dfinity / interface-spec

feat: on low wasm memory hook #286