dfinity / interface-spec

IC Interface Specification
https://khsfq-wqaaa-aaaak-qckvq-cai.icp0.io/docs
37 stars 20 forks source link

refine system API conditions on transferred cycles to calls wrt freezing limit #186

Closed mraszyk closed 1 year ago

mraszyk commented 1 year ago

The replica implementation does not allow the cycle balance to drop below the freezing limit due to transferring cycles in a new call.

netlify[bot] commented 1 year ago

Deploy Preview for ic-interface-spec ready!

Name Link
Latest commit 73a9c5cbe69b0bb0475c5490b6492c3dfb6a0cca
Latest deploy log https://app.netlify.com/sites/ic-interface-spec/deploys/64ad65616943360008a2f4ba
Deploy Preview https://deploy-preview-186--ic-interface-spec.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

ulan commented 1 year ago

Thanks @mraszyk. The change looks good to me by itself.

I have a high-level question: do the spec owners agree that what replica does currently is reasonable? Specifically: if a response callback is running when the canister is frozen, then any outgoing call will cause a trap in the response callback. The cleanup callback will still succeed because it cannot make outgoing calls. My personal opinion is that this behaviour is reasonable and is probably a good compromise as it protects against infinite self-call recursion.

mraszyk commented 1 year ago

it protects against infinite self-call recursion

this doesn't seem to be the case:

use ic_cdk::call;

#[ic_cdk::update]
async fn aux(i: u64) {
    ic_cdk::print(i.to_string());
    call::<_, ((),)>(ic_cdk::id(), "aux", ((i + 1),)).await.unwrap();
}

fails at recursion depth of 500:

2023-06-29 06:29:26.047100177 UTC: [Canister bnz7o-iuaaa-aaaaa-qaaaa-cai] 499
2023-06-29 06:29:26.047100177 UTC: [Canister bnz7o-iuaaa-aaaaa-qaaaa-cai] 500
2023-06-29 06:29:26.047100177 UTC: [Canister bnz7o-iuaaa-aaaaa-qaaaa-cai] Panicked at 'called `Result::unwrap()` on an `Err` value: (SysTransient, "Couldn't send message")', src/loop_backend/src/lib.rs:6:61

although the canister is not frozen so it's probably due to

pub const DEFAULT_QUEUE_CAPACITY: usize = 500;
ulan commented 1 year ago

Yeah, that's due to the queue limit. I remember seeing reports where the canister was stuck in making self-calls and couldn't be stopped without making it frozen first. For example: https://forum.dfinity.org/t/how-to-upgrade-a-canister-with-hanging-requests/20751?u=cryptoschindler

I guess it is something more involved than an infinite recursion.

mraszyk commented 1 year ago

I guess it is something more involved than an infinite recursion.

I could reproduce the issue from the forum post by deploying two canisters A and B with the following code:

use candid::Principal;
use ic_cdk::call;

#[ic_cdk::update]
async fn run(canister_id: Principal) {
  loop {
    call::<_, ((),)>(canister_id, "bar", ((),)).await.unwrap();
  }
}

#[ic_cdk::update]
fn bar() {}

and calling run on A passing the canister ID of B as argument. In that case, the loop continues running because a stopping canister can still make outgoing calls to other running canisters (e.g., canister B) and process callbacks (keeping the loop running).

On the other hand, calling run on A passing the canister ID of A as argument (i.e., making A perform self-calls in the loop) allows the canister A to stop because such self-calls will be rejected once the canister is stopping.

mraszyk commented 1 year ago

if a response callback is running when the canister is frozen, then any outgoing call will cause a trap in the response callback

it does not cause a trap, but the call_perform fails and the Rust CDK returns the failure as a reject response:

use candid::Principal;
use ic_cdk::call;

#[ic_cdk::update]
async fn run(canister_id: Principal) {
  loop {
    let res = call::<_, ((),)>(canister_id, "bar", ((),)).await;
    ic_cdk::print(format!("{:?}", res));
  }
}

#[ic_cdk::update]
fn bar() {}

runs in an infinite loop on a frozen canister producing log output of the form:

2023-07-01 08:05:58.931656143 UTC: [Canister bd3sg-teaaa-aaaaa-qaaba-cai] Err((SysTransient, "Couldn't send message"))
2023-07-01 08:05:58.931656143 UTC: [Canister bd3sg-teaaa-aaaaa-qaaba-cai] Err((SysTransient, "Couldn't send message"))
ulan commented 1 year ago

Thanks for checking @mraszyk!

runs in an infinite loop on a frozen canister producing log output of the form

It seems that a trap would produce a more user-friendly message, right?

mraszyk commented 1 year ago

The following two questions can be derived from the conversation on this PR:

However, they would require further exploration and thus we opt for merging this PR without following up on these questions for now.