Closed mraszyk closed 1 year ago
Name | Link |
---|---|
Latest commit | 73a9c5cbe69b0bb0475c5490b6492c3dfb6a0cca |
Latest deploy log | https://app.netlify.com/sites/ic-interface-spec/deploys/64ad65616943360008a2f4ba |
Deploy Preview | https://deploy-preview-186--ic-interface-spec.netlify.app |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
Thanks @mraszyk. The change looks good to me by itself.
I have a high-level question: do the spec owners agree that what replica does currently is reasonable? Specifically: if a response callback is running when the canister is frozen, then any outgoing call will cause a trap in the response callback. The cleanup callback will still succeed because it cannot make outgoing calls. My personal opinion is that this behaviour is reasonable and is probably a good compromise as it protects against infinite self-call recursion.
it protects against infinite self-call recursion
this doesn't seem to be the case:
use ic_cdk::call;
#[ic_cdk::update]
async fn aux(i: u64) {
ic_cdk::print(i.to_string());
call::<_, ((),)>(ic_cdk::id(), "aux", ((i + 1),)).await.unwrap();
}
fails at recursion depth of 500:
2023-06-29 06:29:26.047100177 UTC: [Canister bnz7o-iuaaa-aaaaa-qaaaa-cai] 499
2023-06-29 06:29:26.047100177 UTC: [Canister bnz7o-iuaaa-aaaaa-qaaaa-cai] 500
2023-06-29 06:29:26.047100177 UTC: [Canister bnz7o-iuaaa-aaaaa-qaaaa-cai] Panicked at 'called `Result::unwrap()` on an `Err` value: (SysTransient, "Couldn't send message")', src/loop_backend/src/lib.rs:6:61
although the canister is not frozen so it's probably due to
pub const DEFAULT_QUEUE_CAPACITY: usize = 500;
Yeah, that's due to the queue limit. I remember seeing reports where the canister was stuck in making self-calls and couldn't be stopped without making it frozen first. For example: https://forum.dfinity.org/t/how-to-upgrade-a-canister-with-hanging-requests/20751?u=cryptoschindler
I guess it is something more involved than an infinite recursion.
I guess it is something more involved than an infinite recursion.
I could reproduce the issue from the forum post by deploying two canisters A and B with the following code:
use candid::Principal;
use ic_cdk::call;
#[ic_cdk::update]
async fn run(canister_id: Principal) {
loop {
call::<_, ((),)>(canister_id, "bar", ((),)).await.unwrap();
}
}
#[ic_cdk::update]
fn bar() {}
and calling run
on A
passing the canister ID of B
as argument. In that case, the loop continues running because a stopping canister can still make outgoing calls to other running canisters (e.g., canister B) and process callbacks (keeping the loop running).
On the other hand, calling run
on A
passing the canister ID of A
as argument (i.e., making A
perform self-calls in the loop) allows the canister A
to stop because such self-calls will be rejected once the canister is stopping.
if a response callback is running when the canister is frozen, then any outgoing call will cause a trap in the response callback
it does not cause a trap, but the call_perform
fails and the Rust CDK returns the failure as a reject response:
use candid::Principal;
use ic_cdk::call;
#[ic_cdk::update]
async fn run(canister_id: Principal) {
loop {
let res = call::<_, ((),)>(canister_id, "bar", ((),)).await;
ic_cdk::print(format!("{:?}", res));
}
}
#[ic_cdk::update]
fn bar() {}
runs in an infinite loop on a frozen canister producing log output of the form:
2023-07-01 08:05:58.931656143 UTC: [Canister bd3sg-teaaa-aaaaa-qaaba-cai] Err((SysTransient, "Couldn't send message"))
2023-07-01 08:05:58.931656143 UTC: [Canister bd3sg-teaaa-aaaaa-qaaba-cai] Err((SysTransient, "Couldn't send message"))
Thanks for checking @mraszyk!
runs in an infinite loop on a frozen canister producing log output of the form
It seems that a trap would produce a more user-friendly message, right?
The following two questions can be derived from the conversation on this PR:
Stopped
state) that can only be broken by increasing the freezing threshold of the canister (because outgoing calls from frozen canisters are always rejected)call_perform
system API returning a non-zero return value (current behavior) or should such a call make the execution trapHowever, they would require further exploration and thus we opt for merging this PR without following up on these questions for now.
The replica implementation does not allow the cycle balance to drop below the freezing limit due to transferring cycles in a new call.