dfinity / motoko

Simple high-level language for writing Internet Computer canisters
Apache License 2.0

Cancelling a recurring timer that I have lost track of #3857

Open ByronBecker opened 1 year ago

ByronBecker commented 1 year ago

Is there a good way to cancel a recurring timer for which I've lost track of the timerId? (Without upgrading the canister in question)

Just curious whether this is possible right now, as I ran into this issue earlier today.

ggreif commented 1 year ago

Even if you knew what the Id is, how would you force the canister to cancelTimer <Id>?

ByronBecker commented 1 year ago

I have an API endpoint I call that looks something like this:

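// At the top of the file:
import Result "mo:base/Result";
import Timer "mo:base/Timer";

// In the actor body, the ID of the running recurring timer (if any) is kept in a variable:
var myTimerId : ?Timer.TimerId = null;

public shared ({ caller }) func cancelTimer() : async Result.Result<(), Text> {
    // `caller` could be checked here to restrict who may cancel the timer.
    switch (myTimerId) {
      case (?timerId) {
        // Qualified call: inside the actor, a bare `cancelTimer` refers to this method.
        Timer.cancelTimer(timerId);
        myTimerId := null;
        #ok;
      };
      case null #err("No timer to cancel");
    };
  };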
ggreif commented 1 year ago

You could install an (almost) identical canister that also records the timer creations in an ordered data structure and makes that queryable. This should give you the Id. If this doesn't work (because timers are created dynamically from internal data), then, unfortunately, you have to reinstall.
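A minimal sketch of that suggestion, assuming a recent `base` library in which `Timer.recurringTimer` takes the `<system>` capability argument (older compilers, such as those bundled around dfx 0.13, take no `<system>` argument). The actor name `TimerRegistry` and the functions `startTimer`, `listTimers`, `cancelAllTimers` and `job` are hypothetical; only `mo:base/Timer` and `mo:base/Buffer` are real library modules. The canister records every timer ID it creates and exposes them through a query. It also offers a "cancel everything" endpoint, but that only covers timers created through the registry.

```motoko
import Buffer "mo:base/Buffer";
import Timer "mo:base/Timer";

// Hypothetical canister: every recurring timer it creates is recorded,
// so the IDs can be queried (and cancelled) later.
actor TimerRegistry {
  // Timer IDs in creation order. Deliberately not `stable`: timers
  // themselves do not survive an upgrade either.
  let timerIds = Buffer.Buffer<Timer.TimerId>(4);

  // Placeholder for the real periodic work (e.g. an inter-canister call).
  func job() : async () {};

  // Start a recurring timer and remember its ID.
  public func startTimer(seconds : Nat) : async Timer.TimerId {
    let id = Timer.recurringTimer<system>(#seconds(seconds), job);
    timerIds.add(id);
    id
  };

  // Query endpoint exposing every recorded timer ID.
  public query func listTimers() : async [Timer.TimerId] {
    Buffer.toArray(timerIds)
  };

  // Cancel every recorded timer, then forget the IDs.
  public func cancelAllTimers() : async () {
    for (id in timerIds.vals()) { Timer.cancelTimer(id) };
    timerIds.clear();
  };
};
```

Keeping the IDs in the same canister that creates the timers means no upgrade or reinstall is needed to find them later. The tradeoff is that any timer started outside `startTimer` stays invisible.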

ByronBecker commented 1 year ago

Instead of reinstalling, my current solution is just to stop the canister and then restart it.

Would be nice to have a hook to wipe out all existing timers though (low priority).

ByronBecker commented 1 year ago

@ggreif I'm running into a weird issue with recurring timers now. Here are the steps I take to get into this spot:

  1. Create a function "A" which starts a recurring timer that executes every few (5-10) seconds (in my case the job is making an inter-canister call, but maybe this isn't important).
  2. Call function "A"
  3. Stop the canister that contains the timer.
  4. Wait 30-60 seconds.
  5. Restart the canister containing the timer. (timer should not be running).
  6. Call function "A" again (attempting to start a new timer that executes the same function).

After step 6, multiple timer executions seem to kick off back to back, almost as if the old timer starts running again until it has "caught up" with the current time. Once it has caught up, the rapid executions stop and the "newly created timer" takes over.

ggreif commented 1 year ago

Note: please disregard what is below for now; this needs a more in-depth look.


Interesting findings, but I think it is easy to explain and needs some adjustments to the library.

  1. stopping the canister seems to reset the global timer (the global timer is not zeroed)
  2. no timer gets cancelled when a canister is stopped
  3. to prevent jitter, recurring timers will add the delay to the desired expiration time (not the wall time of the expiration callback)
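Point 3 can be illustrated with a small model. This is not the `mo:base/Timer` source. It is a sketch that assumes times are whole seconds and that each rescheduling adds the period to the previous *scheduled* expiration. The name `catchUpExpirations` is hypothetical. Under this rule, every expiration that fell into a period when timers were not serviced is already overdue once they run again, so all of them fire back to back. That would be consistent with the burst reported above.

```motoko
import Buffer "mo:base/Buffer";

// Simplified model of point 3 (not the mo:base/Timer source): each
// rescheduling adds `period` to the previous *scheduled* expiration.
// Times are whole seconds; `catchUpExpirations` is a hypothetical helper.
func catchUpExpirations(scheduled : Nat, period : Nat, now : Nat) : [Nat] {
  assert (period > 0); // a zero period would never move past `now`
  let fired = Buffer.Buffer<Nat>(8);
  var next = scheduled;
  // All overdue expirations run immediately, back to back.
  while (next <= now) {
    fired.add(next);
    next += period;
  };
  Buffer.toArray(fired)
};

// A 5 s timer last due at t = 100 that is only serviced again at t = 145
// replays the expirations 100, 105, ..., 145.
let burst = catchUpExpirations(100, 5, 145);
```

Rescheduling from the wall time of the callback would instead yield a single run followed by the next expiration at `now + period`. That avoids the burst but lets the schedule drift, which is the jitter point 3 is meant to prevent.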

previous analysis (outdated)

This means that if a timer is present, the whole timer machinery doesn't restart after the canister is restarted. This is a bug. Coming out of the "stopped" state, we have to set the global timer if needed. We should also "cut out" all recurrent expirations that would have happened while the canister was stopped, so that no n ✕ jobs get started without need.
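The "cut out" idea reduces to a little arithmetic. Instead of replaying each missed expiration, jump straight to the first scheduled time after `now`: next = scheduled + (⌊(now − scheduled) / period⌋ + 1) × period. The helper `nextAfterResume` below is hypothetical and assumes whole-second times. It sketches the proposal and does not describe how `mo:base/Timer` actually behaves.

```motoko
// Hypothetical helper (not part of mo:base/Timer): on resume, skip every
// expiration that fell into the stopped period and return the first
// scheduled time strictly after `now`. Times are whole seconds.
func nextAfterResume(scheduled : Nat, period : Nat, now : Nat) : Nat {
  assert (period > 0);
  if (scheduled > now) { return scheduled }; // nothing was missed
  // Whole periods elapsed since `scheduled`, plus one to land after `now`.
  scheduled + ((now - scheduled) / period + 1) * period
};

// A 5 s timer due at t = 100 and resumed at t = 145 is rescheduled for
// t = 150 instead of replaying the missed expirations 100, 105, ..., 145.
let resumed = nextAfterResume(100, 5, 145);
```

Because the result stays aligned to the original schedule, this keeps the anti-jitter property while dropping the burst. The sketch leaves open whether one overdue run should still fire immediately on resume, which is a separate design choice.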

ByronBecker commented 1 year ago

Are you describing a mixture of current behavior and desired behavior with these 3 points?

1. stopping the canister seems to reset the _global_ timer

2. no timer gets cancelled when a canister is stopped

3. to prevent jitter, recurring timers will add the delay to the desired expiration time (_not_ the wall time of the expiration callback)

Another interesting finding I came across today.

Sometimes, (locally) stopping and restarting a canister once will NOT kill the timer.

However, stopping and restarting a canister twice WILL kill the timer.

ggreif commented 1 year ago

There is a replica test that verifies that stop-start of a canister doesn't change the global timer.

Both are somewhat bizarre, and I am trying to come up with a test case for each.

ggreif commented 1 year ago

@ByronBecker I've heard that some dfx versions set up a local replica with timer bugs. Try upgrading to 0.13.1 if possible.

ggreif commented 1 year ago

@ByronBecker is a repro still possible? Do you think there is still a bug hiding somewhere?