CRIU checkpoint restore when Persistent EJB Timers are used

Persistent EJB Timers use the database to coordinate across multiple Liberty instances. There are two modes:

1) With missedTaskThreshold=-1, each Liberty server runs only its own timers, using a per-Liberty-server identifier that is read out of the database. A Liberty server can be stopped and restarted and will pick up the same set of timers it had before. I'm not sure how this would work with a checkpoint restore. You would want to keep running those timers from the database, but only on one Liberty server instance. There will be locking issues if multiple servers think they have the same id. On the other hand, if no one picks up the timers, they will be lost. One option would be to simply say that the mode with missedTaskThreshold=-1 is not supported when checkpoint restore is used and force an error if someone attempts it.

2) With missedTaskThreshold>0, failover is enabled. In this case, timers should naturally get picked up by any new instance created from a restore. However, there could be some complications when multiple Liberty servers are created from the same checkpoint at the same time and all think they should run the same timer at the same time. The code might be resilient enough to handle this, but it might be inefficient and prone to causing locking issues in the database.

@tkburroughs also mentioned a scenario with EJB timers in general where they try to catch up for missed executions all at once by running the timer over and over again.

I know very little about persistent EJB timers, but I have the following thoughts after a discussion with @njr-11:

1. Initially we can add a simple checkpoint prepare hook to the EJB feature that includes support for persistent timers. This prepare hook would detect whether any persistent timers are active at prepare time and, if so, cause the checkpoint operation to fail.
2. If we can delay any creation of the persistent timer objects until after the com.ibm.ws.kernel.feature.ServerStarted service is registered, then we could guarantee that the creation of the timers happens after restore. This is because the last point at which we are going to allow a checkpoint to occur is just before this service is registered.
3. Longer term we could look at what (if anything) could be done to prepare existing timers for a checkpoint and what it would take to fix them up to have proper behavior on the restore side.
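As a rough illustration of the first point, the prepare hook could look something like the sketch below. `PrepareHook` and the timer-count supplier are hypothetical stand-ins invented for this example, not actual Liberty interfaces; the only idea taken from the comment above is "fail the checkpoint if persistent timers are active at prepare time".

```java
import java.util.function.IntSupplier;

final class PersistentTimerCheckpointGuard {

    /** Hypothetical stand-in for a checkpoint prepare hook; not a real Liberty interface. */
    interface PrepareHook {
        void prepare();
    }

    /**
     * Builds a hook that refuses the checkpoint when persistent timers are active.
     * The supplier is an assumption: some way for the EJB feature to report how
     * many persistent timers currently exist.
     */
    static PrepareHook failIfTimersActive(IntSupplier activePersistentTimerCount) {
        return () -> {
            int active = activePersistentTimerCount.getAsInt();
            if (active > 0) {
                // Throwing from prepare is assumed to abort the checkpoint.
                throw new IllegalStateException(
                        "Checkpoint refused: " + active + " persistent EJB timer(s) are active");
            }
        };
    }

    public static void main(String[] args) {
        failIfTimersActive(() -> 0).prepare(); // no timers: checkpoint may proceed
        try {
            failIfTimersActive(() -> 2).prepare();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```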

@tjwatson In addition to concerns about "persistent" timers as they relate to database coordination with other servers, EJB timers in general (including "non-persistent" timers) have interesting semantics around scheduling.

Specifically, EJB timers are expected to schedule next timeout operations based on when an application creates the timer, rather than when the last timeout occurred.

For example, if an application creates a timer at exactly 3:00 that is scheduled to run every 5 minutes, then each next timeout is calculated from 3:00, not from the last time it successfully ran. If the server is stopped at 3:30 and then restarted an hour later at 4:30, the next scheduled timeout is still 3:35, which is now in the past. So when the server restarts, the timer will run 12 times immediately to catch up on all the missed expirations.

Therefore, when using CRIU, if an image of the server is captured today and it includes existing EJB timers (either persistent or non-persistent), then every time that image is used, the timers will run "catch-up" timeouts from the point the CRIU image was captured. The number of "catch-up" timeouts will increase over time. With the 5-minute timer above, if the image is restored an hour later there will be 12 catch-up timeouts; after 2 hours it would be 24, and so on.
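The arithmetic can be checked with a small sketch. The class and method names here are made up for illustration and the dates are arbitrary; it only models a fixed-interval timer whose expirations are anchored to its creation time, and is not the EJB container's actual scheduling code.

```java
import java.time.Duration;
import java.time.Instant;

final class CatchUpCount {

    /**
     * Number of expirations that fall in (lastRun, now], given that
     * expirations are anchored to the creation time at a fixed interval.
     */
    static long missed(Instant created, Duration interval, Instant lastRun, Instant now) {
        long step = interval.toMillis();
        long ranThrough = Duration.between(created, lastRun).toMillis() / step;
        long dueThrough = Duration.between(created, now).toMillis() / step;
        return Math.max(0, dueThrough - ranThrough);
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2024-01-01T03:00:00Z");
        Instant captured = Instant.parse("2024-01-01T03:30:00Z"); // last run before stop/checkpoint
        Duration every5 = Duration.ofMinutes(5);

        // Restored one hour after capture: prints 12
        System.out.println(missed(created, every5, captured, captured.plus(Duration.ofHours(1))));
        // Restored two hours after capture: prints 24
        System.out.println(missed(created, every5, captured, captured.plus(Duration.ofHours(2))));
    }
}
```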

For "persistent" timers, the EJB Container supports the following configuration option:

missedPersistentTimerAction = ALL | ONCE

ALL is the default behavior described above. ONCE means that only one catch-up is ever performed, and then the timer resumes scheduling from that point. ONCE is the default when failover is enabled (from @njr-11's comments, missedTaskThreshold>0).

When CRIU is used, we could require that people use ONCE, or we could add a hook that would enable ONCE briefly as the CRIU image is started.

Non-persistent timers do not currently support missedPersistentTimerAction, since they would normally not survive a server restart. However, I assume the CRIU image would contain them, so we would want to add a hook that tells us a CRIU image is starting, and then enable ONCE-like capabilities for non-persistent timers at that time.
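To make the difference between ALL and ONCE concrete, here is a minimal sketch. The names are hypothetical, and it assumes that after the single catch-up the schedule stays anchored to the original creation time; the real container may resume differently.

```java
import java.time.Duration;
import java.time.Instant;

final class MissedTimerPolicy {

    enum Action { ALL, ONCE }

    /** How many catch-up timeouts run at restart for a given number of missed expirations. */
    static long catchUpRuns(long missed, Action action) {
        return action == Action.ONCE ? Math.min(missed, 1) : missed;
    }

    /**
     * First expiration strictly after 'now', still anchored to the creation time.
     * Assumption: this is where regular scheduling resumes after catch-up.
     */
    static Instant nextAfter(Instant created, Duration interval, Instant now) {
        long step = interval.toMillis();
        long elapsed = Duration.between(created, now).toMillis();
        long k = Math.floorDiv(elapsed, step) + 1;
        return created.plusMillis(k * step);
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2024-01-01T03:00:00Z");
        Instant restored = Instant.parse("2024-01-01T04:30:00Z");

        System.out.println(catchUpRuns(12, Action.ALL));  // prints 12
        System.out.println(catchUpRuns(12, Action.ONCE)); // prints 1
        // Regular schedule resumes at 04:35
        System.out.println(nextAfter(created, Duration.ofMinutes(5), restored));
    }
}
```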

> 2. If we can delay any creation of the persistent timer objects until after the com.ibm.ws.kernel.feature.ServerStarted service is registered then we could guarantee the creation of the timers happens after restore. This is because the last point we are going to allow a checkpoint to occur is just before this service is registered.

One complication is that the timers could have been created during a previous run of the server. Although the current server startup hasn't created any timers or performed any polling yet, persistent timers from previous runs will already be there. With missedTaskThreshold disabled (non-failover), the database will hold hard-coded information about the assignment of those tasks to particular Liberty server instances, which will not match a restore into a different location. I think we will likely need to require missedTaskThreshold > 0 (failover enabled), or at least detect the disabled case and issue a warning.

Based on what I know about InstantOn as it has evolved over time, I think the following should occur for persistent timers:

- [ ] Defer creation of automatic persistent timers until restore. This will be a little tricky, as non-persistent timers are not handled this way; for non-persistent timers, we create them but don't enable running them until restore. That is not an option for persistent timers, as they require a database and transactions to create. Also, errors creating them currently cause the app to fail to start, so we should probably log a warning and document this difference in behavior.
- [ ] Default to and require missedTaskThreshold > 0 and missedPersistentTimerAction=ONCE. If either of these is configured otherwise, then an error should be logged and the checkpoint failed. I do not see how it would ever make sense not to use these two settings. This prevents potentially thousands (or more) of "catch-up" timers running on restore. missedPersistentTimerAction already defaults differently if missedTaskThreshold>0, so I don't think it is odd to give it a different default for checkpoint.
- [ ] PersistentExecutor should defer polling for existing timers until restore. EJB Container probably needs an update here, as I think the default behavior is to have the EJB Container enable polling after it starts. However, I think a customer could change that behavior through config, so PersistentExecutor likely also needs a change to not start polling until restore. Perhaps it would be best if PersistentExecutor could "ignore" the EJB Container and just defer polling until restore; then EJB Container would not need a change.
- [ ] Log a warning if an automatic timer only runs in the past. The "run" time of EJB persistent timers is calculated from the "create time", not the "last run completion time", so "create time" is important. However, for InstantOn, it seems unreasonable to expect the "create time" for timers to be the time when the checkpoint image is created; the time of "restore" seems to make more sense. The only issue here is that automatic timer creation could fail if done later (i.e. the timer is scheduled to run only in the past). In this case, I think a warning is appropriate, to let the admin know that the application tried to schedule a timer in the past and thus it will not run. This could be the same warning as in the first item above, or a similar one.
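The second item in the list above amounts to a configuration check at checkpoint time. A minimal sketch follows, with hypothetical names; how the two values are actually read from the server configuration is not shown, and treating the threshold as a plain number is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class CheckpointTimerConfigCheck {

    /**
     * Returns the reasons a checkpoint should be failed, or an empty list if the
     * persistent timer configuration is acceptable. Parameter names mirror the
     * config options discussed in this issue; their types are assumptions.
     */
    static List<String> violations(long missedTaskThreshold, String missedPersistentTimerAction) {
        List<String> problems = new ArrayList<>();
        if (missedTaskThreshold <= 0) {
            // -1 means failover is disabled, which pins timers to one server instance.
            problems.add("missedTaskThreshold must be > 0 (failover enabled), was " + missedTaskThreshold);
        }
        if (!"ONCE".equals(missedPersistentTimerAction)) {
            problems.add("missedPersistentTimerAction must be ONCE, was " + missedPersistentTimerAction);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Failover disabled and ALL configured: both problems are reported.
        violations(-1, "ALL").forEach(System.out::println);
    }
}
```

Collecting every violation before failing lets the error log name both settings at once, rather than making the admin fix one and retry to discover the other.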

CRIU checkpoint restore when Persistent EJB Timers are used #18778