OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0

CRIU checkpoint restore when Persistent EJB Timers are used #18778

Open njr-11 opened 3 years ago

njr-11 commented 3 years ago

Persistent EJB Timers use the database to coordinate across multiple Liberty instances. There are two modes:

1) With missedTaskThreshold=-1, each Liberty server runs only its own timers, using a per-Liberty-server identifier that is read out of the database. A Liberty server can be stopped and restarted and will pick up the same set of timers it had before. I'm not sure how this would work with a checkpoint restore. You would want to keep running those timers from the database, but only on one Liberty server instance. There will be locking issues if multiple servers think they have the same id. On the other hand, if no one picks up the timers, they will be lost. One option would be to simply say that the mode with missedTaskThreshold=-1 is not supported when checkpoint restore is used and force an error if someone attempts it.

2) With missedTaskThreshold>0, failover is enabled. In this case, timers should naturally get picked up by any new instance created from a restore. However, there could be some complications when multiple Liberty servers are created from the same checkpoint at the same time and all think they should run the same timer at the same time. The code might be resilient enough to handle this, but it might be inefficient and prone to causing locking issues in the database.

@tkburroughs also mentioned a scenario with EJB timers in general where they try to catch up for missed executions all at once by running the timer over and over again.

tjwatson commented 3 years ago

I know very little about the topic of persistent EJB timers, but I have the following thoughts after a discussion with @njr-11:

  1. Initially we can add a prepare hook to the EJB feature that includes support for persistent timers. It can then register a simple checkpoint prepare hook. This prepare hook would detect if any existing persistent timers are active at prepare time and cause the checkpoint operation to fail.
  2. If we can delay any creation of the persistent timer objects until after the com.ibm.ws.kernel.feature.ServerStarted service is registered then we could guarantee the creation of the timers happens after restore. This is because the last point we are going to allow a checkpoint to occur is just before this service is registered.
  3. Longer term we could look at what (if anything) could be done to prepare existing timers for a checkpoint and what it would take to fix them up to have proper behavior on the restore side.
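
The first option above can be sketched roughly as follows. This is only an illustration: `PrepareHook` is a hypothetical stand-in interface defined here, not Liberty's actual checkpoint SPI, and the `IntSupplier` that reports the number of active persistent timers is likewise an assumption about how the EJB feature might expose that count.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for a checkpoint prepare hook; not Liberty's real SPI. */
interface PrepareHook {
    void prepare();
}

/** Sketch of a hook that refuses the checkpoint while persistent timers are active. */
final class PersistentTimerPrepareHook implements PrepareHook {
    // Hypothetical source of the number of active persistent timers.
    private final IntSupplier activePersistentTimers;

    PersistentTimerPrepareHook(IntSupplier activePersistentTimers) {
        this.activePersistentTimers = activePersistentTimers;
    }

    @Override
    public void prepare() {
        int active = activePersistentTimers.getAsInt();
        if (active > 0) {
            // Throwing from the hook is assumed to abort the checkpoint operation.
            throw new IllegalStateException(
                "Checkpoint refused: " + active + " persistent EJB timer(s) are active");
        }
    }
}
```

With a supplier that returns 0 the hook completes normally; with any positive count it throws, which in this sketch stands for failing the checkpoint.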

tkburroughs commented 3 years ago

@tjwatson In addition to concerns about "persistent" timers as they relate to database coordination with other servers, EJB timers in general (including "non-persistent" timers) have interesting semantics around scheduling.

Specifically, EJB timers are expected to schedule next timeout operations based on when an application creates the timer, rather than when the last timeout occurred.

For example, suppose an application creates a timer at exactly 3:00 that is scheduled to run every 5 minutes. The next timeout is calculated from 3:00, not from the last time it successfully ran. If the server is stopped at 3:30 and restarted an hour later at 4:30, the next scheduled timeout is still 3:35, which is in the past, so when the server restarts, the timer will run 12 times immediately to catch up on all the missed expirations.

Therefore, when using CRIU, if an image of the server is captured today and it includes existing EJB timers (either persistent or non-persistent), then every time that image is used, the timers will run "catch-up" timeouts from the point the CRIU image was captured. The number of "catch-up" timeouts will increase over time. For a 5-minute timer like the one above, if the image is restarted an hour later, there will be 12 catch-up timeouts; after 2 hours, there will be 24, and so on.
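
The arithmetic behind those counts can be sketched as below. `CatchUpCalculator` is a hypothetical helper written for this comment, not EJB Container code, and it uses `LocalTime` for brevity, so it assumes all times fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper; not part of Open Liberty. */
final class CatchUpCalculator {
    /**
     * Counts the expirations that fall after lastRun and at or before now,
     * for a timer whose schedule is anchored at its creation time.
     */
    static long missedExpirations(LocalTime created, Duration interval,
                                  LocalTime lastRun, LocalTime now) {
        long intervalSec = interval.getSeconds();
        // Number of scheduled expirations up to and including each point in time.
        long ranThrough = Duration.between(created, lastRun).getSeconds() / intervalSec;
        long dueThrough = Duration.between(created, now).getSeconds() / intervalSec;
        return Math.max(0, dueThrough - ranThrough);
    }

    public static void main(String[] args) {
        LocalTime created = LocalTime.of(3, 0);
        Duration every5 = Duration.ofMinutes(5);
        LocalTime stopped = LocalTime.of(3, 30);
        // Restart one hour after the stop: prints 12
        System.out.println(missedExpirations(created, every5, stopped, LocalTime.of(4, 30)));
        // Restart two hours after the stop: prints 24
        System.out.println(missedExpirations(created, every5, stopped, LocalTime.of(5, 30)));
    }
}
```

For a CRIU image, `stopped` corresponds to the moment the image was captured, so the count grows every time the same image is restored later.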

For "persistent" timers, the EJB Container supports the following configuration option:

missedPersistentTimerAction = ALL | ONCE

ALL is the default behavior described above. ONCE means that only one catch-up is ever performed, and then the timer resumes scheduling from that point. ONCE is the default when failover is enabled (from @njr-11's comments, missedTaskThreshold>0).
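
As a rough illustration of the difference, the sketch below maps a number of missed expirations to the number of catch-up runs under each setting. `MissedTimerPolicy` and its `Action` enum are hypothetical names that only mirror the two configuration values; this is not how the EJB Container implements the option.

```java
/** Hypothetical model of missedPersistentTimerAction; not EJB Container code. */
final class MissedTimerPolicy {
    enum Action { ALL, ONCE }

    /** Returns how many catch-up timeouts run for the given number of missed expirations. */
    static long catchUpRuns(long missedExpirations, Action action) {
        if (missedExpirations <= 0) {
            return 0; // nothing was missed, so nothing to catch up
        }
        // ALL replays every missed expiration; ONCE collapses them into a single run.
        return action == Action.ALL ? missedExpirations : 1;
    }

    public static void main(String[] args) {
        System.out.println(catchUpRuns(12, Action.ALL));  // prints 12
        System.out.println(catchUpRuns(12, Action.ONCE)); // prints 1
    }
}
```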

When CRIU is used, we could require people use ONCE, or we could add a hook that would enable ONCE briefly as the CRIU image is started.

Non-persistent timers do not currently support missedPersistentTimerAction since they would normally not survive a server restart. However, I assume the CRIU image would contain them, so we would want to add some hook such that we know a CRIU image is starting, and then enable ONCE-like capabilities for non-persistent timers at that time.

njr-11 commented 3 years ago

> 2. If we can delay any creation of the persistent timer objects until after the com.ibm.ws.kernel.feature.ServerStarted service is registered then we could guarantee the creation of the timers happens after restore. This is because the last point we are going to allow a checkpoint to occur is just before this service is registered.

One complication is that the timers could have been created during a previous run of the server. Although the current server startup hasn't created any timers or performed any polling yet, persistent timers from previous runs will already be there. When missedTaskThreshold is disabled (non-failover), the database will hold hard-coded information about the assignment of those tasks to particular Liberty server instances, which will not match a restore into a different location. I think we will likely need to require missedTaskThreshold > 0 (failover enabled), or at least detect the disabled case and issue a warning.
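
A minimal sketch of the detect-and-warn idea, assuming the configured value is available as a number when the checkpoint is taken. `FailoverCheck` and the message text are hypothetical, and treating every non-positive value as "failover disabled" is an assumption; the thread only discusses -1 and values greater than 0.

```java
import java.util.Optional;

/** Hypothetical check; not part of Open Liberty. */
final class FailoverCheck {
    /**
     * Returns a warning if persistent timers are configured without failover,
     * or an empty Optional if the configuration is acceptable for checkpoint.
     */
    static Optional<String> checkpointWarning(long missedTaskThreshold) {
        if (missedTaskThreshold > 0) {
            return Optional.empty(); // failover enabled: a restored instance can pick up timers
        }
        // Assumption: any non-positive value is treated as failover disabled.
        return Optional.of("missedTaskThreshold=" + missedTaskThreshold
            + " disables failover; persistent timers assigned to a specific server"
            + " instance may not run after a checkpoint restore");
    }
}
```

Whether this result should be logged as a warning or turned into a hard error is the open choice described above.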

tkburroughs commented 1 year ago

Based on what I know about InstantOn as it has evolved over time.... I think the following should occur for persistent timers: