eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Checkpoint - investigate suspending threads before prepare and resuming after restore #13751

Closed: tajila closed this issue 2 years ago

tajila commented 3 years ago

Investigate whether it is possible to suspend running Liberty threads before the prepare operation for a checkpoint, and then resume them after the restore hooks have run when the process is restored.

This would prevent threads from resuming "too early", that is, before the restore hooks have successfully brought their state back to an acceptable level.

```java
/**
 * Sets the prepare hook which is called after pausing all application threads and before the process checkpoint
 * is done.
 * <p>Default: null
 *
 * @param prepare a function run after the JVM has paused all application threads and before the JVM checkpoint is performed
 * @return this
 */
public CRIUSupport setPrepare(Callable<Boolean> prepare) {
    ...
}

/**
 * Sets the restore hook which is called before resuming all suspended threads.
 * <p>Default: null
 *
 * @param restore a function run after the JVM has restored but before resuming all suspended threads
 * @return this
 */
public CRIUSupport setRestore(Callable<Boolean> restore) {
    ...
}
```
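As a rough sketch of the hook semantics described in the Javadoc (default `null`, fluent `return this`), the class below stores one `Callable<Boolean>` per hook. `CheckpointHooks`, `runPrepare` and `runRestore` are hypothetical names, not the OpenJ9 `CRIUSupport` API. Treating a missing hook as success, and a `false`/`null` result or an exception as failure, is an assumption made for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for the proposed hook setters; not the real CRIUSupport class.
class CheckpointHooks {
    private Callable<Boolean> prepare; // default: null (no hook registered)
    private Callable<Boolean> restore; // default: null (no hook registered)

    CheckpointHooks setPrepare(Callable<Boolean> prepare) {
        this.prepare = prepare;
        return this; // returning this allows chained calls
    }

    CheckpointHooks setRestore(Callable<Boolean> restore) {
        this.restore = restore;
        return this;
    }

    // Assumption: a missing hook succeeds; false, null or an exception means failure.
    static boolean runHook(Callable<Boolean> hook) {
        if (hook == null) {
            return true;
        }
        try {
            return Boolean.TRUE.equals(hook.call());
        } catch (Exception e) {
            return false;
        }
    }

    boolean runPrepare() {
        return runHook(prepare);
    }

    boolean runRestore() {
        return runHook(restore);
    }
}

class CheckpointHooksDemo {
    public static void main(String[] args) {
        CheckpointHooks hooks = new CheckpointHooks()
                .setPrepare(() -> true)   // e.g. flush caches (hypothetical)
                .setRestore(() -> true);  // e.g. reopen connections (hypothetical)
        System.out.println(hooks.runPrepare() && hooks.runRestore()); // prints true
    }
}
```

Returning `Boolean` rather than `void` lets a hook veto the operation, which the sequence below relies on.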

On checkpoint:

Call the checkpoint API
JVM suspends all other threads
-------- enter single-threaded phase ----------
Run application hooks
Run JVM hooks
Checkpoint the JVM
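The checkpoint steps can be modeled in plain Java under one assumption: suspension is cooperative, so each worker calls a hypothetical `safepoint()` method and the checkpointing thread waits until every worker is parked before running the hooks. `SuspendGate` and `CheckpointSequence` are illustrative names only; the JVM itself would suspend threads natively rather than depend on application code reaching a safepoint, and plain Java has no safe equivalent (`Thread.suspend` is deprecated).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cooperative suspend gate; workers must call safepoint() regularly.
class SuspendGate {
    private final int workers;        // assumed fixed number of gated worker threads
    private boolean suspendRequested;
    private int parked;               // workers currently blocked in safepoint()

    SuspendGate(int workers) {
        this.workers = workers;
    }

    // Called by worker threads; blocks while a suspension is in effect.
    synchronized void safepoint() throws InterruptedException {
        if (!suspendRequested) {
            return;
        }
        parked++;
        notifyAll();                  // let the coordinator recount parked workers
        try {
            while (suspendRequested) {
                wait();
            }
        } finally {
            parked--;
        }
    }

    // Called by the checkpointing thread; returns once every worker is parked.
    synchronized void suspendOthers() throws InterruptedException {
        suspendRequested = true;
        while (parked < workers) {
            wait();
        }
    }

    // Called only after the restore hooks have run; releases all parked workers.
    synchronized void resumeOthers() {
        suspendRequested = false;
        notifyAll();
    }
}

// Checkpoint side of the sequence, with the hooks passed in as plain Runnables.
class CheckpointSequence {
    static List<String> checkpoint(SuspendGate gate, Runnable appHooks, Runnable jvmHooks,
                                   Runnable checkpointJvm) throws InterruptedException {
        List<String> steps = new ArrayList<>();
        gate.suspendOthers();         // enter the single-threaded phase
        steps.add("suspended");
        appHooks.run();
        steps.add("app hooks");
        jvmHooks.run();
        steps.add("jvm hooks");
        checkpointJvm.run();
        steps.add("checkpoint");
        return steps;
    }
}

class CheckpointDemo {
    public static void main(String[] args) throws InterruptedException {
        SuspendGate gate = new SuspendGate(2);
        for (int i = 0; i < 2; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        gate.safepoint();   // parks here while suspended
                        Thread.sleep(1);    // simulated application work
                    }
                } catch (InterruptedException e) {
                    // exit quietly
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
        System.out.println(CheckpointSequence.checkpoint(gate, () -> { }, () -> { }, () -> { }));
        // prints [suspended, app hooks, jvm hooks, checkpoint]
        gate.resumeOthers();
    }
}
```

If a worker stops calling `safepoint()` (for example, because it is blocked elsewhere), `suspendOthers()` never returns, which is one form of the deadlock concern raised below.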

On restore:

Restore JVM
Run JVM hooks
Run application hooks
-------- exit single-threaded phase ----------
Resume all other threads
Return from checkpoint API
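The restore steps mirror the checkpoint steps in reverse: the layer prepared last (JVM) is restored first, and other threads resume only at the very end. A minimal sketch of that last-in, first-out ordering follows. `LayeredHooks` and `Layer` are hypothetical, and leaving threads suspended when a restore hook returns `false` is an assumption, since the issue does not say what should happen in that case.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch: layers are prepared in registration order, restored in reverse.
class LayeredHooks {
    static final class Layer {
        final String name;
        final Callable<Boolean> prepare;
        final Callable<Boolean> restore;

        Layer(String name, Callable<Boolean> prepare, Callable<Boolean> restore) {
            this.name = name;
            this.prepare = prepare;
            this.restore = restore;
        }
    }

    private final List<Layer> layers = new ArrayList<>();
    final List<String> log = new ArrayList<>();    // records the order of steps

    LayeredHooks add(Layer layer) {
        layers.add(layer);
        return this;
    }

    // Runs every prepare hook in registration order, then "checkpoints".
    boolean checkpoint() throws Exception {
        for (Layer layer : layers) {
            log.add("prepare " + layer.name);
            if (!Boolean.TRUE.equals(layer.prepare.call())) {
                return false;                       // abort the checkpoint
            }
        }
        log.add("checkpoint");
        return true;
    }

    // Runs restore hooks last-in, first-out; resumes threads only if all succeed.
    boolean restore(Runnable resumeOthers) throws Exception {
        Deque<Layer> stack = new ArrayDeque<>(layers);
        while (!stack.isEmpty()) {
            Layer layer = stack.removeLast();
            log.add("restore " + layer.name);
            if (!Boolean.TRUE.equals(layer.restore.call())) {
                return false;                       // other threads stay suspended
            }
        }
        resumeOthers.run();                         // exit the single-threaded phase
        log.add("resumed");
        return true;
    }
}

class LayeredHooksDemo {
    public static void main(String[] args) throws Exception {
        LayeredHooks hooks = new LayeredHooks()
                .add(new LayeredHooks.Layer("application", () -> true, () -> true))
                .add(new LayeredHooks.Layer("JVM", () -> true, () -> true));
        hooks.checkpoint();
        hooks.restore(() -> { });
        System.out.println(hooks.log);
        // prints [prepare application, prepare JVM, checkpoint, restore JVM, restore application, resumed]
    }
}
```

Unwinding in reverse means each layer is restored only after everything it was built on top of has been restored, which is what keeps resumed threads from seeing half-restored state.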

We also need to handle potential deadlocks in which the checkpointing thread depends on a suspended thread, for example by waiting on a lock that the suspended thread holds.
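One possible mitigation (not an option settled in this issue) is for a hook to bound how long it waits for a lock that a suspended thread may hold, and to report failure instead of hanging, so the checkpoint can be aborted. The hypothetical `BoundedLockHook` below uses `ReentrantLock.tryLock(long, TimeUnit)`; the demo simulates a suspended lock owner with a `CountDownLatch`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical prepare hook that needs a lock which a suspended thread might hold.
class BoundedLockHook implements Callable<Boolean> {
    private final ReentrantLock lock;
    private final long timeoutMillis;

    BoundedLockHook(ReentrantLock lock, long timeoutMillis) {
        this.lock = lock;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public Boolean call() throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;   // owner is likely suspended: fail instead of deadlocking
        }
        try {
            // save the state guarded by the lock here (hypothetical)
            return true;
        } finally {
            lock.unlock();
        }
    }
}

class BoundedLockDemo {
    public static void main(String[] args) throws Exception {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // Simulates a thread that was suspended while holding the lock.
        Thread owner = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException ignored) {
                // exit
            } finally {
                lock.unlock();
            }
        });
        owner.start();
        held.await();
        System.out.println(new BoundedLockHook(lock, 100).call()); // prints false
        release.countDown();  // "resume" the owner so it releases the lock
        owner.join();
        System.out.println(new BoundedLockHook(lock, 100).call()); // prints true
    }
}
```

A timeout only turns a hang into a failed checkpoint; the JVM would still need to resume the suspended threads afterwards, which fits the `Callable<Boolean>` hook signature.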

There are some options here:

Related to: https://github.com/OpenLiberty/open-liberty/issues/19040

keithc-ca commented 3 years ago

The OMR functions omrintrospect_threads_* may be helpful. I believe they are currently used only when producing javacore files, but they should be reusable for this.