OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0

SpringBoot 3.0: InstantOn support #25594

Closed tjwatson closed 3 months ago

tjwatson commented 1 year ago

Description

SpringBoot 3 uses Spring Framework 6.x. Spring Framework version 6.1 (to be released November 2023) is enabling integration with CRaC. See https://docs.spring.io/spring-framework/reference/6.1-SNAPSHOT/integration/checkpoint-restore.html

This is also evident in the current milestone of 6.1 (6.1.0-M1) and can be seen in the spring source code at https://github.com/spring-projects/spring-framework/blob/654dee8cd6fd09314289e9bba92719d57001c539/spring-context/src/main/java/org/springframework/context/support/DefaultLifecycleProcessor.java#L485-L555

This will enable SpringBoot 3 applications to take advantage of checkpoint/restore technologies to start up rapidly. Liberty can implement the org.crac API on top of the Liberty InstantOn support as a separate feature that provides the third-party org.crac APIs to applications (here, the Spring libraries themselves).
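As a rough illustration of the callback contract that such a feature would need to honor, here is a minimal, self-contained sketch. It does not use the real org.crac library: `SketchResource`, `SketchContext`, and `RecordingResource` are hypothetical stand-ins, and the real `org.crac.Resource` callbacks also take a `Context` argument and may throw. The ordering shown (checkpoint notifications in reverse registration order, restore notifications in registration order) follows my understanding of the org.crac global context and should be checked against the org.crac Javadoc. Nothing here reflects how Liberty InstantOn actually implements it.

```java
import java.util.ArrayList;
import java.util.List;

class CracSketch {

    // Hypothetical stand-in shaped like org.crac.Resource
    // (the real callbacks also receive a Context and may throw).
    interface SketchResource {
        void beforeCheckpoint();
        void afterRestore();
    }

    // Hypothetical stand-in for the context a runtime would provide to applications.
    static final class SketchContext {
        private final List<SketchResource> resources = new ArrayList<>();

        void register(SketchResource resource) {
            resources.add(resource);
        }

        // Simulates one checkpoint followed by one restore.
        void checkpointRestore() {
            // Notify in reverse registration order before the checkpoint.
            for (int i = resources.size() - 1; i >= 0; i--) {
                resources.get(i).beforeCheckpoint();
            }
            // A real runtime would snapshot the process here and resume it later.
            // Notify in registration order after the restore.
            for (SketchResource resource : resources) {
                resource.afterRestore();
            }
        }
    }

    // Records each callback so the ordering can be inspected.
    static final class RecordingResource implements SketchResource {
        private final String name;
        private final List<String> events;

        RecordingResource(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        @Override
        public void beforeCheckpoint() {
            events.add(name + ":beforeCheckpoint");
        }

        @Override
        public void afterRestore() {
            events.add(name + ":afterRestore");
        }
    }

    static List<String> run() {
        List<String> events = new ArrayList<>();
        SketchContext context = new SketchContext();
        context.register(new RecordingResource("datasource", events));
        context.register(new RecordingResource("cache", events));
        context.checkpointRestore();
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In a real application the library code (for example Spring's lifecycle processor) would register its resource with `org.crac.Core.getGlobalContext()`, and the runtime would drive the callbacks around the actual checkpoint and restore.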

This is important because Liberty InstantOn provides an ideal solution to running checkpoint/restore applications in the cloud. Spring's support for CRaC APIs will enable their very large community of developers to easily use Checkpoint/Restore technologies. I expect the Spring Framework will continue to improve their support for CRaC such that it will make it safe to checkpoint Spring Boot applications for production restores.

Liberty InstantOn should be able to provide a seamless experience to allow SpringBoot 3 applications to safely use InstantOn once the Spring Framework supports CRaC APIs when they are present.

Additional context

See https://github.com/sdeleuze/spring-boot-crac-demo for a working example of Spring Boot with CRaC.
See https://aboullaite.me/what-the-crac/
See https://github.com/CRaC/org.crac


Documents

When available, add links to required feature documents. Use "N/A" to mark particular documents which are not required by the feature.

General Instructions

The process steps occur roughly in the order as presented. Process steps occasionally overlap.

Each process step has a number of tasks which must be completed or must be marked as not applicable ("N/A").

Unless otherwise indicated, the tasks are the responsibility of the Feature Owner or a Delegate of the Feature Owner.

If you need assistance, reach out to the OpenLiberty/release-architect.

Important: Labels are used to trigger particular steps and must be added as indicated.


Prioritization (Complete Before Development Starts)

The chief architect (OpenLiberty/chief-architect) and area leads are responsible for prioritizing the features and determining which features are being actively worked on.

Prioritization

Design preliminaries determine whether a formal design, which will be provided by an Upcoming Feature Overview (UFO) document, must be created and reviewed. A formal design is required if the feature requires any of the following: UI, Serviceability, SVT, Performance testing, or non-trivial documentation/ID.

Design Preliminaries

Design

No Design

FAT Documentation

A feature must be prioritized before any implementation work may begin to be delivered (inaccessible/no-ship). However, a design focused approach should still be applied to features, and developers should think about the feature design prior to writing and delivering any code.
Besides being prioritized, a feature must also be socialized (or No Design Approved) before any beta code may be delivered. All new Liberty content must be inaccessible in our GA releases until it is Feature Complete by either marking it kind=noship or beta fencing it.
Code may not GA until this feature has obtained the "Design Approved" or "No Design Approved" label, along with all other tasks outlined in the GA section.

Feature Development Begins

Legal and Translation

In order to avoid last minute blockers and significant disruptions to the feature, the legal items need to be done as early in the feature process as possible, either in design or as early into the development as possible. Similarly, translation is to be done concurrently with development. Both MUST be completed before Beta or GA is requested.

Legal (Complete before Feature Complete Date)

Translation (Complete 1 week before Feature Complete Date)

Innovation (Complete 1 week before Feature Complete Date)

In order to facilitate early feedback from users, all new features and functionality should first be released as part of a beta release.

Beta Code

Beta Blog (Complete 1.5 weeks before beta eGA)

A feature is ready to GA after it is Feature Complete and has obtained all necessary Focal Point Approvals.

Feature Complete

Focal Point Approvals (Complete by Feature Complete Date)

These occur only after GA of this feature is requested (by adding a target:ga label). GA of this feature may not occur until all approvals are obtained.

All Features

Design Approved Features

Remove Beta Fencing (Complete by Feature Complete Date)

GA Blog (Complete by Feature Complete Date)

Post GA

ayoho commented 1 year ago

Feedback from UFO meeting, part 1:

Slide 9

Slide 17

Slide 22

Slide 23

Slide 28

ayoho commented 1 year ago

Slide 29

Slide 36

Slide 38

tjwatson commented 1 year ago

@OpenLiberty/demo-approvers Demo scheduled for EOI [23.17]

malincoln commented 1 year ago

not sure what I did to close this so reopening

donbourne commented 5 months ago

OL:

Serviceability Approval Comment - Please answer the following questions for serviceability approval:

  1. UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation? Yes, there are multiple slides in the updated UFO that identify information. Reviewed by the Open Liberty Kernel team.

  2. Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team).
    a) What problem paths were tested and demonstrated? All common error paths.
    b) Who did you demo to? Open Liberty Kernel Team
    c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs? Yes, the Open Liberty Kernel team believes the problem scenarios are sufficient to avoid PMRs.

  3. SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered. a) Who conducted SVT tests for this feature? b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

  4. Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.

  5. Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

tam512 commented 4 months ago

SVT tested checkpoint/restore using the Spring PetClinic sample app with the Open Liberty daily beta image stg.icr.io/cp/olc/open-liberty-daily:beta (Open Liberty 24.0.0.7-beta/wlp-1.0.90.cl240620240517-1201) on Eclipse OpenJ9 VM, version 21.0.2+13-LTS (en_US). Restore was done on an Amazon EKS cluster.

tjwatson commented 4 months ago

OL:

Serviceability Approval Comment - Please answer the following questions for serviceability approval:

  1. UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?

Yes, the UFO Serviceability section identifies the likely causes of failures a customer may see when using the feature.

  2. Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team).

    a) What problem paths were tested and demonstrated?

All failures identified in the UFO are demonstrated.

    b) Who did you demo to?

To the kernel team

    c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

Yes, the Open Liberty Kernel team believes the problem scenarios are sufficient to avoid PMRs.

  3. SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered.

    a) Who conducted SVT tests for this feature?

@tam512

    b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

In SVT, we look at serviceability aspects such as error messages, and they are clear and helpful.

TBD

  4. Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.

The Equinox OSGi squad will provide support. This squad already provides support for InstantOn and also support for the springBoot feature (shared with kernel team).

  5. Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

No

donbourne commented 4 months ago

@tam512 , can you please provide your comment for 3b on the serviceability approval:

b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

hlhoots commented 4 months ago

For serviceability, the Open Liberty Kernel Team reviewed that content today, and added comments in @donbourne 's comment above.

chirp1 commented 4 months ago

Adding a new feature to InstantOn list, extended description for new feature CRAC 1.4. Ready, and will display the autogen. Doc issue #7331. Approving feature.

dazavala commented 4 months ago

Hello @tngiang73, @gnadell. Can we acquire STE approval on Monday (or sooner) with agreement that Tom will provide the STE deck and coordinate a training session? Tom should be available to consult regarding his progress on the training materials and plans to meet with Support developers. -Regards

dazavala commented 4 months ago

FYI @tjwatson: @tngiang73 will provide STE focal approval now under the agreement that we deliver the STE materials by next Monday, 10 June 2024. -Thanks all.

tjwatson commented 3 months ago

This is done