Closed dazavala closed 1 year ago
Summary of openliberty-io user documentation discovery (nov11)
[ ] Basics -> Container images
[ ] Basics -> New section for InstantOn
deployment
and applications
actions for checkpoint
and explain when to choose one over the other.
[ ] Deployment -> Open Liberty Operator
[ ] Reference -> Server commands
checkpoint actions will remain external. If so, add a section to document checkpoint exit codes, but provide no usage information for the action (open question!)
[ ] Open questions that impact user documentation
checkpoint actions remain external, and what are the supported actions?
Currently targeting to align with 23.0.0.6 to GA
@tam512 please add details regarding testing with Docker 23.0.0 that would need to be doc'd per discussion on scrum call. Thanks. cc'ing @tjwatson
docker build does not have a --cap-add flag, so we cannot build the checkpoint image the convenient way, by having RUN checkpoint.sh applications in the Containerfile or Dockerfile, as documented in the 23.0.0.2-beta blog. To build a checkpoint image with docker, we need to use the 3 steps (build, checkpoint, commit) as documented in the first beta blog.
There is more investigation needed to see about using docker buildx to do more advanced things during a container build. One thing is to be able to specify the necessary capabilities to do a checkpoint during the container build in one step. I did look at doing this and it seemed the checkpoint would succeed, but the resulting image could not restore properly. This needs more investigation, but right now it is not a priority for the initial release.
I recommended that we document the 3 step process for building an InstantOn application image in general for both podman and docker builds. The 3 step process will work for both. Then we can have another section that describes the single step process that uses the checkpoint.sh script in a Dockerfile, but limit that documentation to indicate it is only supported with podman builds.
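The 3 step process (build, checkpoint, commit) could be sketched roughly as follows. This is a minimal sketch, not the documented procedure: the function name, the CONTAINER_TOOL variable, the container name, the use of --privileged for the checkpoint run, and the WLP_CHECKPOINT environment variable are all assumptions made for illustration and should be checked against the final InstantOn docs.

```shell
# Sketch of the 3 step InstantOn image build: build, checkpoint, commit.
# Assumptions (not from this thread): CONTAINER_TOOL, the container name,
# --privileged for the checkpoint run, and the WLP_CHECKPOINT variable.
build_instanton_image() {
  local tool="${CONTAINER_TOOL:-docker}"      # docker or podman
  local app_image="$1"                        # for example: liberty-app
  local checkpoint_at="${2:-afterAppStart}"   # beforeAppStart or afterAppStart
  local container="${app_image}-checkpoint-container"

  # Step 1: build the ordinary application image from the current directory.
  "$tool" build -t "$app_image" . || return 1

  # Step 2: run the image once so that the server performs the checkpoint
  # and the container then exits.
  "$tool" run --name "$container" --privileged \
    --env "WLP_CHECKPOINT=${checkpoint_at}" "$app_image" || return 1

  # Step 3: commit the stopped container as the InstantOn image, then clean up.
  "$tool" commit "$container" "${app_image}-instanton" || return 1
  "$tool" rm "$container"
}
```

For example, `CONTAINER_TOOL=podman build_instanton_image liberty-app afterAppStart` would run the same three steps with podman. Keeping the tool in a variable reflects the point above that the 3 step process is expected to work for both podman and docker.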
Need to document the necessary system calls to do restore. --security-opt seccomp=unconfined is the easy way to grant all system calls, but explicitly granting only the required system calls is better than opening up access to all system calls.
@tjwatson can help with this.
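To make the trade-off concrete, here is a hedged sketch of running an InstantOn image with either the unconfined seccomp setting or a narrower profile. The function name, CONTAINER_TOOL, and the profile file name in the usage note are hypothetical; the list of system calls that a narrower profile must allow is not known here and is deliberately left out.

```shell
# Sketch: run an InstantOn image with the capabilities discussed in this issue.
# Assumptions: CONTAINER_TOOL and the function name are illustrative only.
run_instanton_container() {
  local tool="${CONTAINER_TOOL:-docker}"   # docker or podman
  local image="$1"                         # for example: liberty-app-instanton
  # "unconfined" grants all system calls; pass the path of a JSON seccomp
  # profile as the second argument to grant only what the profile allows.
  local seccomp="${2:-unconfined}"

  "$tool" run --rm \
    --cap-add=CHECKPOINT_RESTORE --cap-add=SETPCAP \
    --security-opt "seccomp=${seccomp}" \
    "$image"
}
```

For example, `run_instanton_container liberty-app-instanton ./instanton-seccomp.json` would use a custom profile, where instanton-seccomp.json is a hypothetical file that allows the default system calls plus the ones that restore requires.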
Document troubleshooting for checkpoint failure on SELinux per issue 24522. @ymanton please help with this. Thanks!
Draft docs ready for technical review:
- The checkpoint command isn't mentioned in the main documentation at all. What is the strategy for this command? How would anyone know how to integrate it with the information in the main topic? I notice Dave's comment about needing to decide whether or not it will be external. If the decision is yes, I think we need to provide more information about use case, how to, etc.
For in-container usage, the checkpoint command is an implementation detail behind the checkpoint.sh script that gets run when doing an InstantOn application container build. We could reference it from the section Building an InstantOn application image (https://docs-draft-openlibertyio.mqj6zf7jocq.us-south.codeengine.appdomain.cloud/docs/latest/instanton.html#build) after the two bullets that describe the two options for building InstantOn. Something like this:
Both options use the xref...server-checkpoint.adoc[checkpoint] server command to perform the checkpoint during InstantOn application container image build.
- The documentation refers to both CRIU and InstantOn in a seemingly interchangeable manner in some sections. eg, "CRIU cannot perform a checkpoint..." or "InstantOn makes a checkpoint..." Does the user need to worry about CRIU beyond how it is explained as an enabling technology in the introduction? Could subsequent references to checkpoint/restore processes just say "InstantOn" ?
For the most part we can replace CRIU with InstantOn I think. There are some exceptions.
The When to make a checkpoint: beforeAppStart or afterAppStart section should continue to use CRIU to introduce the technology.
The Linux capability prerequisites for checkpoint and restore section should continue to use CRIU I think. In particular it would be wrong to say "Because the InstantOn binary is granted more capabilities, like CHECKPOINT_RESTORE, it needs these capabilities so it can drop such capabilities from the final process that InstantOn restored."
Some rewording will be needed if you do this. For example:
Other public cloud Kubernetes services might also work if they have the prerequisites to allow InstantOn to restore the InstantOn application process.
Seems awkward to have two InstantOn in that sentence. Perhaps something like this:
Other public cloud Kubernetes services might also work if they have the prerequisites to allow the InstantOn application process to restore.
- Should 23.0.0.6 be explicitly documented as a prerequisite minimum OL version in the prereq section? It is mentioned but not listed among other host prereqs. Same question with Semeru Java 11/17
The host prerequisites section was intended to list the prereqs required to run our InstantOn-enabled Open Liberty container images, that is, the images described in the paragraph before the bulleted list. That paragraph indicates that the images with InstantOn support start at 23.0.0.6 and are based on the Semeru runtime. It also mentions the Java 11/17 Semeru versions there. It would seem redundant to me to indicate these again as host system requirements. Besides, I don't see these as host system prereqs because they are just what is in the images themselves, not what is on the actual host system running the images.
For in container usage, the checkpoint command is an implementation detail behind the checkpoint.sh script that gets run when doing an InstantOn application container build.
Is there any use case where someone would run it from the CLI? Or need the doc to customize the script somehow? I don't think we necessarily need to mention it on the main page, but the command documentation could be misleading if we don't make it clear that the command is only used by the script (if that's the case). OTOH if it's purely an implementation detail, should we be documenting it externally at all?
For this release we can consider it an implementation detail of the InstantOn container build for checkpoint. With that in mind I think we could just omit the command page for checkpoint for now.
Thanks @tjwatson - I made the following updates per your responses:
- Fast startup with InstantOn
- InstantOn system calls
- InstantOn limitations and known issues
Let me know if any further edits are needed. When you're satisfied with the drafts, you can add the technical reviewed label to this issue and I'll send it for ID peer review to prepare for publishing with 23.0.0.6. Thanks
@tjwatson I have some questions about the Fast startup with InstantOn documentation
Currently, the only supported processor is X86-64/AMD64. Other processors are expected to be supported in later releases of Open Liberty InstantOn.
We tested checkpoint and restore on VM with the following Intel processor, but we only claim support on AMD64?
cat /proc/cpuinfo | grep -i model
model : 85
model name : Intel Xeon Processor (Skylake, IBRS)
Currently, InstantOn is supported with the IBM Semeru Java version 11.0.9+ and IBM Semeru Java version 17.0.7+
When testing with the Java 11 image, I see Java version 11.0.19+7, as follows. I just want to confirm whether we support InstantOn on Java 11.0.9+ or 11.0.19+:
Launching defaultServer (WebSphere Application Server 23.0.0.6/wlp-1.0.78.cl230620230608-1100) on Eclipse OpenJ9 VM, version 11.0.19+7 (en_US)
Do we need to mention that beforeAppStart or afterAppStart checkpoint location is not case sensitive?
When testing on AKS and EKS, I also have securityContext allowPrivilegeEscalation: true but I do not see it listed in the doc, so do we need it?
Regarding (4), we need allowPrivilegeEscalation: true when deploying checkpoint application images on AKS and EKS; otherwise we get the following error:
CRIU needs to have the CAP_SYS_ADMIN or the CAP_CHECKPOINT_RESTORE capability:
setcap cap_checkpoint_restore+eip /opt/criu/criu
CWWKE0964E: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. The server did not launch because checkpoint restore recovery is disabled.
We tested checkpoint and restore on VM with the following Intel processor, but we only claim support on AMD64?
The doc always refers to both X86-64 and AMD64 with a slash (e.g. X86-64/AMD64). By and large, you can think of the two as aliases of each other. Technically speaking, AMD provided the architectural design of AMD64, which was originally an extension of the x86 architecture. It then began being referred to as X86-64 also. Both Intel and AMD implement chipsets that follow the architectural design (see https://en.wikipedia.org/wiki/X86-64 for more context).
When testing with Java 11 image, I see Java version 11.0.19+7 as following, I just want to confirm that we support InstantOn on Java 11.0.9+ or 11.0.19+
You are correct, looks like 11.0.9+ was a typo and should be 11.0.19+
Do we need to mention that beforeAppStart or afterAppStart checkpoint location is not case sensitive?
We could, I don't personally think it is required to document that though.
When testing on AKS and EKS, I also have securityContext allowPrivilegeEscalation: true but I do not see it listed in the doc, so do we need it?
Good point, we should have that documented. For completeness, can you show us your complete securityContext section? Or better yet, the delta between the deployment YAML you use for InstantOn and a "normal" Liberty application deployment.
This is the complete securityContext when testing with InstantOn. Without InstantOn, we do not need to specify a securityContext section.
spec:
  ...........
  securityContext:
    allowPrivilegeEscalation: true
    privileged: false
    runAsNonRoot: true
    capabilities:
      add:
      - CHECKPOINT_RESTORE
      - SETPCAP
      drop:
      - ALL
@ramkumar-k-9286 - this issue is ready for peer review:
- Fast startup with InstantOn
- InstantOn system calls
- InstantOn limitations and known issues
InstantOn is not intended to be used outside of a container image build. -> (Acrolinx Suggestion) Do not use InstantOn outside of a container image build.
This configuration ensures that the resources in the lower layers of the image do not change from the time the checkpoint is taken to the time the image is started with InstantOn. -> (Acrolinx - Accessibility) This configuration ensures that the resources in the underlying layers of the image do not change from the time the checkpoint is taken to the time the image is started with InstantOn.
Which of these options you choose depends on the kind of code your application must run. -> (acrolinx suggestion) Which of these options you choose depends on the code your application must run.
The following examples assume you are using Docker to build an application image that is named liberty-app. --> The following examples assume that you are using Docker to build an application image that is named liberty-app.
Jakarta EE and MicroProfile applications might contain application code that gets run as the application is started, such as the following examples: -> Should we be adding links for Jakarta EE and MicroProfile here? 2nd mention of both after the short desc.
Add periods for the following bulleted list. Similar list before and after have periods.
- A servlet that uses the loadOnStartup attribute
- An EJB that uses the @Startup annotation
- A CDI bean that uses @Observes @Initialized(ApplicationScoped.class) annotations
Also, are these 2 separate items, @Observes and @Initialized(ApplicationScoped.class)? There seems to be a space after @Observes.
In some cases, the application code that runs as the application starts might not be suited for performing an InstantOn checkpoint. -> (acrolinx suggestion) Sometimes the application code that runs as the application starts might not be suited for performing an InstantOn checkpoint.
Reading configuration that is expected to change when the application is deployed, for example configuration from MicroProfile Config. -> A reading configuration that is expected to change when the application is deployed. For example, configuration from MicroProfile Config.
This option might result in slower restore times because it must run more code before the application is ready to service incoming requests. -> This option might result in slower restore times because it must run more code before the application is ready to service any incoming requests.
For more information about limitations with early startup code annd possible workarounds, see InstantOn limitations and known issues. -> For more information about limitations with early startup code and possible workarounds, see InstantOn limitations and known issues.
Starting with Open Liberty version 23.0.0.6, all X86-64/AMD64 UBI Open Liberty container images include the prerequisites for InstantOn to checkpoint and restore Open Liberty application processes. -> Starting with Open Liberty version 23.0.0.6, all X86-64/AMD64 UBI Open Liberty container images include the prerequisites for InstantOn to checkpoint and restoring Open Liberty application processes.
Currently, InstantOn is supported with the IBM Semeru Java version 11.0.19+ and IBM Semeru Java version 17.0.7+. InstantOn is expected to support new versions of IBM Semeru Java as they are released. -> Currently, InstantOn is supported by IBM Semeru Java version 11.0.19+ and IBM Semeru Java version 17.0.7+. InstantOn is expected to support new versions of IBM Semeru Java as they are released.
CHECKPOINT_RESTORE - This capability was added in Linux 5.9 to separate out checkpoint/restore functions from the overloaded SYS_ADMIN capability. -> CHECKPOINT_RESTORE - This capability was added in Linux 5.9 to separate checkpoint/restore functions from the overloaded SYS_ADMIN capability.
The following examples assume you are using Docker to build an application image that is named liberty-app. -> The following examples assume that you are using Docker to build an application image that is named liberty-app.
Starting a container with the liberty-app-instanton container image shows a much faster startup time than the original liberty-app image. -> Starting a container with the liberty-app-instanton container image shows a faster startup time than the original liberty-app image.
If restoration of the InstantOn application process fails, Open Liberty launches the server without using the InstantOn checkpoint process. -> If restoration of the InstantOn application process fails, Open Liberty starts the server without using the InstantOn checkpoint process.
In such cases, the Open Liberty application starts as if no InstantOn checkpoint process layer exists, which takes significantly longer than a successfully restored InstantOn process. -> In such cases, the Open Liberty application starts as if no InstantOn checkpoint process layer exists, which takes longer than a successfully restored InstantOn process.
No comments.
For more information about InstantOn prerequisties, see Runtime and host build system prerequisites. --> For more information about InstantOn prerequisites, see Runtime and host build system prerequisites.
--
If this @Inject annotation of the configuration is contained in a CDI bean that is created and used before the checkpoint is performed, the value of "theDefault" is injected. -> Should theDefault be in " "?
This configuration allows the values to be updated with environment variable values or other configuration mechanisms, as described in Configuring microservices running in Kubernetes. -> link not working
InstantOn supports only a subset of Open Liberty features, as described in Open Liberty InstantOn supported features. -> Link working - but not redirected to #supported-feature on the linked page.
When an InstantOn application container image is run the bootstrap.properties file is not read. -> When an InstantOn application container image is run, the bootstrap.properties file is not read.
For example, you might use environment variables or other configuration mechanisms, as described Configuring microservices running in Kubernetes. -> link not working
If Yama is configured with one of the following modes, InstantOn cannot checkpoint or restore the application process in running containers:
2 - admin-only attach
3 - no attach
->
If Yama is configured with one of the following modes, InstantOn cannot checkpoint or restore the application process in running containers:
2
- admin-only attach
3
- no attach
For InstantOn checkpoint and restore to work, Yama must be configured with one of the following modes:
0 - classic ptrace permissions
1 - restricted ptrace
->
For InstantOn checkpoint and restore to work, Yama must be configured with one of the following modes:
0
- classic ptrace permissions
1
- restricted ptrace
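As an illustration of the Yama modes quoted above, a host could be checked with something like the following. This is only a sketch: the function name and its output strings are made up for this example, and the optional file argument exists purely so that the check can be tried without changing kernel settings.

```shell
# Sketch: report whether the Yama ptrace_scope mode permits checkpoint/restore.
# Modes 0 and 1 are expected to work; modes 2 and 3 are expected to block it.
# The function name and messages are illustrative assumptions.
yama_allows_instanton() {
  local scope_file="${1:-/proc/sys/kernel/yama/ptrace_scope}"
  local mode

  # If the file is unreadable (for example, Yama is not enabled), report unknown.
  [ -r "$scope_file" ] || { echo "unknown: cannot read $scope_file"; return 2; }

  mode="$(cat "$scope_file")"
  case "$mode" in
    0|1) echo "ok: Yama mode $mode"; return 0 ;;
    2|3) echo "blocked: Yama mode $mode"; return 1 ;;
    *)   echo "unknown: Yama mode $mode"; return 2 ;;
  esac
}
```

Calling `yama_allows_instanton` with no argument reads the live kernel setting; the exit status distinguishes the working modes (0) from the blocking modes (1) and from anything unrecognized (2).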
As described in Required Linux system calls, CRIU requires a number of Linux system calls to restore the application process. -> As described in Required Linux system calls, CRIU requires several Linux system calls to restore the application process.
Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS) -> These links are provided multiple times on the same page. Is that required?
Thanks for reviewing @ramkumar-k-9286 - all suggestions implemented except:
Add periods for the following bulleted list. Similar list before and after have periods.
- A servlet that uses the loadOnStartup attribute
- An EJB that uses the @Startup annotation
- A CDI bean that uses @Observes @Initialized(ApplicationScoped.class) annotations
Also are these 2 separate items? @Observes and @Initialized(ApplicationScoped.class) - because there seems to be a space after @Observes
The previous list had a mix of sentences and fragments, which requires periods for the items. This list is only fragments, so no periods are needed. W/r/t the annotations- they are separate annotations, but used in conjunction in this context.
Reading configuration that is expected to change when the application is deployed, for example configuration from MicroProfile Config. -> A reading configuration that is expected to change when the application is deployed. For example, configuration from MicroProfile Config.
This is correct as is- all items in this list begin with verb phrases that describe app scenarios "Reading configuration that is expected to change..." is a verb phrase, not a compound noun.
Starting with Open Liberty version 23.0.0.6, all X86-64/AMD64 UBI Open Liberty container images include the prerequisites for InstantOn to checkpoint and restoring Open Liberty application processes.
present participle (restoring) doesn't agree with infinitive verb clause (to checkpoint...). "Checkpoint and restore.." is a compound verb phrase used throughout the doc, with precedent in the Linux source doc.
Currently, InstantOn is supported by IBM Semeru Java version 11.0.19+ and IBM Semeru Java version 17.0.7+. InstantOn is expected to support new versions of IBM Semeru Java as they are released.
"with" is more correct in this context, as the intention is to say that iOn is supported when used in conjunction with Java 11/17.
Configuring microservices running in Kubernetes -> this link goes to the guides, which aren't available from the docs draft site. I confirmed it works from the main draft site.
Let me know if any further changes are needed. Thanks
- Fast startup with InstantOn
- InstantOn system calls
- InstantOn limitations and known issues
You can use these steps with either Podman and Docker to build an Instanton application image. -> You can use these steps with either Podman or Docker to build an InstantOn application image.
In addition to the features that are enabled in the convenience features, InstantON also supports the following features: -> In addition to the features that are enabled in the convenience features, InstantOn also supports the following features:
LINK:https://jakarta.ee/[Jakarta EE] and LINK:https://microprofile.io/[MicroProfile] applications might contain application code that gets run as the application is started, such as the following examples: -> doc to be fixed - link showing up.
Thanks for catching those @ramkumar-k-9286 - all fixed
Content is on vNext and will publish with 23.0.0.6.
For epic https://github.com/OpenLiberty/open-liberty/issues/16384. Opened in accordance to point 6 of Documenting Open Liberty.
Create topics in openliberty.io for the new checkpoint server command and for InstantOn development.
The discovery for documentation requirements was held Nov 11, and openliberty.io requirements are summarized below. InstantOn also requires guide documentation that will publish in openliberty.io.
Documentation for InstantOn in-container usage may require updates to the DockerHub Open Liberty and IBM Container Repository documentation. Separate issues will be opened for DockerHub and ICR.