OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0

Enable verbose garbage collection by default on IBM Java/Semeru #23607

Status: Closed (kgibm closed this 4 months ago)

kgibm commented 1 year ago

Description

By default, verbosegc is not enabled in Liberty (more precisely, it is not enabled by default by the JVM). This is a problem when a performance issue or an OutOfMemoryError occurs: the issue often has to be reproduced with verbosegc enabled, or users may simply overlook garbage collection performance problems (for example, thread dumps may point to various application stacks while the underlying issue is garbage collection). Verbosegc was enabled by default for new profiles in WAS traditional 9.0.0.3 and 9.0.0.4 (z/OS). This epic proposes to enable verbose garbage collection by default on IBM Java/Semeru. It was initially discussed in design issue #23001.


Documents

When available, add links to required feature documents. Use "N/A" to mark particular documents which are not required by the feature.

General Instructions

The process steps occur roughly in the order as presented. Process steps occasionally overlap.

Each process step has a number of tasks which must be completed or must be marked as not applicable ("N/A").

Unless otherwise indicated, the tasks are the responsibility of the Feature Owner or a Delegate of the Feature Owner.

If you need assistance, reach out to the OpenLiberty/release-architect.

Important: Labels are used to trigger particular steps and must be added as indicated.


Prioritization (Complete Before Development Starts)

The OpenLiberty/chief-architect and area leads are responsible for prioritizing features and determining which features are actively worked on.

Prioritization

Design preliminaries determine whether a formal design, which will be provided by an Upcoming Feature Overview (UFO) document, must be created and reviewed. A formal design is required if the feature requires any of the following: UI, Serviceability, SVT, Performance testing, or non-trivial documentation/ID.

Design Preliminaries

Design

No Design

FAT Documentation

A feature must be prioritized before any implementation work may be delivered (as inaccessible/no-ship code). However, a design-focused approach should still be applied to features, and developers should think about the feature design before writing and delivering any code.
Besides being prioritized, a feature must also be socialized (or No Design Approved) before any beta code may be delivered. All new Liberty content must be inaccessible in GA releases until it is Feature Complete, either by marking it kind=noship or by beta fencing it.
Code may not GA until this feature has obtained the "Design Approved" or "No Design Approved" label, along with all other tasks outlined in the GA section.

Feature Development Begins

Legal and Translation

To avoid last-minute blockers and significant disruptions to the feature, the legal items need to be completed as early in the feature process as possible, either during design or early in development. Similarly, translation is to be done concurrently with development. Both MUST be completed before Beta or GA is requested.

Legal (Complete before Feature Complete Date)

Innovation (Complete 1 week before Feature Complete Date)

Translation (Complete by Feature Complete Date)

In order to facilitate early feedback from users, all new features and functionality should first be released as part of a beta release.

Beta Code

Beta Blog (Complete by beta eGA)

A feature is ready to GA after it is Feature Complete and has obtained all necessary Focal Point Approvals.

Feature Complete

Focal Point Approvals (Complete by Feature Complete Date)

These occur only after GA of this feature is requested (by adding a target:ga label). GA of this feature may not occur until all approvals are obtained.

All Features

Design Approved Features

Remove Beta Fencing (Complete by Feature Complete Date)

GA Blog (Complete by Friday after GM)

Post GM (Complete before GA)

Post GA

tjwatson commented 1 year ago

UFO review comments/questions:

tjwatson commented 1 year ago

Part 2 UFO Review

NottyCode commented 1 year ago

@kgibm can you add a comment to indicate how the socialization feedback was addressed?

kgibm commented 1 year ago

@NottyCode Sure. Here is how each item was addressed, from slides 38-42 of the UFO; copying in:

NottyCode commented 1 year ago

@kgibm I'm not seeing the following updates:

kgibm commented 1 year ago

@NottyCode

I'm not seeing the following updates:

* Run SOE tests with the option enabled to determine the log file size impact on the build logs gathered during the test runs.
  Added to System Test Impact slide

Sorry, that item is on the Automated Testing slide (slide 28) instead. I'll update the comment.

* Verbose GC settings must only apply to start and run. Further consideration is needed for the checkpoint action.
  Added to Feature Design slide

On slide 12: "Use SERVER_*_JAVA_OPTIONS so that it only applies to start and run actions, not all actions"

* How to use log files instead of jobs for verbose GC on Z
  Added to Communication slide

On slide 18: "and how to use HFS/ZFS instead with proper tokens if desired"

rsherget commented 5 months ago

@OpenLiberty/demo-approvers Demo scheduled for EOI 24.04

donbourne commented 5 months ago

Serviceability Approval Comment - Please answer the following questions for serviceability approval:

  1. UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?

  2. Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. L2, test team, or another development team).
    a) What problem paths were tested and demonstrated? b) Who did you demo to? c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that L2 should be able to quickly address those problems without need to engage L3?

  3. SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered. a) Who conducted SVT tests for this feature? b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that L2 should be able to quickly address those problems without need to engage L3?

  4. Which L2 / L3 queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective L2/L3 teams know they are supporting it. Ask Don Bourne if you need links or more info.

  5. Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

rsherget commented 4 months ago

@OpenLiberty/serviceability-approvers

  1. UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?

Yes, the UFO identifies the most likely problems customers will see, as well as how to debug/solve them. The scenarios have also been tested with FAT testing.

  2. Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. L2, test team, or another development team).
    a) What problem paths were tested and demonstrated?

    • The default server creates a verbosegc log.
    • Adding VERBOSEGC=false to server.env turns off logging.
    • Adding a custom verbosegc configuration takes precedence over the default.
    • Adding a custom configuration while VERBOSEGC=false is in server.env still allows the user configuration to work.
    • Adding VERBOSEGC=true still creates a verbosegc log.
    • Non-IBM Java versions don't create a verbosegc log.

    b) Who did you demo to? Jim Blye
    c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that L2 should be able to quickly address those problems without need to engage L3? Yes

  3. SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered.
    a) Who conducted SVT tests for this feature? No explicit SVT was required, but Brian Hanczaryk is the SVT Feature Focal Point.
    b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that L2 should be able to quickly address those problems without need to engage L3? No explicit SVT was performed for this feature.

  4. Which L2 / L3 queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective L2/L3 teams know they are supporting it. Ask Don Bourne if you need links or more info.

WAS L2: ADM
WAS L3: Kernel

  5. Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

N/A
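The problem paths tested for serviceability amount to a small precedence table. As a reading aid, the Java sketch below models that table under stated assumptions. The class `VerboseGcDefault`, the method `shouldAddDefault`, its parameters, and the rule that any option starting with `-Xverbosegclog` or `-verbose:gc` counts as user configuration are all hypothetical. This is not Liberty's actual launcher logic.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the verbose GC defaulting rules exercised by the
 * serviceability problem paths. NOT Liberty's launcher code: the names and
 * the option-matching rule are assumptions made for illustration.
 */
final class VerboseGcDefault {

    private VerboseGcDefault() {}

    /**
     * Decides whether a default verbose GC option should be added at server start.
     *
     * @param ibmOrSemeruJvm true if the JVM is IBM Java or Semeru (assumed to be detected elsewhere)
     * @param serverEnv      variables read from server.env, for example VERBOSEGC
     * @param userJvmOptions JVM options the user has already configured
     * @return true if the default verbose GC option should be appended
     */
    static boolean shouldAddDefault(boolean ibmOrSemeruJvm,
                                    Map<String, String> serverEnv,
                                    List<String> userJvmOptions) {
        // Non-IBM JVMs never get the default, so no verbosegc log is created.
        if (!ibmOrSemeruJvm) {
            return false;
        }
        // Assumed rule: any user-supplied verbose GC option takes precedence over the default.
        for (String option : userJvmOptions) {
            if (option.startsWith("-Xverbosegclog") || option.startsWith("-verbose:gc")) {
                return false;
            }
        }
        // VERBOSEGC=false in server.env turns the default off;
        // an absent value or VERBOSEGC=true leaves it on.
        return !"false".equalsIgnoreCase(serverEnv.getOrDefault("VERBOSEGC", "true"));
    }

    public static void main(String[] args) {
        System.out.println(shouldAddDefault(true, Map.of(), List.of()));                        // true: default server
        System.out.println(shouldAddDefault(true, Map.of("VERBOSEGC", "false"), List.of()));    // false: opted out
        System.out.println(shouldAddDefault(true, Map.of("VERBOSEGC", "true"), List.of()));     // true: explicit opt-in
        System.out.println(shouldAddDefault(true, Map.of(), List.of("-Xverbosegclog:gc.log"))); // false: user config wins
        System.out.println(shouldAddDefault(false, Map.of(), List.of()));                       // false: non-IBM JVM
    }
}
```

The model only decides whether the default is added; user-supplied options are left untouched. That is why, in this sketch, a custom configuration still takes effect when VERBOSEGC=false is set.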

rsherget commented 4 months ago

@OpenLiberty/ste-approvers The STE Slidedeck has been uploaded to the STE Archive.

rsherget commented 4 months ago

@OpenLiberty/svt-approvers - There are no SVT requirements for this feature. Please let me know if approval can be granted or if anything else is needed.

gnadell commented 4 months ago

WASSDK Support is good with the STE slides. Hence approving.

rsherget commented 4 months ago

@OpenLiberty/performance-approvers Can you please review the Performance approval for this feature? Please let me know if approval can be granted or if anything else is needed.

rsherget commented 4 months ago

@OpenLiberty/instanton-approvers Can you please review the InstantOn approval for this feature? Please let me know if approval can be granted or if anything else is needed.

chirp1 commented 4 months ago

The developer opened the following documentation issue: https://github.com/OpenLiberty/docs/issues/7240 The ID team has incorporated the updates. The developer has approved the updates. Approving.

LifeIsGood524 commented 4 months ago

This PR is merged and in a release build. Consulted with Eric and Harry; closing issue. @rsherget @hlhoots

LifeIsGood524 commented 4 months ago

see above comment