LG-334: Add HTTP metrics to monitor-1.0, mpMetrics and OpenTelemetry

donbourne commented 2 years ago

Description

Open Liberty needs HTTP metrics to allow for:

easy computation of APDEX (https://en.wikipedia.org/wiki/Apdex)
RED metrics (rate, error, duration)
latency, traffic, errors from the four golden signals (https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals)
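
As a rough illustration of why per-request duration and status data are enough for these goals, the sketch below computes an Apdex score and a RED-style summary from a list of request samples. The `RequestSample` type, the function names, and the choice to count only 5xx responses as errors are assumptions made for this example; they are not part of Open Liberty or any spec referenced here.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed HTTP request (hypothetical record type for this sketch)."""
    duration_s: float  # server-side request duration in seconds
    status_code: int   # HTTP response status code


def apdex(samples, threshold_s):
    """Apdex = (satisfied + tolerating / 2) / total.

    satisfied:  duration <= T
    tolerating: T < duration <= 4T
    Everything slower counts as frustrated and contributes nothing.
    """
    if not samples:
        return None
    satisfied = sum(1 for s in samples if s.duration_s <= threshold_s)
    tolerating = sum(
        1 for s in samples if threshold_s < s.duration_s <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(samples)


def red_summary(samples, window_s):
    """Rate, error ratio, and worst-case duration over a time window."""
    total = len(samples)
    # Assumption: only 5xx responses are treated as errors here.
    errors = sum(1 for s in samples if s.status_code >= 500)
    return {
        "rate_per_s": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "max_duration_s": max((s.duration_s for s in samples), default=0.0),
    }
```

In practice the same numbers would typically be derived from histogram buckets rather than raw samples, which is why a duration histogram with a bucket boundary at the Apdex threshold is convenient.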

For mpMetrics, metric naming should be based on rules from OpenTelemetry semantic conventions. Note that work on MP Metrics 5.0 is in progress and that spec could influence the naming of HTTP metrics. see https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/http-metrics/#http-server

http.server.duration
http.server.active_requests
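
To make the two metric shapes concrete: http.server.duration is a histogram of request durations and http.server.active_requests is a counter that goes up when a request starts and down when it ends. The sketch below is a minimal in-memory stand-in for those two instruments. The class name, method names, and bucket boundaries are hypothetical and do not reflect the Open Liberty implementation or the OpenTelemetry SDK.

```python
import bisect
import threading


class HttpServerMetrics:
    """In-memory stand-in for a duration histogram and an active-request counter."""

    def __init__(self, bounds=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)):
        # Bucket upper bounds in seconds (assumed values for illustration).
        self._bounds = list(bounds)
        self._lock = threading.Lock()
        self.active_requests = 0
        # One bucket per bound plus a final overflow bucket.
        self.bucket_counts = [0] * (len(self._bounds) + 1)
        self.duration_sum = 0.0
        self.duration_count = 0

    def request_started(self):
        with self._lock:
            self.active_requests += 1

    def request_finished(self, duration_s):
        with self._lock:
            self.active_requests -= 1
            # bisect_left places a value equal to a bound in that bound's bucket,
            # so upper bounds are inclusive.
            index = bisect.bisect_left(self._bounds, duration_s)
            self.bucket_counts[index] += 1
            self.duration_sum += duration_s
            self.duration_count += 1
```

A server-side hook would call `request_started()` when a request enters and `request_finished(elapsed)` when the response is committed.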

Metrics for HTTP MUST avoid cardinality explosion. To that end, the use of URL templates is required so that metrics are meaningful and the number of distinct metric names is not unbounded.
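
The cardinality concern can be sketched as follows: if the raw request path were used as a metric attribute, every distinct ID in the path would create a new series, whereas mapping each path to its URL template keeps the set bounded. The helper names, the `{name}` placeholder syntax, and the `/*` fallback below are assumptions for illustration only; a real server would normally take the template from the matched servlet mapping or REST resource path rather than re-matching with regular expressions.

```python
import re


def compile_templates(templates):
    """Turn templates such as '/users/{id}' into (regex, template) pairs."""
    compiled = []
    for template in templates:
        # Split on {name} placeholders and join the literal parts with a
        # wildcard that matches exactly one path segment.
        literal_parts = re.split(r"\{[^/{}]+\}", template)
        pattern = "[^/]+".join(re.escape(part) for part in literal_parts)
        compiled.append((re.compile("^" + pattern + "$"), template))
    return compiled


def route_for(path, compiled, fallback="/*"):
    """Return the matching URL template, or a single fallback value."""
    for regex, template in compiled:
        if regex.match(path):
            return template
    # Unknown paths collapse into one value so they cannot inflate cardinality.
    return fallback
```

With templates `/users/{id}` and `/users/{id}/orders/{orderId}`, requests to `/users/1` through `/users/1000` all report the single attribute value `/users/{id}`.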

see also what metrics Spring includes for HTTP: https://www.baeldung.com/micrometer

Documents

When available, add links to required feature documents. Use "N/A" to mark particular documents which are not required by the feature.

Aha: Externally raised RFE (Aha)
- Link the RFE with this issue
UFO: Link to Upcoming Feature Overview document
- https://ibm.box.com/s/toviqe26yjkh2xji2cukluy75xcqw2ya
FTS: Link to Feature Test Summary GH Issue https://github.com/OpenLiberty/open-liberty/issues/29385
Beta Blog: Link to Beta Blog Post GH Issue https://github.com/OpenLiberty/open-liberty/issues/29019
GA Blog: Link to GA Blog Post GH Issue: https://github.com/OpenLiberty/open-liberty/issues/29558 (shared blog with main Telemetry 2.0 feature)

Process Overview
Prioritization
Design
Implementation
Legal and Translation
Beta
GA
- Focal Point Approvals
Other Deliverables

General Instructions

The process steps occur roughly in the order as presented. Process steps occasionally overlap.

Each process step has a number of tasks which must be completed or must be marked as not applicable ("N/A").

Unless otherwise indicated, the tasks are the responsibility of the Feature Owner or a Delegate of the Feature Owner.

If you need assistance, reach out to the OpenLiberty/release-architect.

Important: Labels are used to trigger particular steps and must be added as indicated.

Prioritization (Complete Before Development Starts)

The (OpenLiberty/chief-architect) and area leads are responsible for prioritizing the features and determining which features are being actively worked on.

Prioritization

[x] Feature added to the "New" column of the Open Liberty project board
- Epics can be added to the board in one of two ways:
- From this issue, use the "Projects" section to select the appropriate project board.
- From the appropriate project board click "Add card" and select your Feature Epic issue
[x] Priority assigned
- Attend the Liberty Backlog Prioritization meeting

Design (Complete Before Development Starts)

Design preliminaries determine whether a formal design, which will be provided by an Upcoming Feature Overview (UFO) document, must be created and reviewed. A formal design is required if the feature requires any of the following: UI, Serviceability, SVT, Performance testing, or non-trivial documentation/ID.

Design Preliminaries

[x] UI requirements identified. (Owner and UI focal point)
[x] ID requirements identified. (Owner and ID focal point)
- Refer to Documenting Open Liberty.
- Feature Owner adds label ID Required, if non-trivial documentation needs to be created by the ID team.
- ID adds label ID Required - Trivial, if no design will be performed and only trivial ID updates are needed.
[x] Serviceability Requirements Identified. (Owner and Serviceability focal point)
[x] SVT Requirements Identified. (Owner and SVT focal point)
[x] Performance testing requirements identified. (Owner and Performance focal point)

Design

[x] POC Design / UFO review requested.
- Owner adds label Design Review Request
[x] POC Design / UFO review scheduled. (David Chang)
[x] POC Design / UFO review completed.
[x] POC / UFO Review follow-ons completed.
[ ] Design / UFO approved. (OpenLiberty/chief-architect) or N/A
- (OpenLiberty/chief-architect) adds label Design Approved
- Add the public link to the UFO in Box to the Documents section.
- The UFO must always accurately reflect the final implementation of the feature. Any changes must be first approved. Afterwards, update the UFO by creating a copy of the original approved slide(s) at the end of the deck and prepend "OLD" to the title(s). A single updated copy of the slide(s) should take the original's place, and have its title(s) prepended with "UPDATED".

No Design

[ ] No Design requested.
- Owner adds label No Design Approval Request
[ ] No Design / No UFO approved. (OpenLiberty/chief-architect) or N/A
- Approver adds label No Design Approved

FAT Documentation

[ ] "Feature Test Summary" child task created
- Use the Feature Test Summary Template
- Add FTS issue link to the Documents section.

Implementation

A feature must be prioritized and socialized (or No Design Approved) before any implementation work may begin and is the minimum before any beta code may be delivered. All new Liberty content must be inaccessible in our GA releases until it is Feature Complete by either marking it kind=noship or beta fencing it.
Code may not GA until this feature has obtained the "Design Approved" or "No Design Approved" label, along with all other tasks outlined in the GA section.

Feature Development Begins

[ ] Add the In Progress label

Legal and Translation

In order to avoid last minute blockers and significant disruptions to the feature, the legal items need to be done as early in the feature process as possible, either in design or as early into the development as possible. Similarly, translation is to be done concurrently with development. Both MUST be completed before Beta or GA is requested.

Legal (Complete before Feature Complete Date)

[ ] Changed or new open source libraries are cleared and approved, or N/A. (Legal Release Services/Cass Tucker/Release PM).
[ ] Licenses and Certificates of Originality (COOs) are updated, or N/A

Translation (Complete 1 week before Feature Complete Date)

[ ] PII updates are merged, or N/A. Note timing with translation shipments.

Beta

In order to facilitate early feedback from users, all new features and functionality should first be released as part of a beta release.

Beta Code

[x] Beta fence the functionality
- kind=beta, ibm:beta, ProductInfo.getBetaEdition()
[x] Beta development complete and feature ready for inclusion in a beta release
- Add label target:beta and the appropriate target:YY00X-beta (where YY00X is the targeted beta version).
[x] Feature delivered into beta
- (OpenLiberty/release-manager) adds label release:YY00X-beta (where YY00X is the first beta version that included the functionality).

Beta Blog (Complete 1.5 weeks before beta eGA)

[x] Beta blog issue created and populated using the Open Liberty BETA blog post template.
- Add a link to the beta blog issue in the Documents section.

GA

A feature is ready to GA after it is Feature Complete and has obtained all necessary Focal Point Approvals.

Feature Complete

[ ] Feature implementation and tests completed.
- [ ] All PRs are merged.
- [ ] All epic and child issues are closed.
- [ ] All stop ship issues are completed.
[ ] Legal: all necessary approvals granted.
[ ] Translation: All messages translated or sent for translation for upcoming release
[ ] GA development complete and feature ready for inclusion in a GA release
- Add label target:ga and the appropriate target:YY00X (where YY00X is the targeted GA version).
- Inclusion in a release requires the completion of all Focal Point Approvals.

Focal Point Approvals (Complete by Feature Complete Date)

These occur only after GA of this feature is requested (by adding a target:ga label). GA of this feature may not occur until all approvals are obtained.

All Features

[ ] FAT: All Tests complete and running successfully in SOE or N/A. (OpenLiberty/fat-approvers)
- Approver adds label focalApproved:fat.
[ ] Demo: Demo is scheduled for an upcoming EOI or N/A. (OpenLiberty/demo-approvers)
- Approver adds label focalApproved:demo.
[ ] Globalization: Translation and TVT are complete or N/A. (OpenLiberty/globalization-approvers)
- Approver adds label focalApproved:globalization.

Design Approved Features

[ ] Accessibility: Accessibility testing completed or N/A. (OpenLiberty/accessibility-approvers)
- Approver adds label focalApproved:accessibility.
[ ] APIs/Externals: Externals have been reviewed or N/A. (OpenLiberty/externals-approvers)
- Approver adds label focalApproved:externals
[ ] ID: Documentation is complete or N/A. (OpenLiberty/id-approvers)
- Approver adds label focalApproved:id.
- NOTE: If only trivial documentation changes are required, you may reach out to the ID Feature Focal to request an ID Required - Trivial label. Unlike features with a regular ID requirement, those with the ID Required - Trivial label do not have a hard requirement for a Design/UFO.
[ ] Performance: Performance testing is complete or N/A. (OpenLiberty/performance-approvers)
- Approver adds label focalApproved:performance.
[ ] Serviceability: Serviceability has been addressed or N/A. (OpenLiberty/serviceability-approvers)
- Approver adds label focalApproved:sve.
[ ] STE: Skills Transfer Education chart deck is complete or N/A. (OpenLiberty/ste-approvers)
- Approver adds label focalApproved:ste.
[ ] SVT: System Verification Test is complete or N/A. (OpenLiberty/svt-approvers)
- Approver adds label focalApproved:svt.

Remove Beta Fencing (Complete by Feature Complete Date)

[ ] Beta guards are removed, or N/A
- Only after all necessary Focal Point Approvals have been granted.

GA Blog (Complete by Feature Complete Date)

[ ] GA Blog issue created and populated using the Open Liberty GA release blog post template.
- Add a link to the GA Blog issue in the Documents section.

Post GA

[ ] Replace target:YY00X label with the appropriate release:YY00X. (OpenLiberty/release-manager)

Other Deliverables

[ ] OL Guides: OL Guides assessment is complete or N/A. (Yee-Kang Chang)
[ ] Standalone Feature Blog Post: A blog post specifically about your feature or N/A. (OpenLiberty/release-architect)
- This should be strongly considered for larger or more prominent features.
- Follow instructions in the blogs repo.
[ ] WDT: Liberty Developer Tools work is complete or N/A. (Leonard Theivendra)

scottkurz commented 5 months ago

2024-03-22 - Comments from UFO review meeting:

slide 3 - add required doc to Communication slide (Fixed)
slide 9 - Problem Statement - add something about wanting to support the Open Telemetry HTTP convention / spec (since it seemed like there were a lot of questions about the difference btw. this new stuff and the existing servlet/rest monitoring/metrics) (Fixed)
slide 15 - diagram mentions "JAX-RS" ...should be "Restful" (Fixed)
slide 28 - Maybe in the slide better differentiate the two new attribute substring value possibilities "HTTP", "HTTPServerRequest". ( It was a bit confusing especially when the part explaining that the "HTTP" value isn't going to do anything additionally today..maybe group/format the possibilities separately instead of in the one single bullet list on the page) (Fixed - original slide is now slide 29, new slide on 30 to address this -> Switched to just use "HTTP", if more Mbeans are introduced in the future, we can introduce more specific names to enable specific Mbeans then )
slide 26 - Use io.openliberty.* as pkg prefix for new API class HttpServerRequestMXBean (Fixed - original slide is now slide 27)
slide 29 - Talk to Gilbert about idea of updating guide(s) perhaps. (Discussed, guides are not updating)
performance concerns .. see chat history and discussion at end - maybe this needs a design issue? (Performance already executed)

Related to item 7 (performance concerns) is this issue, which investigates the performance impact of overlapping filters and looks at an alternative design. (Design issue discussed here https://github.com/OpenLiberty/open-liberty/issues/28098)

donbourne commented 5 months ago

@Channyboy , UFO link is broken...perhaps forgot to set the link expiry?

NottyCode commented 1 month ago

@Channyboy I have a few questions before approving:

[x] Will this only capture http stats for Jakarta EE 9+ or will it also work for Java EE 7+?
[x] The UFO indicates that an existing problem is we only get metrics for some stats, not for all http, but later it talks about doing metrics for servlets which could read as if this will only work for servlets. Talking to @donbourne he suggests this will capture all http requests, but I don't think that is clear in the UFO
[x] Slide 21 has server_port="9079“}1.0 is that {1.0 right? It looks like a typo
[x] Slide 21 shows network_protocol_name and url_scheme will these ever have different values?

Channyboy commented 1 month ago

@NottyCode

Will this only capture http stats for Jakarta EE 9+ or will it also work for Java EE 7+?

Currently supports JEE10+, backport for EE9 and earlier was intended for later. (Updated page16)

The UFO indicates that an existing problem is we only get metrics for some stats, not for all http, but later it talks about doing metrics for servlets which could read as if this will only work for servlets. Talking to @donbourne he suggests this will capture all http requests, but I don't think that is clear in the UFO

The feature relies on the servlet engine and uses servlet filters to capture the necessary information. For JEE10+, this covers any feature that makes use of io.openliberty.servlet.internal-6.0 or io.openliberty.servlet.internal-6.1, such as pages-3.1, restfulWS-3.1, servlet-6.0, and xmlWS-4.0. If a request is captured by the servlet filter, HTTP metrics will be reported for it. (Updated page16)

slide 21

Typo fixed. These will be the same values. network.protocol.name is conditionally required if the version is set and the name is not http. It will always be http, so we can remove this attribute.

donbourne commented 1 month ago

Currently supports JEE10+, backport for EE9 and earlier was intended for later. (Updated page16)

Discussed with @Channyboy that we need this capability back to EE7.

Channyboy commented 1 month ago

@NottyCode UFO updated on page 16 to mention support back to EE7 for mpTelemetry-2.0

dave-waddling commented 4 weeks ago

Sorry, I was a little keen adding the FAT Focal approval! FTS passed review but the testing still needs to go through the mini-SOE early next week.

donbourne commented 3 weeks ago

OL:

Serviceability Approval Comment - Please answer the following questions for serviceability approval:

UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?
Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team).
a) What problem paths were tested and demonstrated? b) Who did you demo to? c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs?
SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered. a) Who conducted SVT tests for this feature? b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?
Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.
Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

chirp1 commented 3 weeks ago

Approving. David Mueller indicated that he has/will have the info that he needs to make the doc updates.

donbourne commented 3 weeks ago

@clarkek123 will be handling the serviceability approval for this epic.

dave-waddling commented 3 weeks ago

Thanks for completing the FTS. The results from the mini-SOE are good, so adding FAT Focal approval.

jdmcclur commented 3 weeks ago

Approving performance while noting some work may need to be done in the future: https://github.com/OpenLiberty/open-liberty/issues/29396.

Channyboy commented 3 weeks ago

@clarkek123 Serviceability Approval Comment - Please answer the following questions for serviceability approval:

UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?

There are no common error scenarios expected, as listed in the UFO. However, a customer may come across performance problems or concerns, since the underlying monitor-1.0 feature enables all runtime components to create stats/metrics. The customer can customize this by using the filter attribute of the monitor-1.0 configuration element (<monitor>) to explicitly enable only the runtime components they want (e.g., only HTTP). This configuration belongs to monitor-1.0 and is separate from this feature. This feature just relies on the stats exposed by monitor-1.0 and creates metrics from the provided information (i.e., synchronizing with and reading the available data).

Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team). a) What problem paths were tested and demonstrated?

As mentioned above, there are no common error scenarios. However, we will demo the use of the monitor-1.0 filter attribute and the switch between runtime and application configured MP Telemetry behavior (behavior that is part of the parent MP Telemetry 2.0 feature).

b) Who did you demo to?

Install team.

c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

Yes.

SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered. a) Who conducted SVT tests for this feature?

Dan Guinan

b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

Yes

Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.

WAS L3: Metrics

Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

Yes

clarkek123 commented 3 weeks ago

@Channyboy I have reviewed the Serviceability information above and have additional questions. The UFO for HTTP Metrics Serviceability page 42 shows N/A, so perhaps there are no common error scenarios that require testing for this feature. Please confirm if that is the case.

The purpose of the Serviceability review is to ensure that common error scenarios have been tested and reviewed by a team other than the local team. For common error scenarios, the feature should provide good messages or responses so that customers can resolve the errors themselves without having to call IBM support. Common errors could be conflicts in configuration parameters, connection errors, feature configured with incompatible JakartaEE or Java level, and so on.

clarkek123 commented 3 weeks ago

I have added serviceability approval based on the information provided in the UFO and template along with the demo to the install team and serviceability review by the SVT team.
