Open donbourne opened 6 months ago
Subject to change
High Level User Story: As an Operations engineer, I want to be able to export logs from Open Liberty to an Open Telemetry Exporter.
OpenTelemetry defines a Logs Bridge API for emitting LogRecords. OpenTelemetry provides a Logs Bridge API and SDK, which can be used together with existing logging libraries to automatically inject the trace context in the emitted logs, and provide an easy way to send the logs via OTLP. Instead of modifying each logging statement, log appenders use the API to bridge logs from existing logging libraries to the OpenTelemetry data model, where the SDK controls how the logs are processed and exported. The typical log SDK configuration installs a log record processor and exporter.
The LogRecordProcessor from the Logs SDK allows us to process and decorate the LogRecord fields so that they map to the OTel Log Data Model.
SimpleLogRecordProcessor: An implementation of LogRecordProcessor that passes each finished log, in its export-friendly ReadableLogRecord representation, to the configured LogRecordExporter as soon as it is finished.
BatchLogRecordProcessor: An implementation of LogRecordProcessor that creates batches of LogRecords and passes their export-friendly ReadableLogRecord representations to the configured LogRecordExporter.
BatchLogRecordProcessor and SimpleLogRecordProcessor are paired with LogRecordExporter, which is responsible for sending telemetry data to a particular backend.
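To make the difference between the two processors concrete, here is a minimal toy model in plain Java. It is only a sketch: `ToyExporter`, `ToySimpleProcessor`, and `ToyBatchProcessor` are hypothetical names and are not the OpenTelemetry SDK classes. The real BatchLogRecordProcessor also exports on a timer and bounds its queue, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an exporter; NOT the OpenTelemetry LogRecordExporter interface. */
interface ToyExporter {
    void export(List<String> batch);
}

/** Forwards every record to the exporter as soon as it is emitted. */
final class ToySimpleProcessor {
    private final ToyExporter exporter;

    ToySimpleProcessor(ToyExporter exporter) {
        this.exporter = exporter;
    }

    void onEmit(String record) {
        List<String> single = new ArrayList<>();
        single.add(record);
        exporter.export(single);
    }
}

/** Buffers records and exports them when the batch is full or flush() is called. */
final class ToyBatchProcessor {
    private final ToyExporter exporter;
    private final int maxBatchSize;
    private final List<String> buffer = new ArrayList<>();

    ToyBatchProcessor(ToyExporter exporter, int maxBatchSize) {
        this.exporter = exporter;
        this.maxBatchSize = maxBatchSize;
    }

    void onEmit(String record) {
        buffer.add(record);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand the exporter a copy so clearing the buffer does not affect it.
        exporter.export(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With a batch size of 3, four emitted records produce one export call of three records, and the fourth record is only exported on the next flush. The simple processor would have made four export calls.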
In Open Liberty, Open Telemetry is initialized using the SDK autoconfiguration extension, instead of manually creating the OpenTelemetry instance by using the SDK builders directly in the code. This approach allows you to autoconfigure the OpenTelemetry SDK based on a standard set of supported environment variables and system properties. Hence, the logging providers can be configured using environment variables. Ref: https://opentelemetry.io/docs/languages/java/instrumentation/#autoconfiguration
Since the mpTelemetry-2.0 OpenTelemetry instance depends on the application thread context, it will be difficult to get the instance when we are not a part of the application thread context, such as during server start up. Hence, we need to explicitly create a new server-level Open Telemetry instance, which would have its own server-specific OTel configuration. This will also work for multi-app scenarios.
Enable the mpTelemetry-2.0 feature and configure the environment variables to enable the server-level OTel SDK and to configure the OTel log exporter and LogRecord processors, as follows:
mp.server.otel.service.name (can be similar to the app-level config)
mp.server.otel.sdk.disabled=false
mp.server.otel.logs.exporter=otlp
mp.server.otel.exporter.otlp.endpoint=http://localhost:4317/
There will be server-level resource attributes configured for the server-level OpenTelemetry instance, as well. (https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#exporter-selection)
By default, the SimpleLogRecordProcessor will be enabled, so records are sent immediately. However, if you want to send the records in batches, you can configure the following Batch LogRecord Processor environment variables to control how often and how the logs are exported, as well as the log record limits for attributes. (https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#batch-logrecord-processor)
OTEL_BLRP_SCHEDULE_DELAY : Delay interval (in milliseconds) between two consecutive exports (Default = 1000)
OTEL_BLRP_EXPORT_TIMEOUT : Maximum allowed time (in milliseconds) to export data (Default = 30000)
OTEL_BLRP_MAX_QUEUE_SIZE : Maximum queue size (Default = 2048)
OTEL_BLRP_MAX_EXPORT_BATCH_SIZE : Maximum batch size (Default = 512)
OTEL_LOGRECORD_ATTRIBUTE_VALUE_LENGTH_LIMIT : Maximum allowed attribute value size (Default = no limit)
OTEL_LOGRECORD_ATTRIBUTE_COUNT_LIMIT : Maximum allowed log record attribute count (Default = 128)
The default value of the logSources attribute would be message.
<feature>mpTelemetry-2.0</feature>
…
<mpTelemetry logSources="message, trace, accessLog, ffdc, audit"/>
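As a rough illustration of how a comma-separated logSources value could be turned into a set of sources, here is a small sketch. The class `LogSourcesParser` is hypothetical, and two behaviours are assumptions made for the example rather than anything stated in the design: an absent attribute falls back to message, and an explicitly empty attribute yields no sources.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical parser for the logSources attribute value; not Open Liberty code. */
final class LogSourcesParser {

    /** Assumed default when the attribute is not configured at all. */
    static final String DEFAULT_SOURCE = "message";

    static Set<String> parse(String attributeValue) {
        if (attributeValue == null) {
            // Attribute absent: assume the default source.
            return Collections.singleton(DEFAULT_SOURCE);
        }
        Set<String> sources = new LinkedHashSet<>();
        for (String token : attributeValue.split(",")) {
            String source = token.trim();
            if (!source.isEmpty()) {
                sources.add(source);
            }
        }
        // An explicitly empty attribute results in an empty set (no sources enabled).
        return sources;
    }

    public static void main(String[] args) {
        System.out.println(parse("message, trace, accessLog, ffdc, audit"));
    }
}
```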
The feature design would be similar to the Logstash Collector 1.0 feature: we will subscribe an “OTel Handler” to the collector manager, which gives us access to each event. The handler formats the event (JSONify) to map it to the OTel Log Data Model and then emits it to the “Target”, which would be the configured exporter for OTel logs.
The corresponding SpanID and TraceID for trace context and application will be retrieved from the LogRecordContext (ext_traceID and ext_spanID).
Mapping Open Liberty Log Record to Open Telemetry Logs Data Model (https://opentelemetry.io/docs/specs/otel/logs/data-model/#log-and-event-record-definition) Note: When formatting the event, JSONify the event, so the event is structured properly.
Open Liberty Log Record | Open Telemetry Logs Data Model
=============================================
ibm_datetime = Timestamp
ext_traceId = TraceId
ext_spanId = SpanId
loglevel = SeverityText
(Refer to table below) = SeverityNumber
message = Body *** Should be the entire JSON payload instead?
host = Resource[“host.name”]
service.name = Resource[“service.name”]
io.openliberty.microprofile.telemetry = InstrumentationScope
Map the rest of the fields as Key:Value pairs = Attributes[“Key”]
Throwable example snippet:
throwable.getClass().getName() = Attributes[SemanticAttributes.EXCEPTION_TYPE]
throwable.getMessage() = Attributes[SemanticAttributes.EXCEPTION_MESSAGE]
throwable.printStackTrace() = Attributes[SemanticAttributes.EXCEPTION_STACKTRACE]
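The throwable mapping above can be sketched without the OpenTelemetry API by collecting the same three values into a plain map. The class `ExceptionAttributes` is hypothetical. The keys exception.type, exception.message, and exception.stacktrace are the semantic convention attribute names that the SemanticAttributes constants refer to.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that mirrors the throwable mapping using a plain map. */
final class ExceptionAttributes {

    static Map<String, String> from(Throwable throwable) {
        Map<String, String> attributes = new LinkedHashMap<>();
        if (throwable == null) {
            return attributes;
        }
        attributes.put("exception.type", throwable.getClass().getName());
        if (throwable.getMessage() != null) {
            attributes.put("exception.message", throwable.getMessage());
        }
        // printStackTrace writes to a stream, so capture it in a StringWriter.
        StringWriter writer = new StringWriter();
        PrintWriter printer = new PrintWriter(writer);
        throwable.printStackTrace(printer);
        printer.flush();
        attributes.put("exception.stacktrace", writer.toString());
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = from(new IllegalStateException("boom"));
        System.out.println(attrs.get("exception.type"));
        System.out.println(attrs.get("exception.message"));
    }
}
```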
Log Level Mapping to Open Telemetry Severity Text (https://opentelemetry.io/docs/specs/otel/logs/data-model/#severity-fields) (https://opentelemetry.io/docs/specs/otel/logs/data-model-appendix/#appendix-b-severitynumber-example-mappings)
Open Liberty Log Level | Open Telemetry Logs Severity Text / Number
====================================================
FATAL = FATAL / 21
SEVERE = ERROR / 17
WARNING = WARN / 13
AUDIT = INFO2 / 10
INFO = INFO / 9
CONFIG = DEBUG4 / 8
DETAIL = DEBUG3 / 7
FINE = DEBUG2 / 6
FINER = DEBUG / 5
FINEST = TRACE / 1
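The level table above could be expressed as a simple lookup. The following is a sketch only. `LibertySeverityMapping` and its nested `Severity` type are hypothetical names, and returning an empty Optional for an unknown level is a choice made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from Open Liberty log level to OTel severity text and number. */
final class LibertySeverityMapping {

    /** Severity text and number pair, as listed in the table. */
    static final class Severity {
        final String text;
        final int number;

        Severity(String text, int number) {
            this.text = text;
            this.number = number;
        }
    }

    private static final Map<String, Severity> BY_LEVEL = new HashMap<>();

    static {
        BY_LEVEL.put("FATAL", new Severity("FATAL", 21));
        BY_LEVEL.put("SEVERE", new Severity("ERROR", 17));
        BY_LEVEL.put("WARNING", new Severity("WARN", 13));
        BY_LEVEL.put("AUDIT", new Severity("INFO2", 10));
        BY_LEVEL.put("INFO", new Severity("INFO", 9));
        BY_LEVEL.put("CONFIG", new Severity("DEBUG4", 8));
        BY_LEVEL.put("DETAIL", new Severity("DEBUG3", 7));
        BY_LEVEL.put("FINE", new Severity("DEBUG2", 6));
        BY_LEVEL.put("FINER", new Severity("DEBUG", 5));
        BY_LEVEL.put("FINEST", new Severity("TRACE", 1));
    }

    /** Returns the mapped severity, or empty when the level is not in the table. */
    static Optional<Severity> forLevel(String libertyLevel) {
        if (libertyLevel == null) {
            return Optional.empty();
        }
        String key = libertyLevel.trim().toUpperCase(Locale.ROOT);
        return Optional.ofNullable(BY_LEVEL.get(key));
    }

    public static void main(String[] args) {
        forLevel("AUDIT").ifPresent(s -> System.out.println(s.text + " / " + s.number));
    }
}
```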
Subject to change
io.openliberty.microprofile.telemetry.2.0.logging.internal
io.opentelemetry.sdk.logs.export;type="third-party",\
io.opentelemetry.sdk.logs.data;type="third-party",\
io.opentelemetry.api.logs;type="third-party",\
Create a new class, io.openliberty.microprofile.telemetry20.OpenTelemetryHandler, in the new project io.openliberty.microprofile.telemetry.2.0.logging.internal, that extends com.ibm.ws.collector.Collector, similar to how it's done in LogstashCollector.
The OpenTelemetryHandler class and the BND files should be similar to the LogstashCollector project; however, omit the redundant components, such as SSL and ExecutorService, which will not be needed for this feature.
Import-Package:
io.openliberty.microprofile.telemetry.internal.common
-buildpath: io.openliberty.microprofile.telemetry.internal.common;version=latest,\
  io.openliberty.io.opentelemetry.2.0;version=latest
- Ensure the correct metatype is defined for the server configuration of mpTelemetry-2.0 (e.g. `logSources`)
- Update the [OpenTelemetryVersionedConfigurationImpl](https://github.com/OpenLiberty/open-liberty/blob/integration/dev/io.openliberty.microprofile.telemetry.2.0.internal/src/io/openliberty/microprofile/telemetry20/internal/config/OpenTelemetryVersionedConfigurationImpl.java) class file to remove the following lines, since we should be enabling Logs, as part of this feature.
telemetryProperties.put(OpenTelemetryConstants.CONFIG_LOGS_EXPORTER_PROPERTY, "none");
telemetryProperties.put(OpenTelemetryConstants.ENV_LOGS_EXPORTER_PROPERTY, "none");
- In the `OpenTelemetryHandler.activate()` method, retrieve and set the server-level OpenTelemetryInfo object. It will be using the [OpenTelemetryAccessor](https://github.com/OpenLiberty/open-liberty/blob/integration/dev/io.openliberty.microprofile.telemetry.internal.common/src/io/openliberty/microprofile/telemetry/internal/interfaces/OpenTelemetryAccessor.java) interface from the `io.openliberty.microprofile.telemetry.internal.common` project (TBD - details to follow from MP Telemetry team)
- Get the configured OpenTelemetry LoggerProvider by calling `OpenTelemetryInfo.getOpenTelemetry().getLogsBridge()`. (https://javadoc.io/doc/io.opentelemetry/opentelemetry-api/latest/io/opentelemetry/api/OpenTelemetry.html)
- In the `OpenTelemetryHandler.formatEvents()` method, use the previously retrieved LoggerProvider (https://javadoc.io/doc/io.opentelemetry/opentelemetry-api/latest/io/opentelemetry/api/logs/LoggerProvider.html) to build a Logger with the configured instrumentation name, get a LogRecordBuilder from it, and use the builder to map the Open Liberty event records to the appropriate fields of the OpenTelemetry Log Data Model.
- Once the builder is populated with the corresponding fields, call `builder.emit()` to send the log record to the exporter.
Below is a generic high-level code snippet that shows how to retrieve the OpenTelemetry LoggerProvider and LogRecordBuilder, map generic JUL LogRecord fields to the Open Telemetry Log Data Model, and then export the record to the configured exporter.
Snippet is from the [Open Telemetry JUL Java agent instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/java-util-logging/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/jul/JavaUtilLoggingHelper.java) :
...
String instrumentationName = logger.getName();
if (instrumentationName == null || instrumentationName.isEmpty()) {
  instrumentationName = "ROOT";
}
LogRecordBuilder builder =
    GlobalOpenTelemetry.get()
        .getLogsBridge()
        .loggerBuilder(instrumentationName)
        .build()
        .logRecordBuilder();
mapLogRecord(builder, logRecord);
builder.emit();

private static void mapLogRecord(LogRecordBuilder builder, LogRecord logRecord) {
  // message
  String message = FORMATTER.formatMessage(logRecord);
  if (message != null) {
    builder.setBody(message);
  }

  // time
  long timestamp = logRecord.getMillis();
  builder.setTimestamp(timestamp, TimeUnit.MILLISECONDS);

  // level
  Level level = logRecord.getLevel();
  if (level != null) {
    builder.setSeverity(levelToSeverity(level));
    builder.setSeverityText(logRecord.getLevel().getName());
  }

  AttributesBuilder attributes = Attributes.builder();

  // throwable
  Throwable throwable = logRecord.getThrown();
  if (throwable != null) {
    attributes.put(SemanticAttributes.EXCEPTION_TYPE, throwable.getClass().getName());
    attributes.put(SemanticAttributes.EXCEPTION_MESSAGE, throwable.getMessage());
    StringWriter writer = new StringWriter();
    throwable.printStackTrace(new PrintWriter(writer));
    attributes.put(SemanticAttributes.EXCEPTION_STACKTRACE, writer.toString());
  }

  if (captureExperimentalAttributes) {
    Thread currentThread = Thread.currentThread();
    attributes.put(SemanticAttributes.THREAD_NAME, currentThread.getName());
    attributes.put(SemanticAttributes.THREAD_ID, currentThread.getId());
  }

  builder.setAllAttributes(attributes.build());

  // span context
  builder.setContext(Context.current());
}
...
Note: OpenTelemetry has also implemented Log Appenders for [Log4J](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/log4j/log4j-appender-2.17/library/src/main/java/io/opentelemetry/instrumentation/log4j/appender/v2_17/internal/LogEventMapper.java#L104) and [Logback](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/logback/logback-appender-1.0/library/src/main/java/io/opentelemetry/instrumentation/logback/appender/v1_0/internal/LoggingEventMapper.java).
Hello.
> In the OpenTelemetryHandler.activate() method, retrieve and set the server-level OpenTelemetryInfo object. It will be using the OpenTelemetryAccessor interface from the io.openliberty.microprofile.telemetry.internal.common project (TBD - details to follow from MP Telemetry team)
This should work fine, but there is one hidden gotcha to be aware of. In OpenTelemetryInfoFactory we have a check when an OpenTelemetryInfo is created: if (j2EEName.startsWith("io.openliberty") || j2EEName.startsWith("com.ibm.ws")) {
You will need to make sure this if statement returns false when OpenTelemetryHandler calls us. I suspect that without modification it will be true.
Thanks @benjamin-confino! Good point, will make sure that doesn't break for us. Right, getOpenTelemetryInfo() will be called by internal code, so it would be true.
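For reference, the condition quoted above can be reproduced in isolation to see which names it matches. This is only a sketch of the predicate itself. `InternalNameCheck` is a hypothetical class, the sample names are made up, and what OpenTelemetryInfoFactory does when the check is true is not shown in this thread.

```java
/** Hypothetical stand-alone copy of the prefix check quoted in the comment above. */
final class InternalNameCheck {

    /** Returns true when the name starts with one of the two internal prefixes. */
    static boolean looksInternal(String j2EEName) {
        if (j2EEName == null) {
            return false;
        }
        return j2EEName.startsWith("io.openliberty") || j2EEName.startsWith("com.ibm.ws");
    }

    public static void main(String[] args) {
        // A name in the new logging project would match the check.
        System.out.println(looksInternal("io.openliberty.microprofile.telemetry.2.0.logging.internal"));
        // A made-up application name would not.
        System.out.println(looksInternal("myApp#myApp.war"));
    }
}
```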
POC notes from July 15th:
Further point at end of UFO: Ensure that negative cases are well tested and that instantOn is considered when testing the feature.
@yasmin-aumeeruddy Thank you for the notes, I have updated the UFO with the comments from the socialization.
Slide 9 - I think we need an epic for access logs and audit. I understand why mapping these log events is non-trivial and less important, but we should do it, especially for access logs. Slide 12 - It is odd that we don't map anything into the body for FFDC. This is raising my "something wrong" thing.
@NottyCode For Slide 9, we have opened two epics to address access and audit logs
For Slide 12, we decided to map the exception message from the triggered event to the body, in addition to the Semantic Convention Attribute name (exception.message).
I have updated the UFO with the above.
@OpenLiberty/demo-approvers Demo scheduled for EOI 24.17
@OpenLiberty/id-approvers ID Doc Issue opened: https://github.com/OpenLiberty/docs/issues/7459
@OpenLiberty/externals-approvers Can you please review the approval for this feature? There are no exposed public APIs as part of this particular feature.
Slack with Prashanth, David, Ram, me. Prashanth provided the necessary info in the following doc issue: https://github.com/OpenLiberty/docs/issues/7459. I approved the feature.
@pgunapal: WASWIN is good with the STE slides. STE approved.
OL:
Serviceability Approval Comment - Please answer the following questions for serviceability approval:
UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?
Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team).
a) What problem paths were tested and demonstrated?
b) Who did you demo to?
c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs?
SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered. a) Who conducted SVT tests for this feature? b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?
Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.
Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?
@clarkek123 will be handling the serviceability approval for this epic.
Thanks for completing the FTS. The results of the mini-SOE look good so adding the FAT Focal approval.
@clarkek123 I have filled out the template below, can you please review the Serviceability approval for this feature?
Serviceability Approval Comment - Please answer the following questions for serviceability approval:
UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?
A: Yes. For the logging component of mpTelemetry-2.0, the logs should be exported automatically if the mpTelemetry-2.0 feature is enabled in the server.xml, otel.sdk.disabled=false is configured either in Liberty or the application, and otel.logs.exporter is set to a valid exporter. Below are some of the common user error scenarios for logs not being exported that we tested in FATs, as well as manually.
- otel.logs.exporter is set to none. – A warning message is shown stating that the exporter for logs is disabled.
- otel.sdk.disabled is not set to false. – A warning message is shown that the OpenTelemetry SDK instance is not enabled.
- source attribute is empty in the server.xml.
Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team). a) What problem paths were tested and demonstrated? b) Who did you demo to? c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs?
A: a) The problem paths mentioned above were tested and demoed. b) Local, EOI, SVT, Performance teams. c) Yes
SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered. a) Who conducted SVT tests for this feature? b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?
A: a) Daniel Guinan b) Yes
Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.
A: WAS L3: Logging
Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?
A: No
Based on the information provided above for Serviceability, which shows testing of common error paths, demos with approval from the Local, SVT and Performance teams, and SVT signoff on the paths included for serviceability, I have added the Serviceability approval for this feature.
Description
We need a way for users to be able to direct their Liberty logs to OpenTelemetry.
Documents
When available, add links to required feature documents. Use "N/A" to mark particular documents which are not required by the feature.
Externally raised requests for enhancements:
Aha idea
Requested feature
UFO: https://ibm.box.com/s/ihoa8t3h18e8w2t41gdcuveo90q24mrd
FTS: https://github.com/OpenLiberty/open-liberty/issues/29355
Beta Blog: https://github.com/OpenLiberty/open-liberty/issues/29332
GA Blog: Link to GA Blog Post GH Issue
Process Overview
Prioritization
Design
Implementation
Legal and Translation
Beta
GA
Other Deliverables
General Instructions
The process steps occur roughly in the order as presented. Process steps occasionally overlap.
Each process step has a number of tasks which must be completed or must be marked as not applicable ("N/A").
Unless otherwise indicated, the tasks are the responsibility of the feature owner or a delegate of the feature owner.
If you need assistance, reach out to the OpenLiberty/release-architect.
Important: Labels are used to trigger particular steps and must be added as indicated.
Prioritization (Complete Before Development Starts)
The OpenLiberty/chief-architect and area leads are responsible for prioritizing the features and determining which features are being actively worked on.
Prioritization
[x] Feature added to the "New" column of the Open Liberty project board
Prioritization - Requested
[x] Priority assigned
Prioritization - Requested label removed (OpenLiberty/project-manager or feature owner)
Design (Complete Before Development Starts)
Design preliminaries determine whether a formal design, which will be provided by an Upcoming Feature Overview (UFO) document, must be created and reviewed. A formal design is required if the feature requires any of the following: UI, Serviceability, SVT, Performance testing, or non-trivial documentation/ID. Furthermore, each identified item places a blocking requirement on another team so it must be identified early in the process. The feature owner may check-off the item if they know it doesn't apply, but otherwise they should work with the focal point to determine what work, if any, will be necessary and make them aware of it.
Design Preliminaries
ID Required, if non-trivial documentation needs to be created by the ID team.
ID Required - Trivial, if no design will be performed and only trivial ID updates are needed.
Design
Design Review Request
Design Approval Request
Design Approved
No Design
No Design Approval Request
No Design Approved
Product Management Approval Request and notifies OpenLiberty/product-management
Product Management Approved (OpenLiberty/product-management)
FAT Documentation
[x] "Feature Test Summary" child task created
Implementation
A feature must be prioritized before any implementation work may begin to be delivered (inaccessible/no-ship). However, a design focused approach should still be applied to features, and developers should think about the feature design prior to writing and delivering any code.
Besides being prioritized, a feature must also be socialized (or No Design Approved) before any beta code may be delivered. All new Liberty content must be inaccessible in our GA releases until it is Feature Complete by either marking it kind=noship or beta fencing it.
Code may not GA until this feature has obtained the Design Approved or No Design Approved label, along with all other tasks outlined in the GA section.
Feature Development Begins
In Progress label
Legal and Translation
In order to avoid last minute blockers and significant disruptions to the feature, the legal items need to be done as early in the feature process as possible, either in design or as early into the development as possible. Similarly, translation is to be done concurrently with development. All items below MUST be completed before beta & GA is requested.
Innovation (Complete 1 week before Beta & GA Feature Complete Date)
Legal (Complete before Beta & GA Feature Complete Date)
Translation (Complete by Beta & GA Feature Complete Date)
[x] PII (Program Integrated Information) updates are merged (i.e. all English strings due for translation have been delivered), or N/A.
Beta
In order to facilitate early feedback from users, all new features and functionality should first be released as part of a beta release.
Beta Code
kind=beta, ibm:beta, ProductInfo.getBetaEdition()
target:beta and the appropriate target:YY00X-beta (where YY00X is the targeted beta version).
release:YY00X-beta (where YY00X is the first beta version that included the functionality).
Beta Blog (Complete by beta eGA)
[x] Beta blog issue created and populated using the Open Liberty BETA blog post template.
GA
A feature is ready to GA after it is Feature Complete and has obtained all necessary Focal Point Approvals.
Feature Complete
Translation - Complete or Translation - Missing label
release branch, feature owner adds label Translation - Complete.
Translation - Missing.
Translation - Missing label is replaced with Translation - Complete.
Translation - Blocked label.
Translation - Blocked may NOT proceed to GA until the label has been replaced with either Translation - Missing or Translation - Complete.
target:ga and the appropriate target:YY00X (where YY00X is the targeted GA version).
Focal Point Approvals (Complete by Feature Complete Date)
These occur only after GA of this feature is requested (by adding a target:ga label). GA of this feature may not occur until all approvals are obtained.
All Features
focalApproved:externals
@OpenLiberty/demo-approvers Demo scheduled for EOI [Iteration Number] to this issue.
focalApproved:demo.
focalApproved:fat.
Design Approved Features
focalApproved:id.
focalApproved:instantOn.
focalApproved:performance.
focalApproved:sve.
focalApproved:ste.
focalApproved:svt.
Remove Beta Fencing (Complete by Feature Complete Date)
GA Blog (Complete by Friday after GM)
Post GM (Complete before GA)
Post GA
[ ] Remove the target:ga and target:YY00X labels, and add the appropriate release:YY00X. (OpenLiberty/release-manager)
Other Deliverables
[ ] Standalone Feature Blog Post - A blog post specifically about your feature or N/A. (Feature owner and OpenLiberty/release-architect)
[ ] OL Guides - OL Guides assessment is complete or N/A. (OpenLiberty/guide-assessment)
[ ] Dev Experience - Developer Experience & Tools work is complete or N/A. (OpenLiberty/dev-experience-assessment)