OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0

OpenTelemetry Logging Investigation #27647

Closed pgunapal closed 2 months ago

pgunapal commented 7 months ago

Investigate how to route Open Liberty logs to OpenTelemetry Logging.

pgunapal commented 7 months ago

There are two ways to route Open Liberty logs to OpenTelemetry Logging:

  1. Reading the log files (messages.log) directly, either with the OTel Collector or with a log collection agent such as FluentBit. In the first case, the OTel Collector must reside in the same environment as the Liberty runtime, or at least have access to the log files, so that it can read and parse the log data. With the latter approach, the agent forwards the logs, so the OTel Collector can reside elsewhere. We can use the FileLogReceiver (filelog receiver) in the OTel Collector to read the Open Liberty messages.log file and its json_parser operator to parse the JSON logs and export them to OTel. It can also parse human-readable text logs, but that may become complicated, since multiple regular expressions would be needed to parse each kind of log entry. Ref: https://opentelemetry.io/docs/specs/otel/logs/#via-file-or-stdout-logs

    • We would need to ensure that we have the correct context for each log record entry, so we can capture the correct traceIDs and spanIDs.
  2. Using the OpenTelemetry Logs Bridge API and SDK to bridge (append) the Open Liberty logs and export them to the OTel Collector via OTLP. Ref: https://opentelemetry.io/docs/specs/otel/logs/#direct-to-collector

    • OpenTelemetry defines a Logs Bridge API for emitting LogRecords. The Bridge API and SDK can be used together with existing logging libraries to automatically inject the trace context into the emitted logs and to provide an easy way to send the logs via OTLP. Instead of modifying each logging statement, log appenders use the API to bridge logs from existing logging libraries to the OpenTelemetry data model, while the SDK controls how the logs are processed and exported. Currently, OTel provides Log Appender instrumentation for Logback and Log4j2; there is no Log Appender for JUL (java.util.logging). OpenTelemetry does support auto-instrumentation of JUL through a Java agent, which I don't think would be a feasible approach for Open Liberty, for various reasons.
    • With this approach, we can guarantee that we have the appropriate context to retrieve the corresponding traceIDs and spanIDs from the OTel signals for the different services.
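
To make the second approach more concrete, here is a minimal sketch of what a JUL bridge could look like, since no JUL Log Appender exists. It uses only the JDK: `LogSink` is a hypothetical stand-in for the OpenTelemetry Logs Bridge API (it is not a real OTel type), and the level-to-severity mapping in `severityNumber` is an assumption, not something decided in this investigation. A real bridge would also attach the current trace and span IDs at the point where the sink is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical sink standing in for the OTel Logs Bridge API; not a real OTel type. */
interface LogSink {
    void emit(long epochMillis, String severityText, int severityNumber,
              String body, String loggerName);
}

/** JUL Handler that forwards every loggable record to the sink. */
final class BridgeHandler extends Handler {
    private final LogSink sink;

    BridgeHandler(LogSink sink) {
        this.sink = sink;
    }

    /** Assumed mapping from JUL levels to the base OTel severity numbers. */
    static int severityNumber(Level level) {
        int value = level.intValue();
        if (value >= Level.SEVERE.intValue()) return 17;   // ERROR
        if (value >= Level.WARNING.intValue()) return 13;  // WARN
        if (value >= Level.INFO.intValue()) return 9;      // INFO
        if (value >= Level.FINE.intValue()) return 5;      // DEBUG (also covers CONFIG)
        return 1;                                          // TRACE (FINER, FINEST)
    }

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        // A real bridge would read the active trace context here and pass it along.
        // The raw message is forwarded as-is; parameters are not formatted in this sketch.
        sink.emit(record.getMillis(), record.getLevel().getName(),
                  severityNumber(record.getLevel()), record.getMessage(),
                  record.getLoggerName());
    }

    @Override
    public void flush() {
        // Nothing is buffered in this sketch.
    }

    @Override
    public void close() {
        // No resources to release in this sketch.
    }
}

final class BridgeDemo {
    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        Logger logger = Logger.getLogger("demo.bridge");
        logger.setUseParentHandlers(false); // keep the demo output to the sink only
        logger.addHandler(new BridgeHandler((ts, text, number, body, name) ->
                emitted.add(text + "(" + number + ") " + name + ": " + body)));
        logger.warning("disk usage high");
        System.out.println(emitted);
    }
}
```

Because the handler runs on the thread that issued the log call, it is the natural place to capture the active span, which is the advantage this approach has over reading messages.log after the fact.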

OpenTelemetry defines its own Log Data Model, to which the Open Liberty log records should be mapped.
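
As an illustration of such a mapping, the sketch below converts the fields of one Open Liberty JSON log entry (already parsed into a map) into the top-level fields of the Log Data Model. `OtelLogRecord` and `LibertyLogMapper` are hypothetical names, not OTel or Liberty classes, and the choices made here (`message` becomes the Body, `loglevel` becomes SeverityText, `ibm_datetime` becomes Timestamp, everything else becomes an attribute, and AUDIT is treated as INFO) are assumptions for discussion rather than the agreed mapping.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative holder for the Log Data Model fields used here; not an OTel SDK class. */
final class OtelLogRecord {
    public final Instant timestamp;
    public final String severityText;
    public final int severityNumber;
    public final String body;
    public final Map<String, String> attributes;

    OtelLogRecord(Instant timestamp, String severityText, int severityNumber,
                  String body, Map<String, String> attributes) {
        this.timestamp = timestamp;
        this.severityText = severityText;
        this.severityNumber = severityNumber;
        this.body = body;
        this.attributes = attributes;
    }
}

final class LibertyLogMapper {
    // Matches values such as 2024-02-20T20:16:45.589+0000.
    private static final DateTimeFormatter IBM_DATETIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    /** Assumed mapping from Liberty log levels to base OTel severity numbers. */
    static int severityNumber(String loglevel) {
        if (loglevel == null) return 0;              // unspecified
        if (loglevel.equals("FATAL")) return 21;     // FATAL
        if (loglevel.equals("SEVERE") || loglevel.equals("ERROR")) return 17; // ERROR
        if (loglevel.equals("WARNING")) return 13;   // WARN
        if (loglevel.equals("AUDIT") || loglevel.equals("INFO")) return 9;    // INFO
        return 0;                                    // anything else stays unspecified
    }

    /** Promotes three well-known fields and keeps the remaining ones as attributes. */
    static OtelLogRecord map(Map<String, String> fields) {
        Map<String, String> attributes = new LinkedHashMap<>(fields);
        String datetime = attributes.remove("ibm_datetime");
        String loglevel = attributes.remove("loglevel");
        String message = attributes.remove("message");
        Instant timestamp = OffsetDateTime.parse(datetime, IBM_DATETIME).toInstant();
        return new OtelLogRecord(timestamp, loglevel, severityNumber(loglevel),
                                 message, attributes);
    }
}

final class MapperDemo {
    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("type", "liberty_message");
        fields.put("ibm_datetime", "2024-02-20T20:16:45.589+0000");
        fields.put("loglevel", "AUDIT");
        fields.put("ibm_messageId", "CWWKF0011I");
        fields.put("message", "CWWKF0011I: The defaultServer server is ready to run a smarter planet.");

        OtelLogRecord record = LibertyLogMapper.map(fields);
        System.out.println(record.timestamp + " " + record.severityText
                + "(" + record.severityNumber + ") " + record.body);
        System.out.println(record.attributes);
    }
}
```

Assigning an explicit number to AUDIT matters because a collector reading the file has no built-in notion of that level, so without a mapping the SeverityNumber is left unspecified.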

Example output of Open Liberty JSON logging in the OpenTelemetry Logging format using the first approach, where the OTel Collector reads the JSON logs from the messages.log file:

otel-collector-1  | LogRecord #5
otel-collector-1  | ObservedTimestamp: 2024-02-20 20:16:45.675532858 +0000 UTC
otel-collector-1  | Timestamp: 2024-02-20 20:16:45.589 +0000 UTC
otel-collector-1  | SeverityText: AUDIT
otel-collector-1  | SeverityNumber: Unspecified(0)
otel-collector-1  | Body: Str({"type":"liberty_message","host":"my_host","ibm_userDir":"\/opt\/ol\/wlp\/usr\/","ibm_serverName":"defaultServer","message":"CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 1.297 seconds.","ibm_threadId":"0000002d","ibm_datetime":"2024-02-20T20:16:45.589+0000","ibm_messageId":"CWWKF0011I","module":"com.ibm.ws.kernel.feature.internal.FeatureManager","loglevel":"AUDIT","ibm_sequence":"1708460205589_0000000000014","ext_thread":"Default Executor-thread-1"})
otel-collector-1  | Attributes:
otel-collector-1  |      -> ibm_userDir: Str(/opt/ol/wlp/usr/)
otel-collector-1  |      -> ibm_messageId: Str(CWWKF0011I)
otel-collector-1  |      -> log.file.name: Str(messages.log)
otel-collector-1  |      -> host: Str(0ee2e07b2b6c)
otel-collector-1  |      -> message: Str(CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 1.297 seconds.)
otel-collector-1  |      -> ext_thread: Str(Default Executor-thread-1)
otel-collector-1  |      -> loglevel: Str(AUDIT)
otel-collector-1  |      -> ibm_sequence: Str(1708460205589_0000000000014)
otel-collector-1  |      -> ibm_serverName: Str(defaultServer)
otel-collector-1  |      -> module: Str(com.ibm.ws.kernel.feature.internal.FeatureManager)
otel-collector-1  |      -> ibm_threadId: Str(0000002d)
otel-collector-1  |      -> ibm_datetime: Str(2024-02-20T20:16:45.589+0000)
otel-collector-1  |      -> type: Str(liberty_message)
otel-collector-1  | Trace ID: ***my_traceID***
otel-collector-1  | Span ID: ***my_spanID***
otel-collector-1  | Flags: 0

pgunapal commented 2 months ago

Closing. The investigation is complete; the implementation will be tracked by #27711.