apache / logging-log4j2

Apache Log4j 2 is a versatile, feature-rich, efficient logging API and backend for Java.
https://logging.apache.org/log4j/2.x/
Apache License 2.0
3.36k stars 1.6k forks source link

Support for Custom Value Serializer in Log4jEventBuilder#addKeyValue() and Asynchronous Serialization #2996

Open AlexxeyKa opened 4 days ago

AlexxeyKa commented 4 days ago

In the current Log4jEventBuilder#addKeyValue(String key, Object value) implementation, complex objects are serialized using String.valueOf(), which lacks flexibility. My suggestion is to allow users to provide a custom serializer that can handle more complex serialization needs

We can provide a StringBuilderFormattable specialization there. That is, we can change addKeyValue() implementation such that:

String encodedValue = value instance StringBuilderFormattable
        ? encodeUsingStringBuilderFormattable((StringBuilderFormattable) value)
        : toString(value);
keyValuePairs.put(key, encodedValue);

But,

... handling complex objects, especially when structured logging is required. ... complex serialization needs (e.g., converting objects to JSON)

There is so much to unpack here.

For one, logging as follows

LOGGER.info("userContext: {}", toJSON(userContext));

is a bad-practice, because this implies an assumption on the logging layout employed, which contradicts with the separation-of-concerns practiced with the logging API vs. logging implementation separation. What if the used layout is XML? etc. Hence, we should avoid encoding payloads with assumptions on the employed layout. Instead, we should allow registering arbitrarily-typed parameters, if necessary, with sufficient encoding hints (e.g., extending from MultiFormatStringBuilderFormattable) and let them be handled by the employed layout.

Currently, addKeyValue() admissions get registered to a Map<String, String> – we should amend this with a Map<String, Object>. Though the bigger problem is how KV-pairs are passed to the LogBuilder:

try (final Instance c = CloseableThreadContext.putAll(keyValuePairs)) {
  logBuilder.log(message, arguments.toArray());
}

That is, log4j-slf4j2-impl hacks auto-clearing thread context (i.e., CloseableThreadContext) to pass the KV-pairs all the way down to the layout. This is due to the fact that LogBuilder of Log4j doesn't have a counterpart for arbitrary KV-pairs and the feature that comes closest to it we can leverage (and we did) is thread context. There are a couple of angles we can approach this problem from:

  1. Implement accepting non-String-typed values in CloseableThreadContextLOG4J2-1648 proposes this. Registering instances of custom types in a thread context can cause memory leakage in Java EE container environments, though this should not be a problem for CloseableThreadContext.
  2. Extend LogBuilder to accept custom KV-pairs

I propose that the serialization of the value should happen asynchronously, offloading the potentially expensive serialization task to a separate thread. This will prevent blocking the thread that calls logging methods (such as log.info()), thus improving performance in high-throughput logging environments where structured logging is used extensively.

"Improve performance in high-throughput logging environments [by offloading the log event encoding to a background thread]" this is a claim that cannot hold without given sufficient context. Given the context, most of the time it becomes a special case rather than a general one we can practice for every user. Nevertheless, if we can pass arbitrary objects in KV-pairs (and effectively allow the layout to take charge of the complete encoding effort), you can combine this with asynchronous logging to effectively encode events in a thread different from the one invoking the logging API.

rgoers commented 3 days ago

LOL. @vy Remember the Renderer I wanted to add for use in ScopedContext? Here is another use for it.

vy commented 2 days ago

Remember the Renderer I wanted to add for use in ScopedContext?

@rgoers, yes, I do. Note that I wasn't against the Renderer concept, but introducing a new interface where existing ones (e.g., StringBuilderFormattable, MultiFormatStringBuilderFormattable) can perfectly suffice. See my comment.

Here is another use for it.

User is using SLF4J. You're suggesting him to

  1. either completely switch to the Log4j API and use ScopedContext1
  2. or combine SLF4J with Log4j API's ScopedContext1 (hence, effectively use both logging APIs).

@AlexxeyKa will be the judge of this.

1 ScopedContext will be released in the future as a separate module to enhance Log4j API with a scoped thread context pretty much identical to the one you would get with Java's ScopedValue.

AlexxeyKa commented 1 day ago

@vy Thanks for your reply.

I agree that using LOGGER.info("userContext: {}", toJSON(userContext)); is not an option for two reasons:

  1. Serialization still happens in the main thread.
  2. The serialized value is embedded in the message – instead of being handled as a key-value pair for structured logging.

While switching to Log4jEventBuilder#addKeyValue("userContext", toJSON(userContext)) could address the second issue (using structured key-value logging), the first problem remains: the serialization still occurs on the main thread.

The core of my request is to store the key-values passed via addKeyValue() in the LogEvent as a Map<String, Object> rather than converting them to a ReadOnlyStringMap. This would allow the layout to handle object serialization, giving developers the flexibility to define how and when values are serialized, especially when combined with asynchronous logging. By default, key-values passed via addKeyValue() could still be serialized using toString() and embedded into the message. However, if I want to create a custom log appender/plugin/layout, I’d like to have the ability to log these values in a different way. E.g. in my custom log4j Layout, I want to be able to detect when values were passed via addKeyValue(), so that I can log them differently. This distinction would enable me to treat key-value pairs in a special way, such as formatting them for structured logging.

I think this option

Extend LogBuilder to accept custom KV-pairs

...is a good way, but it’s important that LogEvent is also modified to store these key-value pairs separately.

Regarding ScopedContext, will JDK 21 be required?

ppkarwasz commented 1 day ago

It is worth noting that in #1813, we agreed to that addKeyValue() should generate a MapMessage instead of adding keys to the context data of the log event.

For those not accustomed to the difference between log message and event, there is a note in the Messages documentation page. Currently addKeyValue() fills the content of "labels", whereas it should fill the content of "message" in the example below:

{

  "log.level": "INFO",
  "message": "Unable to insert data into my_table.",
  "error.type": "java.lang.RuntimeException",
  "error.message": null,
  "error.stack_trace": [
    {
      "class": "com.example.Main",
      "method": "doQuery",
      "file.name": "Main.java",
      "file.line": 36
    },
    {
      "class": "com.example.Main",
      "method": "main",
      "file.name": "Main.java",
      "file.line": 25
    }
  ],
  "marker": "SQL",
  "log.logger": "com.example.Main",

  "tags": [
    "SQL query"
  ],
  "labels": {
    "span_id": "3df85580-f001-4fb2-9e6e-3066ed6ddbb1",
    "trace_id": "1b1f8fc9-1a0c-47b0-a06f-af3c1dd1edf9"
  },

  "@timestamp": "2024-05-23T09:32:24.163Z",
  "log.origin.class": "com.example.Main",
  "log.origin.method": "doQuery",
  "log.origin.file.name": "Main.java",
  "log.origin.file.line": 36,
  "process.thread.id": 1,
  "process.thread.name": "main",
  "process.thread.priority": 5
}
AlexxeyKa commented 21 hours ago

@ppkarwasz Thank you for noticing this.

If the map with key-values passed via addKeyValue() remains as Map<String, Object> in the Message (and values will not be stringified) it should solve my request.