elastic / ecs-logging-java

https://www.elastic.co/guide/en/ecs-logging/java/current/intro.html
Apache License 2.0
139 stars 74 forks source link

Namespace MDC Fields with either labels or tags. #142

Open rdifrango opened 2 years ago

rdifrango commented 2 years ago

I noticed when I was working with the ECS Java loggers that MDC and/or Metadata fields weren't name spaced.

I can see one of three scenario's:

  1. Make the default namespace be labels per [https://www.elastic.co/guide/en/ecs/current/ecs-base.html](Base Fields)
  2. Make the default namespace be tags per [https://www.elastic.co/guide/en/ecs/current/ecs-base.html](Base Fields)
  3. Allow for a customized namespace via a configuration parameter.

Here's an example configuration:

<configuration>
    <appender name="Console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="co.elastic.logging.logback.EcsEncoder">
            <serviceName>my-application</serviceName>
            <serviceNodeName>my-application-cluster-node</serviceNodeName>
            <additionalField>
                <key>environment</key>
                <value>dev</value>
            </additionalField>
        </encoder>
    </appender>

    <logger name="co.elastic" level="DEBUG"/>

    <root level="WARN">
        <appender-ref ref="Console"/>
    </root>
</configuration>

and a resulting output:

{
"@timestamp": "2021-08-31T15:57:11.355Z",
"log.level": "DEBUG",
"message": "Example with the default Meta Data fields.",
"ecs.version": "1.2.0",
"service.name": "my-application",
"service.node.name": "my-application-cluster-node",
"event.dataset": "my-application.log",
"process.thread.name": "main",
"log.logger": "co.elastic.DemoApplication",
"environment": "dev"
}
felixbarny commented 2 years ago

Hi and thanks for your issue! Could you describe why it's an issue for you that additional fields and MDC are not namespaced?

We previously did namespace them but removed that in https://github.com/elastic/ecs-logging-java/pull/67.

rdifrango commented 2 years ago

@felixbarny it's not a problem per se; it just that I would like the ability to namespace them given that the spec has a placeholder for these attributes to land.

I also logged a defect over on the nodejs project and they were considering the same.

I'll also highlight that the .NET variant does namespace them with the result being something like:

"metadata": {
    "environment": "dev"
},
felixbarny commented 2 years ago

When namespacing, it makes it harder to set actual ECS fields, such as service.environment.

For the things you'd like to be namespaced, could you just prefix them with labels.? such as

<additionalField>
    <key>labels.environment</key>
    <value>dev</value>
</additionalField>

Better yet, for this particular field, you'll probably want to set service.environment.

<additionalField>
    <key>service.environment</key>
    <value>dev</value>
</additionalField>
rdifrango commented 2 years ago

@felixbarny I agree for the environment field that your proposal is the more correct one, but it was meant to be representative. In my case I have ~10 fields that I will be placing into the MDC that we would like them grouped accordingly.

Also, one things to know is that neither KeyValuePair for log4j2 or additionalField for logback groups fields that are similarly prefixed.

For example we will be emitting the cloud fields:

<additionalField>
                <key>cloud.provider</key>
                <value>MY CLOUD PROVIDER</value>
            </additionalField>
            <additionalField>
                <key>cloud.availability_zone</key>
                <value>MY REGION</value>
            </additionalField>

and when they are emitted the come out at the root level:

"cloud.provider": "MY CLOUD PROVIDER",
"cloud.availability_zone": "MY REGION",

Instead I'd expect roughly the following:

"cloud" : { 
    "provider": "MY CLOUD PROVIDER",
    "availability_zone": "MY REGION",
},
felixbarny commented 2 years ago

Ah, so your issue is about dotted vs nested field names? See this issue for more info and rationale about it: https://github.com/elastic/ecs-logging-java/issues/51#issuecomment-908104672

rdifrango commented 2 years ago

@felixbarny yes and no. In this particular case I want the ability to specify the top level container for my additional attributes that are housed in MDC.

I will say though that; that gives a potential path to resolving it.

ChristianCiach commented 1 year ago

We, too, are sad to see that MDC values are not namespaced under labels anymore. I would even go as far to say that #67 was a very bad change, or at the very least a shortsighted one.

For the stated reasons, please at least provide an option to restore the previous behavior by making the namespace (and by "namespace" i mean "common key prefix") for MDC values configurable using the logback.xml file.

felixbarny commented 1 year ago

Hi and sorry for the late reply.

The intention of this change was to make it simpler to add top-level fields (both ECS fields and custom fields). I still think that this generally makes sense but I do acknowledge that this can lead to nasty issues like mapping conflicts and data loss. I think there's a way to reconcile both - making it possible to add arbitrary top-level fields without making it brittle. But it requires some changes to the mappings.

We're currently thinking about enhancing the default mappings to make them less brittle out-of-the-box: https://github.com/elastic/elasticsearch/issues/88777. There's an early POC version of the mappings available in this PR. The key parts are to set ignore_malformed: true, dynamic: runtime, and subobjects: false (after https://github.com/elastic/elasticsearch/issues/88934 has been resolved).

It was a breaking change without an option to restore previous behavior.

This change was introduced before version 1.0.0 was released. According to semantic versioning, breaking changes in 0.x are expected.

Application developers can now easily add fields by mistake that are not defined by the ECS.

ECS is meant to be an extensible schema where projects or organizations can add custom fields as they see fit. But if the custom fields conflict with ECS, this can become can issue indeed.

Application developers can accidentally override existing fields. This will regularly happen because developers often just dump a map with meta information about the current context into the MDC.

Do you have an example from where you have seen this issue in the wild? Which field was the one that was overridden?

Application developers need to prefix the MDC keys with labels. or tags. to have the values correctly put into the ECS. This makes no sense at all to me. Our developers don't even know that their applications are logging in the ECS format. This only happens in production where the logging configuration is injected as external configuration by the DevOps team (aka, me). There is just no way I can reliably nudge all our developers to always prefix all their MDC keys with labels. (nor do I want to, because the applications shouldn't be developed against a specific logging format).

I agree. Developers should not need to change their usage of MDC. My idealized view of it is that it should be fine to add any top-level fields to the document without it being an issue. I'm aware that we're not currently in that state, especially when using the default mappings.

For the stated reasons, please at least provide an option to restore the previous behavior by making the namespace (and by "namespace" i mean "common key prefix") for MDC values configurable using the logback.xml file.

One of the issues with that is that our APM agents add a trace.id field to the MDC (which is part of the ECS spec). We want these to appear at the top-level so that log correlation works correctly. Again, it comes down to MDC being a mix of ECS fields, an organization's extension of ECS, and arbitrary key/value pairs. Prefixing all MDC values with labels. is also not a one-size-fits all solution because some of the properties need to be at the top level. We could add a config to list the properties that should be promoted to the top-level but that'll make it more difficult to add things to the top level. But certainly something we could consider, at least in the interim until we have a solution that works well with arbitrary top-level properties.

ChristianCiach commented 1 year ago

This change was introduced before version 1.0.0 was released. According to semantic versioning, breaking changes in 0.x are expected.

That's fair. I still think this was a change for the worse.

ECS is meant to be an extensible schema where projects or organizations can add custom fields as they see fit. But if the custom fields conflict with ECS, this can become can issue indeed.

I understand that, but I actually want all the properties that our developers put into the MDC to be logged with the labels.-prefix, without exception. The only way to enforce that is to make it configurable via the global logging settings (like logback.xml). I guess I am not the only one who thinks that way, since this was the hardcoded behavior before the change.

Do you have an example from where you have seen this issue in the wild? Which field was the one that was overridden?

If by "in the wild" you mean open source projects, then I don't know. But I've seen this happen in some of our many many microservices, mostly with the message field. It would also regularly happen with the timestamp field, but luckily this field is prefixed with @ in the MDC. Still, it sucks to have both timestamp and @timestamp on the top level. Also, dumping the MDC values to the top level makes it pretty much impossible to automatically retrieve the complete set of logged MDC values from the log, because then I have to first retrieve all of the top level fields and manually remove all the keys that I know to actually originate from the ECS. In other words: I want to know which keys of a given log message come from the MDC. This is pretty much impossible at the moment. Before the change, I could just retrieve the labels.

One of the issues with that is that our APM agents add a trace.id field to the MDC (which is part of the ECS spec). We want these to appear at the top-level so that log correlation works correctly.

It looks like the change was made for the sole purpose to enable your APM agents to log trace.id (and maybe some other properties) at the top level, but this seems almost hacky. Your proposed solution to promote a specific set of properties (like trace.id) to the top level seems a lot more sensible to me. The logstash-logback-encoder has a special API beside the default MDC-API to log properties to the top level, see https://www.innoq.com/en/blog/structured-logging/#structuredarguments . Maybe something like this can be done here, too. Your APM agents could then use this API to avoid MDC altogether.

My idealized view of it is that it should be fine to add any top-level fields to the document without it being an issue.

Actually, I would rather not even give the developers the option to pollute the top level namespace. Again, our developers don't even know that they are ultimately logging to ECS. Logging to Elastic is my responsibility and I don't ever want the applications to be able to "break out" from their labels namespace.

felixbarny commented 1 year ago

What if your developers use structured logging concepts to add fields to the log instead of MDC? Would you expect these to be at the top level or also nested?

felixbarny commented 1 year ago

If by "in the wild" you mean open source projects, then I don't know. But I've seen this happen in some of our many many microservices, mostly with the message field.

No, I meant in the services you're operating. The message field is a great example, thanks.

It looks like the change was made for the sole purpose to enable your APM agents to log trace.id (and maybe some other properties) at the top level, but this seems almost hacky.

Not only for that use case. The idea was that it would also be easier for users to add top-level ECS properties, such as user.id, using the MDC

ChristianCiach commented 1 year ago

What if your developers use structured logging concepts to add fields to the log instead of MDC? Would you expect these to be at the top level or also nested?

That's a good question. Our developers currently just don't. But we are currently using logstash-logback-encoder (mostly because of this issue), where the keyValue-API is defined to always log to the top level, and I am fine with that distinction as compared to MDC.

Actually, I would rather not even give the developers the option to pollute the top level namespace. Again, our developers don't even know that they are ultimately logging to ECS

Let me clarify that: I don't want our developers to be able to accidentally pollute the top level namespace via MDC. If they are using the structured logging API on purpose to add keys to the top level, then I would be fine with that.

ChristianCiach commented 1 year ago

Actually, some of our developers already use the structured logging concepts of logback to add a very specific top level key, namely alert. Logback/slf4j has "error" as the highest log level, but we don't want to alert our 24/7 support on every "error", so we are using markers instead. These are expected to go to the top level and this is absolutely fine, because these are a specific, very controlled set of keys, so there is no danger to accidentally pollute the top level namespace this way.

felixbarny commented 1 year ago

I've mostly been looking at MDC as another means of doing structured logging. Often, the MDC is used as a workaround if the logging library doesn't provide a native way for structured logging, such as slf4j 1.x. But modern libraries, such as the recently released slf4j 2.x, as well as log4j 2.x have support for structured logging (IIRC), it may be appropriate to differ MDC and actual structured logging.

However, I'm not sure if it's intuitive if these two ways of adding a user.id result in different behavior:

logger.info(new StringMapMessage()
    .with("message", "User has made a new comment")
    .with("user.id", "felixbarny"));
try (MDC.MDCCloseable closable = MDC.putCloseable("user.id", "felixbarny")) {
    logger.info("User has made a new comment");
}

because these are a specific, very controlled set of keys, so there is no danger to accidentally pollute the top level namespace this way.

I'm wondering if that's generally the case or just for your current use cases. Let's say that a team in your organization decides they want to migrate from MDC to structured logging. They might just translate everything they put into the MDC to a structured log event in which case we're back at square 1. So maybe it is reasonable to treat MDC and structured logging in the same way?

I want to know which keys of a given log message come from the MDC.

Could you elaborate on why you want to know which keys are coming from the MDC? Would you also want to know the keys that are coming from a structured logging event?

ChristianCiach commented 1 year ago

I think MDC serves a difference use-case than general structured logging. Usually I see MDC used to just dump the current "context" of a transaction. This includes the transaction-id, the logged in user.. but really, this can be anything attached to the current transaction context. In the Go programming language, there is usually a context object which is really just a set of key-value-pairs that can contain everything. Spring uses a TransactionContext for similar use cases. This is what I often see being dumped into the MDC.

Of course, many of the keys of the transaction context should be better mapped to standardized fields of the ECS, but that is something that I want to do, for example using FluentBit mappers. I want to keep our developers blissfully ignorant about everything ECS.

Could you elaborate on why you want to know which keys are coming from the MDC?

As I just described, sometimes I want to see the full transaction context for debugging purposes.

rdifrango commented 1 year ago

I think MDC serves a difference use-case than general structured logging. Usually I see MDC used to just dump the current "context" of a transaction. This includes the transaction-id, the logged in user.. but really, this can be anything attached to the current transaction context. In the Go programming language, there is usually a context object which is really just a set of key-value-pairs that can contain everything. Spring uses a TransactionContext for similar use cases. This is what I often see being dumped into the MDC.

I was going to make a similar comment that MDC is often used contextual fields that span across all methods within a given flow. For example, if you use distributed tracing that's something that needs to be present in all log records and the only way to do so is via MDC unless you want to pass that as a direct parameter to all methods.

felixbarny commented 1 year ago

Thanks for highlighting these nuances.

Let me think about it a bit more but I do get your points.

If we were to change it, we'd probably have to do that by making users opt-in to storing MDC as labels as it might be considered a breaking change otherwise.

felixbarny commented 1 year ago

Let me try to summarize:

We add an option to prefix all MDC values with labels..

This serves two purposes: It makes it easier to determine all fields that are valid throughout the whole (request) context and it avoids mapping conflicts and conflicts with fields from the log event, such as message.

Some MDC values will need to be excluded from this so that they're at the top level. This is needed at least for the ones that are needed trace/log correlation, such as trace.id, transaction.id, span.id, and error.id. We still want the correlation to work via MDC as it's the most reliable and universal way to achieve log correlation. Not using MDC would also be a breaking change which would prevent plain-text logging users to add these MDC values via %X/%X{trace.id}. We might also add a way for users to opt-in to specific fields being at the top level. But there are also arguments for not allowing that.

When adding key/value pairs to a specific log event (structured logging), the key/value pairs should remain to be on the top level, though.

Although MDC can be used as a means to do structured logging on frameworks that don't have 1st party support for structured logging, it often serves a different purpose. Rather than a deliberate per-event fields, the MDC contains properties that are added to all events in a given context. Also, with the recently released slf4j2, all modern logging frameworks have built-in support for structured logging.

Although, you could argue that you'd want to map some of these request scoped fields to ECS fields, too.

This includes the transaction-id, the logged in user..


I'm not entirely convinced by this line of argumentation as the same conflicts can occur with both top-level structured logging fields and MDC fields. Also, some MDC fields do make sense at the top level. But there's probably no harm in having an opt-in option to prefix MDC values with labels.. Especially because this would reduce the chance of mapping conflicts with the current default mappings in Elasticsearch.

Long-term, I think we should make it simple to just send arbitrary JSON (that may even contain duplicate field names) to Elasticsearch without having to worry about mapping issues. In this future, the proposed config option is not needed for the majority of use cases.

@xeraa I wonder if you have any thoughts on this.

lauroschuck commented 10 months ago

Any time frame for this change? Marked as 8.8 candidate, and we're already on 8.9?