Separation of log.level from log { logger }

pushred commented 1 year ago

I seem to have run into some incompatibility between a few Elastic efforts that I'm currently stuck on due to the use of dot notation for log.level in this library. My end goal is for all ingested logs to be in the nested notation, per a comment from @felixbarny in this Community thread and personal preference. However I think I am blocked in my particular situation:

ecs-pino-format is used to add a foundation of ECS fields
elastic-serverless-forwarder is used to collect Lambda logs via AWS Kinesis
Elasticsearch Ingest Pipelines are used to remove some wrapping data structures around my original application logs (two layers, from Kinesis and CloudWatch)

The blocker I've run into is an inability to access the Pino formatter's log.level in the pipeline because it uses dot notation. Docs indicate that I should use the dot_expander processor first. However, it is limited itself to accessing top-level leaf fields. In my case I am parsing an incoming message into a temporary top-level object in order to selectively pick data from it to merge into the root, using the set processor.

If I try to expand log.level as a child of the temporary object:

{
  "dot_expander": {
    "field": "parsed_app_log",
    "path": "log.level"
  }
}

...I get a pipeline validation exception:

[field] field does not contain a dot and is not a wildcard

If I try to add the parsed JSON to the root I also get an error:

{
  "type": "illegal_argument_exception",
  "reason": "cannot add non-map fields to root of document"
}

I may just need to reach out to support to understand this error that could have a straightforward explanation I'm not getting from the error itself.

But while I pursue that route I wanted to ask why log.level uses dot notation here in the first place, when log { logger } is added if the APM integration is enabled? My understanding is that level also belongs to the log ECS field. I assume this is for backwards compatibility reasons.

Also, if log.level and log { logger } do coexist at ingestion time, what exactly is the consequence of mixing notations? Why is @felixbarny advocating nested notation when Elastic's own libraries mix notations? Are users expected to use ingest pipelines as I am to normalize their idiosyncrasies?

trentm commented 1 year ago

@pushred I don't have any experience with ingest pipelines and I may be misunderstanding the structure of your documents, but from the dot_expander docs do you possibly have the "field" and "path" values reversed? Does this work:

{
  "dot_expander": {
    "field": "log.level",
    "path": "parsed_app_log"
  }
}

trentm commented 1 year ago

But while I pursue that route I wanted to ask why log.level uses dot notation here in the first place, when log { logger } is added if the APM integration is enabled?

The spec states a preference for "log.level" being an unnested field. That doesn't answer why though. https://www.elastic.co/guide/en/ecs-logging/overview/current/intro.html#_why_ecs_logging includes:

Decently human-readable JSON structure

The first three fields are @timestamp, log.level and message. This lets you easily read the logs in a terminal without needing a tool that converts the logs to plain-text.

which is perhaps the main reasoning.

That the ecs-logging-nodejs pino library adds log: { logger: '...' } separately is necessary the way Pino formatting works. Pino separately renders the string for the log level (e.g. "log.level":"debug",) from the string for extra added fields to reduce re-JSON-stringifying fields for perf. The separation also follows this example ECS log record from the ecs-logging spec repo: https://github.com/elastic/ecs-logging/tree/main/spec#a-richer-event-context

pushred commented 1 year ago

Ahh thank you! Flipping field and path indeed works. We had resorted to a Painless script to workaround but this solution is definitely preferable. For some reason I was misinterpreting this part in the docs:

because the field option can only understand leaf fields.

…to mean that I could only specify top-level fields in the document. Must have been late in the day.

Understood on the origins of this decision and the intention. Thanks for the explanation. This is probably much less of an issue normally, there are some AWS particulars and the Serverless Forwarder's decision not to abstract that away that lead me to this issue. I opened an issue separately about that but thankfully the Ingest Pipeline processors are capable enough to workaround.

trentm commented 1 year ago

Nice. The dot_expander docs for on "path" are certainly not clear to me.

elastic / ecs-logging-nodejs

Separation of log.level from log { logger } #140