elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0

logs dataset naming constraints #751

Open SylvainJuge opened 1 year ago

SylvainJuge commented 1 year ago

The logging specification currently allows values in event.dataset that do not fit the constraints of data_stream.dataset. There are two places where this is mentioned:

Checklist


apmmachine commented 1 year ago

:green_heart: Build Succeeded

#### Build stats

* Start Time: 2023-03-31T04:52:00.028+0000
* Duration: 3 min 18 sec

felixbarny commented 1 year ago

find if we can define (or link existing) normalization: replacing with underscores seems a relevant option.

I've implemented sanitization in this PR, which also links to the sources the restrictions come from: https://github.com/elastic/elasticsearch/pull/76511/files#diff-932d6c78fbdb18c29397c722174ecc65925a218ff27245f017a6ff9c3fb539c8R42

in case ECS logging is used, should we make the APM agents automatically normalize the values provided by the application?

I suppose we'll need to do sanitization at multiple places, like in the ECS loggers, and in APM Server. Not sure if we also need to sanitize in agents.

SylvainJuge commented 1 year ago

Thanks for the pointer to the related change.

The new constraint (and automatic normalization) is added to Elasticsearch, thus anything that would write to it will either:

From what I understand, this limitation is mostly technical: the value needs to fit the index naming patterns.

In the logs, we use this value to break down the application logs, using a dotted syntax. For example, we might use tomcat.log and tomcat.access for the Tomcat general log and access log respectively. If those were instead set to tomcat/log or tomcat-access, the normalized values tomcat_log and tomcat_access would still preserve the meaning, so while the result is not 100% identical, the end user can accommodate it.
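To make the normalization above concrete, here is a minimal sketch in Python. It is not the Elasticsearch implementation: the function name `sanitize_dataset`, the exact set of disallowed characters, the lowercasing step, and the 100-character limit are all assumptions for illustration. The authoritative rules are the ones in the linked Elasticsearch PR. The only behaviour taken from this thread is that `/` and `-` end up as underscores.

```python
import re

# Assumption: characters not allowed in data_stream.dataset.
# This set is illustrative only; check the linked Elasticsearch PR for the real list.
_DISALLOWED = re.compile(r'[\\/*?"<>| ,#:-]')

# Assumption: maximum length of the dataset value.
_MAX_LENGTH = 100


def sanitize_dataset(value: str) -> str:
    """Return a value usable as data_stream.dataset (hypothetical helper).

    Lowercases the input, replaces every disallowed character with an
    underscore, and truncates the result to the assumed maximum length.
    """
    return _DISALLOWED.sub("_", value.lower())[:_MAX_LENGTH]


if __name__ == "__main__":
    for raw in ("tomcat.log", "tomcat/log", "tomcat-access"):
        print(raw, "->", sanitize_dataset(raw))
```

Because the function only lowercases, substitutes, and truncates, applying it to an already sanitized value leaves that value unchanged, which matters if the same normalization ends up running in more than one place (for example in an ECS logger and again in APM Server).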

:+1: on doing this normalization at APM Server and ECS loggers level. APM Agents would implicitly implement it by relying on their respective ECS loggers.

SylvainJuge commented 1 year ago

Hi @felixbarny, is this PR still relevant for changes in the agents? Or would this be better handled directly in apm-server instead?

felixbarny commented 1 year ago

Not quite sure what the best place to handle this is, tbh. Maybe even in the ECS logger libs.

We'll probably also need to re-think the way we currently set event.dataset and data_stream.dataset when moving away from service-specific data streams. See also