Open SylvainJuge opened 1 year ago
find if we can define (or link existing) normalization: replacing with underscores seems a relevant option.
I've implemented sanitization in this PR which also links to the sources where the restrictions are coming from: https://github.com/elastic/elasticsearch/pull/76511/files#diff-932d6c78fbdb18c29397c722174ecc65925a218ff27245f017a6ff9c3fb539c8R42
in case ECS logging is used, should we make the APM agents automatically normalize the values provided by the application ?
I suppose we'll need to do sanitization at multiple places, like in the ECS loggers, and in APM Server. Not sure if we also need to sanitize in agents.
Thanks for the pointer to the related change.
The new constraint (and automatic normalization) is added to Elasticsearch, thus anything that would write to it will either:
From what I understand this limitation is mostly technical due to the fact that we need to fit the index naming patterns.
In the logs, we use this value to break down the application logs, using a dotted syntax.
For example, we might use the `tomcat.log` and `tomcat.access` values for the Tomcat general log and access log respectively.
If those were set to `tomcat/log` or `tomcat-access` instead, the normalized values `tomcat_log` and `tomcat_access` still preserve the meaning, so while the result is not 100% identical, the end user can accommodate it.
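To make the underscore replacement concrete, here is a minimal sketch of such a normalization. The function name `sanitize_dataset`, the exact set of disallowed characters, and the 100-character limit are assumptions for illustration only; the authoritative restrictions are the ones referenced from the Elasticsearch PR linked above.

```python
import re

# Assumed set of characters that are not allowed in a dataset name.
# The authoritative list lives in the sources linked from the Elasticsearch PR.
_DISALLOWED = re.compile(r'[\\/*?"<>| ,#:-]')

# Assumed maximum length for the dataset part of a data stream name.
MAX_LENGTH = 100


def sanitize_dataset(value: str) -> str:
    """Lowercase the value, replace disallowed characters with '_', and truncate."""
    sanitized = _DISALLOWED.sub("_", value.lower())
    return sanitized[:MAX_LENGTH]


if __name__ == "__main__":
    for raw in ("tomcat.log", "tomcat/log", "tomcat-access"):
        print(raw, "->", sanitize_dataset(raw))
```

Dotted values such as `tomcat.log` pass through unchanged, while `tomcat/log` and `tomcat-access` become `tomcat_log` and `tomcat_access`, matching the examples above.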
:+1: on doing this normalization at APM Server and ECS loggers level. APM Agents would implicitly implement it by relying on their respective ECS loggers.
Hi @felixbarny, is this PR still relevant as a change in the agents? Or would it be better handled directly in apm-server instead?
Not quite sure what's the best place to handle this, tbh. Maybe even in the ecs logger libs.
We'll probably also need to re-think the way we currently set `event.dataset` and `data_stream.dataset` when moving away from service-specific data streams. See also
The specification for logging currently allows values in `event.dataset` that do not fit the `data_stream.dataset`. There are two places where this is mentioned:
`${service.name}.apm-agent` value.
Checklist
`service.name` to ensure those constraints?