The decode_cef processor used at the input of the integration automatically translates the generated fields into ECS, but the content is not validated in the process.

Specifically, we noticed this with the Data Quality Check in Kibana for the `event.outcome` field. The field `cef.extension.eventOutcome` is generated automatically and defaults to "None" if no parameter is present. As a result, values other than `success`, `failure`, and `unknown` are written to `event.outcome`. In our case this was only caught thanks to the Data Quality Check. This is fatal.

Example result for not present and present:
Complaints in the DQA check:
I think the error lies in the decoder that is used by all inputs. See here as an example: https://github.com/elastic/integrations/blob/87e6e91ff250ade3d36636822a2d1329682f7f04/packages/cef/data_stream/log/agent/stream/udp.yml.hbs#L19
I understand that this decoder comes from Filebeat, which performs the following mapping: https://github.com/elastic/beats/blob/f2e2a4b1ddbb2a330280b23505c9551cc0447eba/x-pack/filebeat/processors/decode_cef/keys.ecs.go
https://github.com/elastic/beats/blob/f2e2a4b1ddbb2a330280b23505c9551cc0447eba/x-pack/filebeat/processors/decode_cef/keys.ecs.go#L95
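To illustrate, this is roughly the kind of guard I would expect around the `event.outcome` translation. This is a minimal Go sketch under my assumptions: the function name `normalizeEventOutcome` and the reject-instead-of-copy behavior are hypothetical, not the actual beats code.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedOutcomes holds the only values ECS permits for event.outcome.
var allowedOutcomes = map[string]bool{
	"success": true,
	"failure": true,
	"unknown": true,
}

// normalizeEventOutcome is a hypothetical guard: it returns the value to
// write into event.outcome and whether the field should be set at all.
// Anything outside the allowed set (e.g. the literal "None") is rejected
// instead of being copied into the document verbatim.
func normalizeEventOutcome(raw string) (string, bool) {
	v := strings.ToLower(strings.TrimSpace(raw))
	if allowedOutcomes[v] {
		return v, true
	}
	return "", false
}

func main() {
	for _, raw := range []string{"Success", "None", ""} {
		v, ok := normalizeEventOutcome(raw)
		fmt.Printf("%q -> %q (set=%v)\n", raw, v, ok)
	}
}
```

With a check like this, an absent or defaulted `eventOutcome` would simply leave `event.outcome` unset rather than producing a value the Data Quality Check later flags.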
But that's just my guess.

The expected behavior of the automatic translation would be that the content is also validated when the fields are created. This should be revised urgently, as it undermines reliability. In our case it was a time-consuming process to find out how these incorrectly populated fields came about.
Apart from that, perhaps the entire procedure is no longer the smartest approach. In any case, it would also be desirable to respect the ECS field types and allowed field values.
As I have seen, you have already been involved with a commit in the decoder and also here, which is why I'm pinging you, @efd6. Perhaps this issue would be better placed in the beats repository?