Open efd6 opened 1 year ago
In order for
error.message
to be fully open to use in its documented semantics, elastic-package would need to not rely on its state to determine whether pipeline failure has occurred
Yes, I think it would make sense to stop relying on this, and fail only if event.kind
is pipeline_error
.
if
event.kind
is used to signal pipeline failure, that pipelines all set that field correctly in the case of a pipeline failure.
Would it make sense to automatically add an on_failure
handler that adds event.kind
to all pipelines managed by fleet? This could be added in build time by elastic-package
or on install time by Fleet.
@juliaElastic @joshdover wdyt? Do you know if we have something similar now?
FTR, teams have been manually adding this on_failure
handler to many packages https://github.com/elastic/integrations/issues/6582
We have a fleet final pipeline that could easily be updated to do this for all Fleet-managed pipelines: https://github.com/elastic/kibana/blob/12a10d98559fb6fee98d81c7bbb22cf654187bde/x-pack/plugins/fleet/server/constants/fleet_es_assets.ts#L18
FTR, teams have been manually adding this
on_failure
handler to many packages elastic/integrations#6582
There are several PR's still pending approval/merge, but they should all be done once those are merged.
The only thing we may want to consider changing is that they were all initially done with a set
processor, but I discussed with @efd6 about using an append
processor instead. The reason for suggesting a change to append
is to prevent the pipeline error from removing other values from event.kind
that may affect the events' validity for Detection Rules, displaying on dashboards that are filtered by event.kind
, and potentially other situations as well.
Would it make sense to automatically add an
on_failure
handler that addsevent.kind
to all pipelines managed by fleet? This could be added in build time byelastic-package
or on install time by Fleet.
If this is being done at Fleet level, we will not need the change done in pipelines . Eg: here
Is there any timeline on this? I just ran into it (again) while working on a fix for the aws.cloudfront_log
integration.
If skipping errors based on event.kind: pipeline_error
you may add a configuration option to silence specific errors on testing.
We don't have a timeline on this, but it seems like a reasonable improvement to take on. I'm putting this in an upcoming sprint for the 8.17 cycle tentatively.
Maybe we can disable the check based on error.message
in cases where we can check if there were failures based on the failure store (https://github.com/elastic/elastic-package/issues/1832).
As a workaround I use a specific tag in the test log config and only add the error message if this tag does not exist. This is far from ideal but does the job for the moment.
IMO the best solution would still be to ignore the presence of error.message
in case of event.kind: pipeline_error
, or even better only rely on event.kind: pipeline_error
since this is the official way to indicate pipeline errors.
Side note: As of today, not all pipelines populate the event.kind field in case of errors (e.g. aws.cloudfront_logs). Thus, errors of these pipelines do not show up in the built-in ingest error search, since this search filters for both error.message and event.kind: pipeline_error. I strongly recommend to make sure all existing pipelines add pipeline_error to the error handlers where missing.
In case where adding this field is automatically added, please make sure that it is only added to the "global" pipeline on_failure object. I.e. on_failure handlers of processors itself must not lead to pipeline_errors. Use-case. Catching a processor error, do some stuff and proceed without error must still be supported.
The elastic-package test failure criteria include a check for a non-present
error.message
field to indicate that there has not been a pipeline failure during pipeline processing. This use is at odds with the documented semantics of the field.For
error
in generaland for the specific field
Together there indicate that the field should be available for use by data sources to include source-specific error states for events.
This is currently not possible because a data stream that uses
error.message
to hold a data stream-specific error state would be identified as an ingest/processing failure by elastic-package.The
event.kind
field does support marking pipeline failures with the "pipeline_error" value. So it is possible to distinguish pipeline failures from other uses oferror.message
, but this is currently brittle as the pipeline failure event kind is not automatic while settingerror.message
is.In order for
error.message
to be fully open to use in its documented semantics, elastic-package would need to not rely on its state to determine whether pipeline failure has occurred, and ifevent.kind
is used to signal pipeline failure, that pipelines all set that field correctly in the case of a pipeline failure.