Closed — gsantoro closed this issue 1 year ago
I want to take this as a chance to take a step back. Why do we need `event.ingested` in the first place? My understanding is that it is useful for monitoring (ingestion delay) and for security purposes. On the second point, I assume @andrewkroh can add a few more bits on how it is used.

Both of the above are not package related but "centralised" concerns. Also, the addition of the data can only happen centrally. Based on this, I agree with not adding it to every package and having this as part of our docs.

As it is a centralised concern, who should be in control of it? Is it a global setting, is it per package, or per data stream? One thing to consider is that timestamps in general have quite a bit of overhead, as these cannot be compressed well (@jpountz might have some more details here).
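For reference, the "central" mechanism Elasticsearch offers for this is the `index.final_pipeline` index setting, which runs a pipeline after any request or default pipeline. A minimal sketch of wiring it up via an index template (the template and pipeline names here are made up for illustration):

```
PUT _index_template/logs-example
{
  "index_patterns": ["logs-example-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.final_pipeline": "add-event-ingested"
    }
  }
}
```

Fleet does something along these lines for integration data streams, which is why the field can be added without any per-package pipeline changes.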
A few questions that come to mind:

- Does `event.ingested` have to be indexed, or is being stored enough?

@gsantoro If nobody objects, it seems we can move forward with your suggestions, but I would like to keep this open to have a follow-up discussion on the long term of `event.ingested`.
I am also curious as to what we are using `event.ingested` for. It does seem useful for diagnostic reasons: it is interesting when there is a large lag between event creation and delivery. This may affect the efficacy of the SIEM, as it is time-window bound. It may also demonstrate a clock-skew issue.

However, does it necessarily have to be mapped? If it is not mapped, then the compression overhead concerns go away. We do still have pipeline overhead concerns.
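If the field should stay retrievable but not searchable, one option (a sketch of the idea, not a decision) is to keep it in `_source` while disabling indexing and doc values in the mapping:

```
PUT _component_template/event-ingested-minimal
{
  "template": {
    "mappings": {
      "properties": {
        "event": {
          "properties": {
            "ingested": {
              "type": "date",
              "index": false,
              "doc_values": false
            }
          }
        }
      }
    }
  }
}
```

With `index: false` and `doc_values: false` the value only lives in `_source`, so the columnar storage and compression overhead of the timestamp goes away, at the cost of not being able to filter, sort, or aggregate on it.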
@gsantoro Is there any follow up issue or PR for your action points to track the progress?
@joshdover Pinging you on this one for awareness, as `event.ingested` seems to be a more generic ingest challenge together with the final pipeline. @axw Is APM using `event.ingested` in any way?
Hello @ruflin, regarding the PR that created this discussion in the first place: it has now been merged as it was (without `event.ingested`), as agreed.

I assume that if we agree to remove `event.ingested` from all the ingest pipelines, this would not happen in a single PR but would probably involve multiple teams, according to the CODEOWNERS responsibilities.
> I also am curious as to what we are using event.ingested for?

These are the use cases that I am aware of:

- `event.ingested` to deal with documents that are delayed ^3.

> If it is not mapped, then the compression overhead concerns go away.

It is mapped as a `date`. The value does not contain subseconds, to help improve compressibility. ^1
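For context, the way such a field is typically added centrally is a `set` processor referencing the ingest timestamp. A simplified sketch of such a pipeline (the real Fleet final pipeline also truncates subseconds, via a script processor, for the compressibility reason above):

```
PUT _ingest/pipeline/add-event-ingested
{
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{{_ingest.timestamp}}}"
      }
    }
  ]
}
```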
> @axw Is apm using event.ingested in any way?

No. I have used it in the past to calculate ingestion delay, but that was a one-off exercise, not something that is visualised in the product.
The way I read the above comments, there are specific features which need `event.ingested`. But these are not features specific to integrations; they concern data ingestion in general. We can currently enable it on data coming from Elastic Agent that uses integrations (my understanding is that the final pipeline is only applied for integrations), but if used, it would be ideal that it is applied to all data (at least all data coming in under the data stream naming scheme).

`event.ingested` is something that a user or a feature should be able to turn on/off based on user needs. It can be turned on globally for evasion detection rules, and more locally on some data streams for detecting ingest lag. This sounds a lot like an "ingest/fleet" feature, @amitkanfer.

My current take is: we need to continue the discussion around this feature, but with a more holistic view as part of ingest, not tied to any of the integrations. Because of this, I'm closing this issue, but I think we need a separate issue for further discussion.
@gsantoro @tommyers-elastic Can we ensure that the decision not to add `event.ingested` is documented? https://github.com/elastic/integrations/issues/4462#issuecomment-1298597103

@jsoriano Should we even have a check?
> Should we even have a check?

A check to disallow adding `event.ingested` from integrations? This could be an option, yes.
@jsoriano Yes. Or in a way that someone could still override it but would have to set a flag. But we can do that if it is actually needed.
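If such a check is ever added, it could be as simple as scanning each integration's pipeline definitions for a processor that sets `event.ingested`. A hypothetical sketch in Python (this is not the actual elastic-package validation, just an illustration of the idea):

```python
import re

# Hypothetical check: flag ingest pipeline definitions that set event.ingested,
# since the Fleet final pipeline already adds that field centrally.
SET_EVENT_INGESTED = re.compile(r"field\s*:\s*['\"]?event\.ingested['\"]?")

def pipeline_sets_event_ingested(pipeline_yaml: str) -> bool:
    """Return True if the pipeline source appears to set event.ingested."""
    return bool(SET_EVENT_INGESTED.search(pipeline_yaml))

# Example pipeline that should be flagged by the check.
offending = """
processors:
  - set:
      field: event.ingested
      value: "{{_ingest.timestamp}}"
"""

# Example pipeline that should pass the check.
clean = """
processors:
  - rename:
      field: message
      target_field: event.original
"""
```

A real implementation would parse the YAML instead of matching text, and would need the flag/override escape hatch mentioned above, but the shape of the check would be the same.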
As part of a PR, it has been brought to my attention that we shouldn't add `event.ingested` to ingest pipelines, since that field is already added by a default Fleet pipeline.

According to @andrewkroh here and here, we shouldn't add the field `event.ingested` in individual integrations, since this Fleet pipeline runs both in managed and standalone mode.

I would like to use this issue to agree on this topic and create some action points like: