Integration Name
Sublime Security [sublime_security]
Dataset Name
logs-sublime_security.audit-default
Integration Version
1.0.0
Agent Version
8.15.2
Agent Output Type
elasticsearch
Elasticsearch Version
8.15.2
OS Version and Architecture
GCP Kubernetes Engine (Standalone agent)
Software/API Version
No response
Error Message
No response
Event Original
No response
What did you do?
Analyzing the agent config, we could not determine how the agent stores its state to avoid data duplication. Looking at the ingest pipeline, we noticed that the document _id is computed at runtime.
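For context, this is a minimal sketch of the kind of pipeline-side deduplication we mean, assuming the pipeline uses a fingerprint processor to derive _id from event fields (the field names below are illustrative, not taken from the actual integration pipeline):

{
  "fingerprint": {
    "fields": ["event.id", "@timestamp"],
    "target_field": "_id",
    "ignore_missing": true
  }
}

With this approach, duplicate events collide on _id only within a single backing index, which is relevant to the behavior described below.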
We then triggered a rollover of the data stream:

POST logs-sublime_security.audit-default/_rollover
Rollover response:
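(The actual response is not reproduced here; for reference, the rollover API returns an acknowledgment of roughly the following documented shape, where the backing index names and dates are placeholders:)

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": ".ds-logs-sublime_security.audit-default-2024.10.01-000001",
  "new_index": ".ds-logs-sublime_security.audit-default-2024.10.01-000002",
  "rolled_over": true,
  "dry_run": false,
  "conditions": {}
}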
After the data stream rolled over, we restarted the agent and checked the data. The same documents that had been ingested into the previous backing index were re-ingested into the new one.
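This is consistent with _id uniqueness being enforced per backing index rather than per data stream: an _id computed at ingest time only deduplicates within the current write index, so re-sent events are accepted into the new backing index after a rollover. One way to confirm the duplication (the document ID below is a placeholder) is an ids query across the data stream, which returns one hit per backing index that holds the document:

GET logs-sublime_security.audit-default/_search
{
  "query": {
    "ids": {
      "values": ["<computed-id>"]
    }
  }
}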
What did you see?
Data was ingested multiple times after the data stream rollover.
What did you expect to see?
The agent would keep its state of already-ingested data and would not re-ingest all the data after a rollover.
Anything else?
Policy Configuration: