elastic / integrations

[Sublime Security]: Duplicate Data Ingestion when using API #11363

Open christophercutajar opened 6 days ago

christophercutajar commented 6 days ago

Integration Name

Sublime Security [sublime_security]

Dataset Name

logs-sublime_security.audit-default

Integration Version

1.0.0

Agent Version

8.15.2

Agent Output Type

elasticsearch

Elasticsearch Version

8.15.2

OS Version and Architecture

GCP Kubernetes Engine (Standalone agent)

Software/API Version

No response

Error Message

No response

Event Original

No response

What did you do?

While analyzing the agent configuration, we could not determine how the agent stores state to avoid data duplication. Looking at the ingest pipeline, we noticed that the _id of each document is computed at runtime.

[Image: ingest pipeline excerpt showing the document _id being computed at runtime]
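
For reference, ingest pipelines typically derive a deterministic _id with the fingerprint processor; a minimal sketch of such a processor is below (the field names are hypothetical, not taken from the actual sublime_security pipeline):

{
  "fingerprint": {
    "fields": ["sublime_security.audit.id", "sublime_security.audit.created_at"],
    "target_field": "_id",
    "method": "SHA-1",
    "ignore_missing": true
  }
}

With a processor like this, re-ingesting the same event produces the same _id, so a duplicate write to the same backing index is rejected as a version conflict.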

We then triggered a rollover of the data stream using POST logs-sublime_security.audit-default/_rollover

Rollover response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": ".ds-logs-sublime_security.audit-default-2024.10.04-000001",
  "new_index": ".ds-logs-sublime_security.audit-default-2024.10.07-000002",
  "rolled_over": true,
  "dry_run": false,
  "lazy": false,
  "conditions": {}
}
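
Note that _id uniqueness is only enforced per backing index, not across a data stream, which is why a rollover defeats fingerprint-based deduplication. A minimal illustration against a throwaway data stream (hypothetical name and document, for demonstration only, assuming a matching logs-*-* index template exists):

PUT logs-dedup-demo-default/_create/fixed-id
{
  "@timestamp": "2024-10-07T00:00:00Z",
  "message": "duplicate test"
}

# Repeating the identical request while the write index is unchanged fails with
# 409 version_conflict_engine_exception.
# After POST logs-dedup-demo-default/_rollover, the same request succeeds again,
# because the new write index contains no document with that _id, leaving the
# event stored twice in the data stream.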

After the data stream was rolled over, we restarted the agent and re-checked the data.

It can now be seen that the documents already ingested into the previous backing index were re-ingested into the new one: since the computed _id only guarantees uniqueness within a single index, the new write index accepts the duplicates.

[Image: the same documents present in both the old and the new backing index]
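
One way to confirm the duplication is a terms aggregation with min_doc_count: 2 on whichever field uniquely identifies an event (event.id below is an assumption about the dataset's mappings):

POST logs-sublime_security.audit-default/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "event.id",
        "min_doc_count": 2,
        "size": 20
      }
    }
  }
}

Any bucket returned here represents an event stored more than once across the data stream's backing indices.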

What did you see?

Data was ingested multiple times after an index rollover.

What did you expect to see?

The agent would keep state for the data already ingested and would not re-ingest it after a rollover.

Anything else?

Policy Configuration:

[Image: integration policy configuration]

elasticmachine commented 6 days ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)