elastic / integrations

[Sublime Security]: Duplicate Data Ingestion when using API #11363

Open christophercutajar opened 6 days ago

christophercutajar commented 6 days ago

Integration Name

Sublime Security [sublime_security]

Dataset Name

logs-sublime_security.audit-default

Integration Version

1.0.0

Agent Version

8.15.2

Agent Output Type

elasticsearch

Elasticsearch Version

8.15.2

OS Version and Architecture

GCP Kubernetes Engine (Standalone agent)

Software/API Version

No response

Error Message

No response

Event Original

No response

What did you do?

While analyzing the agent configuration, we could not determine how the agent stores state to avoid data duplication. Looking at the ingest pipeline, we noticed that the _id of each document is computed at runtime.

[Image: ingest pipeline excerpt showing the document _id being computed at runtime]
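
For reference, ingest pipelines typically derive a deterministic _id with the fingerprint processor; a minimal sketch of such a processor is below (the field names are hypothetical, not taken from the actual sublime_security pipeline):

{
  "fingerprint": {
    "fields": ["sublime_security.audit.id", "sublime_security.audit.created_at"],
    "target_field": "_id",
    "method": "SHA-1",
    "ignore_missing": true
  }
}

With a processor like this, re-ingesting the same event produces the same _id, so a duplicate write to the same backing index is rejected as a version conflict.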

We then triggered a rollover of the data stream using POST logs-sublime_security.audit-default/_rollover

Rollover response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": ".ds-logs-sublime_security.audit-default-2024.10.04-000001",
  "new_index": ".ds-logs-sublime_security.audit-default-2024.10.07-000002",
  "rolled_over": true,
  "dry_run": false,
  "lazy": false,
  "conditions": {}
}
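
Note that _id uniqueness is only enforced per backing index, not across a data stream, which is why a rollover defeats fingerprint-based deduplication. A minimal illustration against a throwaway data stream (hypothetical name and document, for demonstration only, assuming a matching logs-*-* index template exists):

PUT logs-dedup-demo-default/_create/fixed-id
{
  "@timestamp": "2024-10-07T00:00:00Z",
  "message": "duplicate test"
}

# Repeating the identical request while the write index is unchanged fails with
# 409 version_conflict_engine_exception.
# After POST logs-dedup-demo-default/_rollover, the same request succeeds again,
# because the new write index contains no document with that _id, leaving the
# event stored twice in the data stream.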

After the data stream was rolled over, we restarted the agent and re-checked the data.

It can now be seen that the documents already ingested into the previous backing index were re-ingested into the new one: since the computed _id only guarantees uniqueness within a single index, the new write index accepts the duplicates.

[Image: the same documents present in both the old and the new backing index]
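
One way to confirm the duplication is a terms aggregation with min_doc_count: 2 on whichever field uniquely identifies an event (event.id below is an assumption about the dataset's mappings):

POST logs-sublime_security.audit-default/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "event.id",
        "min_doc_count": 2,
        "size": 20
      }
    }
  }
}

Any bucket returned here represents an event stored more than once across the data stream's backing indices.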

What did you see?

Data was ingested multiple times after an index rollover.

What did you expect to see?

The agent would keep state for the data already ingested and would not re-ingest it after a rollover.

Anything else?

Policy Configuration:

[Image: integration policy configuration]

elasticmachine commented 6 days ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)