chef / automate

Chef Automate provides a full suite of enterprise capabilities for maintaining continuous visibility into application, infrastructure, and security automation.
https://automate.chef.io/
Apache License 2.0

Elasticsearch Bulk API with bundled messages for compliance ingestion #74

Open vjeffrey opened 5 years ago

vjeffrey commented 5 years ago

User Story

In https://github.com/chef/a2/pull/5067 the client-run ingestion pipeline was modified to use the Elasticsearch Bulk API with bundled messages. This is a huge improvement for that pipeline, so let's implement it in the compliance ingestion pipeline too. Please see https://github.com/chef/a2/pull/5067/files for more details.

Definition of Done

Compliance ingestion uses the Elasticsearch Bulk API with bundled messages.

lancewf commented 5 years ago

The Elasticsearch Bulk API cannot currently be used for compliance report ingestion. For each report, an Elasticsearch update-by-query is run to unmark the previous report as the latest, and update-by-query cannot go through the Bulk API because it has to search for the document that needs to be changed. config-mgmt-service does not have this problem because it has a separate index that contains only the latest run. Updating one document by its ID in that index can use the Bulk API.

https://github.com/chef/automate/blob/master/components/compliance-service/ingest/ingestic/ingestic.go#L141
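
To make the distinction in the comment above concrete, here is a minimal sketch of the two request shapes. It only builds request bodies and sends nothing. The index name `latest-runs`, the fields `node_uuid` and `latest`, and both helper functions are hypothetical names chosen for illustration; they are not taken from the Automate code.

```python
import json


def bulk_update_body(index, docs_by_id):
    """Build an NDJSON body for POST /_bulk.

    Each document needs a known ID: one action line names the ID,
    the next line carries the partial document to apply.
    """
    lines = []
    for doc_id, partial in docs_by_id.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": partial, "doc_as_upsert": True}))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def unmark_latest_body(node_id):
    """Build a body for POST /<index>/_update_by_query.

    No document ID is known up front; the query has to find the
    previous "latest" report before the script can change it.
    Field names here are assumptions.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"node_uuid": node_id}},
                    {"term": {"latest": True}},
                ]
            }
        },
        "script": {"lang": "painless", "source": "ctx._source.latest = false"},
    }


if __name__ == "__main__":
    print(bulk_update_body("latest-runs", {"node-1": {"status": "passed"}}), end="")
    print(json.dumps(unmark_latest_body("node-1"), indent=2))
```

The Bulk API accepts only `index`, `create`, `update`, and `delete` actions, each addressed to a specific document, so the second request has no bulk equivalent and must be issued separately for every report.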