efd6 closed 12 months ago
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
Example change for the audit datastream:
diff --git a/packages/gcp/data_stream/audit/agent/stream/gcp-pubsub.yml.hbs b/packages/gcp/data_stream/audit/agent/stream/gcp-pubsub.yml.hbs
index 43af08afa..57a8784f9 100644
--- a/packages/gcp/data_stream/audit/agent/stream/gcp-pubsub.yml.hbs
+++ b/packages/gcp/data_stream/audit/agent/stream/gcp-pubsub.yml.hbs
@@ -27,7 +27,11 @@ tags:
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
-{{#if processors}}
processors:
+- add_fields:
+ target: '_conf'
+ fields:
+ keep_json: {{keep_json}}
+{{#if processors}}
{{processors}}
{{/if}}
diff --git a/packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml b/packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
index 5a78745ec..fd7316588 100644
--- a/packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
+++ b/packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
@@ -363,8 +363,13 @@ processors:
##
# clean-up
##
+ - rename:
+ field: json
+ target_field: gcp.audit.flattened
+ if: ctx.json != null && ctx._conf?.keep_json == true
- remove:
field:
+ - _conf
- _temp
- json
ignore_missing: true
diff --git a/packages/gcp/data_stream/audit/fields/fields.yml b/packages/gcp/data_stream/audit/fields/fields.yml
index 027cc591b..d0e78e65d 100644
--- a/packages/gcp/data_stream/audit/fields/fields.yml
+++ b/packages/gcp/data_stream/audit/fields/fields.yml
@@ -113,3 +113,6 @@
- name: message
type: keyword
description: "A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client."
+ - name: flattened
+ type: flattened
+ description: Contains the full audit document as sent by GCP.
\ No newline at end of file
diff --git a/packages/gcp/data_stream/audit/manifest.yml b/packages/gcp/data_stream/audit/manifest.yml
index 130daabdc..7ec236667 100644
--- a/packages/gcp/data_stream/audit/manifest.yml
+++ b/packages/gcp/data_stream/audit/manifest.yml
@@ -65,6 +65,14 @@ streams:
type: bool
multi: false
default: false
+ - name: keep_json
+ required: true
+ show_user: false
+ title: Keep the JSON document as `gcp.audit.flattened`
+ description: Keeps a copy of the original document as a JSON field for processing in `@custom` pipelines.
+ type: bool
+ multi: false
+ default: false
- name: processors
type: yaml
title: Processors
diff --git a/packages/gcp/docs/README.md b/packages/gcp/docs/README.md
index 577421df0..9c1b16884 100644
--- a/packages/gcp/docs/README.md
+++ b/packages/gcp/docs/README.md
@@ -260,6 +260,7 @@ The `audit` dataset collects audit logs of administrative activities and accesse
| gcp.audit.authorization_info.resource_attributes.name | The name of the resource. | keyword |
| gcp.audit.authorization_info.resource_attributes.service | The name of the service. | keyword |
| gcp.audit.authorization_info.resource_attributes.type | The type of the resource. | keyword |
+| gcp.audit.flattened | Contains the full audit document as sent by GCP. | flattened |
| gcp.audit.labels | A map of key, value pairs that provides additional information about the log entry. The labels can be user-defined or system-defined. | flattened |
| gcp.audit.logentry_operation.first | Optional. Set this to True if this is the first log entry in the operation. | boolean |
| gcp.audit.logentry_operation.id | Optional. An arbitrary operation identifier. Log entries with the same identifier are assumed to be part of the same operation. | keyword |
diff --git a/packages/gcp/docs/audit.md b/packages/gcp/docs/audit.md
index 09038d517..d587ad23e 100644
--- a/packages/gcp/docs/audit.md
+++ b/packages/gcp/docs/audit.md
@@ -49,6 +49,7 @@ The `audit` dataset collects audit logs of administrative activities and accesse
| gcp.audit.authorization_info.resource_attributes.name | The name of the resource. | keyword |
| gcp.audit.authorization_info.resource_attributes.service | The name of the service. | keyword |
| gcp.audit.authorization_info.resource_attributes.type | The type of the resource. | keyword |
+| gcp.audit.flattened | Contains the full audit document as sent by GCP. | flattened |
| gcp.audit.labels | A map of key, value pairs that provides additional information about the log entry. The labels can be user-defined or system-defined. | flattened |
| gcp.audit.logentry_operation.first | Optional. Set this to True if this is the first log entry in the operation. | boolean |
| gcp.audit.logentry_operation.id | Optional. An arbitrary operation identifier. Log entries with the same identifier are assumed to be part of the same operation. | keyword |
(note to self — a local branch with this change exists)
We have had customer requests to retain additional fields in the GCP integration so that fields we currently remove from documents are available for use in detection rules. The requests concern the audit datastream in particular, but the same need could apply to others.

I propose that we `rename` the `json` temporary field to `gcp.<datastream>.flattened` (or similar), which would be mapped either as `type: flattened` or with `index: false`. We would also add a configuration option to the datastream UI that sets a flag, `keep_json`, defaulting to `false`. In the ingest pipeline this flag would be used to conditionally `remove` the `gcp.<datastream>.flattened` field if it is not `true`.

For current users this would have no impact, as the field would by default not be in their ingested documents. Users wishing to use fields that we have otherwise dropped can set the option to `true`, and in their `@custom` pipeline (and associated mapping definition) extract and process the fields they are interested in, then optionally `remove` the `gcp.<datastream>.flattened` field if they do not need it further.

Note that this functionality can be achieved currently, with additional work, by adding an `@custom` pipeline that runs `{"json":{"field":"event.original","target_field":"_tmp_json","if":"ctx.event?.original != null"}}`, does the additional processing, and then deletes `_tmp_json` (or gives it a more durable name and adds it appropriately to the mappings).
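
For illustration, the current workaround might look roughly like the pipeline body below. This is a minimal sketch, not a tested configuration: the first processor is the one quoted above, while the `set` step, the target field `gcp.audit.custom_example` and the source path `protoPayload.exampleField` are hypothetical placeholders for whatever field a user wants to keep. It also assumes that `event.original` is still present when the `@custom` pipeline runs.

```json
{
  "description": "Sketch only: the target field and source path below are hypothetical placeholders",
  "processors": [
    {
      "json": {
        "description": "Parse the original event into a temporary object",
        "field": "event.original",
        "target_field": "_tmp_json",
        "if": "ctx.event?.original != null"
      }
    },
    {
      "set": {
        "description": "Hypothetical example: copy one otherwise-dropped value into a custom field",
        "field": "gcp.audit.custom_example",
        "copy_from": "_tmp_json.protoPayload.exampleField",
        "ignore_empty_value": true
      }
    },
    {
      "remove": {
        "description": "Drop the temporary object once the wanted values are extracted",
        "field": "_tmp_json",
        "ignore_missing": true
      }
    }
  ]
}
```

The body would be supplied to `PUT _ingest/pipeline/<name>` using the datastream's `@custom` pipeline name, and any new field would still need a matching mapping. Under the proposal, the `json` step would become unnecessary: with `keep_json` set to `true`, the same extraction could read from `gcp.audit.flattened` and then remove that field instead of `_tmp_json`.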