elastic / beats

Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Filebeat S3 Input] Add support for Apache Parquet files #34662

jamiehynds closed this issue 1 year ago

jamiehynds commented 1 year ago

With the launch of Amazon Security Lake, users can ingest and store large volumes of data from both AWS services and third-party sources (known as providers). SIEM vendors like Elastic can become 'subscribers', which lets their users ingest data from the security lake for analysis, in our case in Elastic Security. Our goal is to become a subscriber via an integration on our side. The issue tracking this integration is here: https://github.com/elastic/integrations/issues/5286

Security Lake stores the data in S3 buckets as Parquet files, to take advantage of the efficient compression that Parquet provides. Filebeat's S3 input does not currently support Parquet files, and we need to add that support before we can start on the Security Lake integration.

elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

andrewkroh commented 1 year ago

It should be possible to implement a generic parquet reader that will work on the AWS Security Lake OCSF data. AWS provides some samples at https://github.com/aws-samples/amazon-security-lake/tree/69ce801a3314c9a321532e2db32e1bbb5b8572f4/AWSLogs_OCSF_1.0.0-rc2_samples/CLOUD_TRAIL/account_change/input.

The github.com/apache/arrow/go library provides everything needed for reading the data. The project has an example app for dumping the data. We would want something similar that takes the parquet rows and converts them into beat.Event values (although probably not flattened like that tool does).

go install github.com/apache/arrow/go/v12/parquet/cmd/parquet_reader@v12.0.0-20230410185055-2fe17338e2d1

% parquet_reader --no-metadata --json ./testdata/CreateUser.test.parquet
[
  {
    "metadata.product.version": "1.08",
    "metadata.product.name": "CloudTrail",
    "metadata.product.vendor_name": "AWS",
    "metadata.product.feature.name": "Management",
    "metadata.uid": "7dd15a89-ae0f-4340-8e6c-d6981246c71b",
    "metadata.profiles.array": "cloud",
    "metadata.version": "1.0.0-rc.2",
    "time": 1679072879000,
    "cloud.region": "us-east-1",
    "cloud.provider": "AWS",
    "api.operation": "CreateUser",
    "api.request.uid": "c99bf9da-e0bd-4bf7-bb32-c8be25618afc",
    "api.service.name": "iam.amazonaws.com",
    "actor.user.type": "AssumedRole",
    "actor.user.uid": "AROA2W7SOKHDLNCOKZNPS:Admin-user",
    "actor.user.uuid": "arn:aws:sts::112233445566:assumed-role/Admin/Admin-user",
    "actor.user.account_uid": "112233445566",
    "actor.user.credential_uid": "ASIA2W7SOKHDHBO4U2HE",
    "actor.session.created_time": 1679071437000,
    "actor.session.mfa": false,
    "actor.session.issuer": "arn:aws:iam::112233445566:role/Admin",
    "http_request.user_agent": "AWS Internal",
    "src_endpoint.ip": "52.95.4.21",
    "class_name": "Account Change",
    "class_uid": 3001,
    "category_name": "Audit Activity",
    "category_uid": 3,
    "severity_id": 1,
    "severity": "Informational",
    "status_id": 1,
    "status": "Success",
    "user.name": "test_user2",
    "user.uid": "AIDA2W7SOKHDM47UMJRTX",
    "activity_name": "Create",
    "activity_id": 1,
    "type_uid": 300101,
    "type_name": "Account Change: Create",
    "unmapped.map.key": "sessionCredentialFromConsole",
    "unmapped.map.value": "true"
  },
  {
    "unmapped.map.key": "responseElements",
    "unmapped.map.value": "{\"user\":{\"path\":\"/\",\"userName\":\"test_user2\",\"userId\":\"AIDA2W7SOKHDM47UMJRTX\",\"arn\":\"arn:aws:iam::112233445566:user/test_user2\",\"createDate\":\"Mar 17, 2023 5:07:59 PM\"}}"
  },
  {
    "unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_type",
    "unmapped.map.value": "Role"
  },
  {
    "unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_accountId",
    "unmapped.map.value": "112233445566"
  },
  {
    "unmapped.map.key": "requestParameters",
    "unmapped.map.value": "{\"userName\":\"test_user2\"}"
  },
  {
    "unmapped.map.key": "recipientAccountId",
    "unmapped.map.value": "112233445566"
  },
  {
    "unmapped.map.key": "readOnly",
    "unmapped.map.value": "false"
  },
  {
    "unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_principalId",
    "unmapped.map.value": "AROA2W7SOKHDLNCOKZNPS"
  },
  {
    "unmapped.map.key": "eventType",
    "unmapped.map.value": "AwsApiCall"
  },
  {
    "unmapped.map.key": "managementEvent",
    "unmapped.map.value": "true"
  },
  {
    "unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_arn",
    "unmapped.map.value": "arn:aws:iam::112233445566:role/Admin"
  },
  {
    "unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_userName",
    "unmapped.map.value": "Admin"
  }
]
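
To illustrate the "not flattened" point: the dump above uses dotted keys such as "cloud.region", whereas an event would more naturally carry nested objects. The following is a minimal sketch of that difference using only the Go standard library. It is not the Beats implementation and does not read Parquet; the function name nest is hypothetical, and a plain map[string]interface{} stands in for the event fields. A real reader would presumably build nested values straight from the Parquet schema rather than re-nesting flattened output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nest is a hypothetical helper (not part of Beats or Arrow). It expands
// dotted keys such as "cloud.region" into nested maps. It returns an error
// when two keys conflict, for example "a" and "a.b" both being present.
func nest(flat map[string]interface{}) (map[string]interface{}, error) {
	out := make(map[string]interface{})
	for key, val := range flat {
		parts := strings.Split(key, ".")
		cur := out
		// Walk or create the intermediate maps for every path element
		// except the last one.
		for _, p := range parts[:len(parts)-1] {
			next, exists := cur[p]
			if !exists {
				m := make(map[string]interface{})
				cur[p] = m
				cur = m
				continue
			}
			m, isMap := next.(map[string]interface{})
			if !isMap {
				return nil, fmt.Errorf("key %q conflicts with another key", key)
			}
			cur = m
		}
		leaf := parts[len(parts)-1]
		if _, exists := cur[leaf]; exists {
			return nil, fmt.Errorf("key %q conflicts with another key", key)
		}
		cur[leaf] = val
	}
	return out, nil
}

func main() {
	// A few fields copied from the sample output above.
	flat := map[string]interface{}{
		"metadata.product.name":        "CloudTrail",
		"metadata.product.vendor_name": "AWS",
		"cloud.region":                 "us-east-1",
		"api.operation":                "CreateUser",
		"class_uid":                    3001,
	}
	nested, err := nest(flat)
	if err != nil {
		panic(err)
	}
	b, err := json.MarshalIndent(nested, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Returning an error on conflicting keys, rather than silently overwriting, keeps the result independent of Go's randomized map iteration order.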