Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
It should be possible to implement a generic parquet reader that will work on the AWS Security Lake OCSF data. AWS provides some samples at https://github.com/aws-samples/amazon-security-lake/tree/69ce801a3314c9a321532e2db32e1bbb5b8572f4/AWSLogs_OCSF_1.0.0-rc2_samples/CLOUD_TRAIL/account_change/input.
The `github.com/apache/arrow/go` library provides everything needed for reading the data. The project has an example app for dumping the data. We would want something similar that takes the parquet rows and converts them into `beat.Event`s (although probably not flattened like that tool does); a rough sketch follows the example output below.
go install github.com/apache/arrow/go/v12/parquet/cmd/parquet_reader@v12.0.0-20230410185055-2fe17338e2d1
% parquet_reader --no-metadata --json ./testdata/CreateUser.test.parquet
[
{
"metadata.product.version": "1.08",
"metadata.product.name": "CloudTrail",
"metadata.product.vendor_name": "AWS",
"metadata.product.feature.name": "Management",
"metadata.uid": "7dd15a89-ae0f-4340-8e6c-d6981246c71b",
"metadata.profiles.array": "cloud",
"metadata.version": "1.0.0-rc.2",
"time": 1679072879000,
"cloud.region": "us-east-1",
"cloud.provider": "AWS",
"api.operation": "CreateUser",
"api.request.uid": "c99bf9da-e0bd-4bf7-bb32-c8be25618afc",
"api.service.name": "iam.amazonaws.com",
"actor.user.type": "AssumedRole",
"actor.user.uid": "AROA2W7SOKHDLNCOKZNPS:Admin-user",
"actor.user.uuid": "arn:aws:sts::112233445566:assumed-role/Admin/Admin-user",
"actor.user.account_uid": "112233445566",
"actor.user.credential_uid": "ASIA2W7SOKHDHBO4U2HE",
"actor.session.created_time": 1679071437000,
"actor.session.mfa": false,
"actor.session.issuer": "arn:aws:iam::112233445566:role/Admin",
"http_request.user_agent": "AWS Internal",
"src_endpoint.ip": "52.95.4.21",
"class_name": "Account Change",
"class_uid": 3001,
"category_name": "Audit Activity",
"category_uid": 3,
"severity_id": 1,
"severity": "Informational",
"status_id": 1,
"status": "Success",
"user.name": "test_user2",
"user.uid": "AIDA2W7SOKHDM47UMJRTX",
"activity_name": "Create",
"activity_id": 1,
"type_uid": 300101,
"type_name": "Account Change: Create",
"unmapped.map.key": "sessionCredentialFromConsole",
"unmapped.map.value": "true"
},
{
"unmapped.map.key": "responseElements",
"unmapped.map.value": "{\"user\":{\"path\":\"/\",\"userName\":\"test_user2\",\"userId\":\"AIDA2W7SOKHDM47UMJRTX\",\"arn\":\"arn:aws:iam::112233445566:user/test_user2\",\"createDate\":\"Mar 17, 2023 5:07:59 PM\"}}"
},
{
"unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_type",
"unmapped.map.value": "Role"
},
{
"unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_accountId",
"unmapped.map.value": "112233445566"
},
{
"unmapped.map.key": "requestParameters",
"unmapped.map.value": "{\"userName\":\"test_user2\"}"
},
{
"unmapped.map.key": "recipientAccountId",
"unmapped.map.value": "112233445566"
},
{
"unmapped.map.key": "readOnly",
"unmapped.map.value": "false"
},
{
"unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_principalId",
"unmapped.map.value": "AROA2W7SOKHDLNCOKZNPS"
},
{
"unmapped.map.key": "eventType",
"unmapped.map.value": "AwsApiCall"
},
{
"unmapped.map.key": "managementEvent",
"unmapped.map.value": "true"
},
{
"unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_arn",
"unmapped.map.value": "arn:aws:iam::112233445566:role/Admin"
},
{
"unmapped.map.key": "userIdentity_sessionContext_sessionIssuer_userName",
"unmapped.map.value": "Admin"
}
]
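As a starting point, here is a minimal sketch (not a final implementation) of such a reader built on the arrow Go parquet APIs. It only demonstrates iterating the parquet rows as arrow record batches; the batch size is an arbitrary placeholder and the actual column-to-`beat.Event` mapping is elided.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/apache/arrow/go/v12/arrow/memory"
	"github.com/apache/arrow/go/v12/parquet/file"
	"github.com/apache/arrow/go/v12/parquet/pqarrow"
)

func main() {
	// Open the parquet file (same sample file as above) without memory-mapping.
	rdr, err := file.OpenParquetFile("./testdata/CreateUser.test.parquet", false)
	if err != nil {
		log.Fatal(err)
	}
	defer rdr.Close()

	// Wrap the low-level reader so rows come back as arrow record batches.
	// BatchSize of 1024 is an arbitrary choice for the sketch.
	fr, err := pqarrow.NewFileReader(rdr, pqarrow.ArrowReadProperties{BatchSize: 1024}, memory.DefaultAllocator)
	if err != nil {
		log.Fatal(err)
	}

	// nil, nil selects all columns and all row groups.
	rr, err := fr.GetRecordReader(context.Background(), nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer rr.Release()

	for rr.Next() {
		rec := rr.Record()
		// A real implementation would walk each row's columns and build a
		// (non-flattened) beat.Event; here we just print the columns.
		for i, col := range rec.Columns() {
			fmt.Printf("%s: %v\n", rec.ColumnName(i), col)
		}
	}
}
```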
With the launch of Amazon Security Lake, users can ingest and store large volumes of data from both AWS services and third-party sources (known as providers). SIEM vendors like Elastic can become 'subscribers', allowing our users to ingest data from the Security Lake for analysis in Elastic Security. Our goal is to become a subscriber via an integration on our side. The issue to track this integration is here: https://github.com/elastic/integrations/issues/5286
Security Lake stores the data in S3 buckets as Parquet-formatted files, to take advantage of the efficient compression Parquet provides. Filebeat's S3 input does not currently support Parquet files, so we need to add support before we can start on the Security Lake integration.
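One wrinkle worth noting for the S3 input work: parquet is not a streamable format (the metadata footer lives at the end of the file), so the sequential object body the input receives has to be buffered (or fetched with ranged reads) before a reader can be attached. A hedged sketch, assuming the object fits in memory; `readParquetObject` is a hypothetical helper, not an existing Filebeat function:

```go
package s3parquet

import (
	"bytes"
	"context"
	"io"

	"github.com/apache/arrow/go/v12/arrow"
	"github.com/apache/arrow/go/v12/arrow/memory"
	"github.com/apache/arrow/go/v12/parquet"
	"github.com/apache/arrow/go/v12/parquet/pqarrow"
)

// readParquetObject is a hypothetical helper: body stands in for the S3
// object's Body stream from the AWS SDK. Buffering the whole object gives
// us the io.ReaderAt/io.Seeker that the parquet reader requires.
func readParquetObject(ctx context.Context, body io.Reader) (arrow.Table, error) {
	buf, err := io.ReadAll(body)
	if err != nil {
		return nil, err
	}
	// bytes.Reader satisfies parquet.ReaderAtSeeker; the caller must
	// Release the returned table when done with it.
	return pqarrow.ReadTable(ctx, bytes.NewReader(buf),
		parquet.NewReaderProperties(memory.DefaultAllocator),
		pqarrow.ArrowReadProperties{}, memory.DefaultAllocator)
}
```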