elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

[Netskope] Test ingestion of compressed Netskope cloud storage logs #10744

Open cpascale43 opened 3 months ago

cpascale43 commented 3 months ago

Netskope is changing how they deliver logs/events and wants us to support ingesting these through cloud storage (S3, Azure Blob, GCP). They have indicated the logs will be in a compressed format, likely gzip. We need to test our existing S3, Azure Blob, and GCP inputs against the new format.

The key tasks are:

  1. Evaluate if our existing cloud storage input integrations can handle the new compressed log format from Netskope.
  2. If the existing inputs cannot handle the compressed format, investigate whether we need new ingest pipelines specifically for these Netskope cloud storage logs.

elasticmachine commented 3 months ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

cpascale43 commented 3 months ago

Hey @narph - we got some log samples from them - they also offered to give us access to an S3 bucket in case we want more. Let me know if that would be helpful.

efd6 commented 2 months ago

All three inputs handle gzip-compressed data, but the data being sent in the compressed container differs significantly from the data that we currently handle. The current data is in a line-based JSON stream format, while the examples provided are in a headered CSV (using space (U+0020) as the delimiter). None of the inputs currently support CSV of any flavour; gcs assumes JSON, as does azure blob storage, while aws has a configurable parquet decoder.
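
To make the format difference concrete, here is a minimal Python sketch of decoding a gzip-compressed, headered, space-delimited file into one JSON object per row. It is only an illustration of the shape of the problem, not how the inputs are implemented. The helper name `decode_gzip_csv`, the sample field names, and the assumption that values containing spaces are double-quoted are all hypothetical; the real Netskope samples may differ.

```python
import csv
import gzip
import io
import json


def decode_gzip_csv(raw: bytes, delimiter: str = " ") -> list[dict]:
    """Decompress gzip bytes and parse a headered, delimiter-separated file.

    The first row is treated as the header; each later row becomes a dict.
    """
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8", newline="") as fh:
        reader = csv.DictReader(fh, delimiter=delimiter)
        return list(reader)


# Hypothetical sample: field names and values are invented for illustration.
# Assumes values containing spaces are wrapped in double quotes.
sample = 'timestamp user action\n1700000000 alice "file upload"\n'
blob = gzip.compress(sample.encode("utf-8"))

events = decode_gzip_csv(blob)

# Re-emit as line-based JSON, the shape the inputs currently expect.
ndjson = "\n".join(json.dumps(e) for e in events)
print(ndjson)
```

Every value comes out as a string, so any type conversion (timestamps, numbers) would still have to happen downstream, for example in an ingest pipeline.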

cpascale43 commented 2 months ago

Hi @efd6 thanks for taking a look. It sounds like we will need to add CSV support to our cloud storage inputs, and then add new datastreams/pipelines to the Netskope integration to support those events?

First, I'll confirm with Netskope that CSV is definitely the format they'll be using long-term, and that they aren't planning on supporting JSON from cloud storage.

efd6 commented 2 months ago

@cpascale43 Thanks

cpascale43 commented 2 months ago

Sharing the context received from Netskope: