Graylog2 / graylog-plugin-aws

Several bundled Graylog plugins to integrate with different AWS services like CloudTrail and FlowLogs.

Processing logs direct from Kinesis (NOT via cloudwatch logs)? #120

Open max-rocket-internet opened 5 years ago

max-rocket-internet commented 5 years ago

We are using Graylog 2.5.1+34194da and want to skip CloudWatch Logs and send our logs directly to Kinesis using aws-fluent-plugin-kinesis.

We did a quick test but saw some errors related to GZIP. This same issue was mentioned at the end of https://github.com/Graylog2/graylog-plugin-aws/issues/86

Should this work?

danotorrey commented 5 years ago

@max-rocket-internet Thanks for the details. The AWS Logs and AWS Flow Logs inputs in Graylog were designed to work directly with CloudWatch, so there is some hard-coded processing (GZIP and CloudWatch JSON object decoding). These are most likely causing the errors you are seeing.
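That mismatch can be illustrated with a short sketch (Python here for brevity; the plugin itself is Java, where the analogous failure surfaces as "Not in GZIP format"). The envelope fields below follow the documented CloudWatch Logs subscription format, simplified:

```python
import gzip
import json

# CloudWatch Logs subscriptions deliver each Kinesis record as a
# gzip-compressed JSON envelope, roughly like this:
cloudwatch_record = gzip.compress(json.dumps({
    "logGroup": "example-group",
    "logEvents": [{"message": "hello"}],
}).encode("utf-8"))

# A raw JSON record written straight to Kinesis (e.g. by a fluentd output)
# has no gzip header, so a CloudWatch-specific decoding path fails on it:
raw_record = b'{"message": "hello"}'

json.loads(gzip.decompress(cloudwatch_record))  # decodes fine
try:
    gzip.decompress(raw_record)
except gzip.BadGzipFile as exc:
    print("decode failed:", exc)
```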

Can you help me understand what the payload (log messages) look like that are being written to the Kinesis stream? We are planning new AWS development, and I would like to make sure we consider if we can support this use case. I can see how it would be useful for Graylog to support the ability to subscribe to and read a user-defined payload from a Kinesis stream.

Can you please also help us understand the reason for skipping CloudWatch altogether? Any info we can gather about how users use Graylog and AWS will definitely help us plan our development efforts.

Looping in @kroepke for reference.

max-rocket-internet commented 5 years ago

Hey @danotorrey Thanks for the reply.

> The AWS Logs and AWS Flow Logs inputs in Graylog were designed to work directly with CloudWatch, so there is some hard-coded processing (GZIP and CloudWatch JSON object decoding). These are most likely causing the errors you are seeing.

Ah, makes sense.

> Can you help me understand what the payload (log messages) look like that are being written to the Kinesis stream?

Sure. It's quite simple: it's just aws-fluent-plugin-kinesis running, via the fluent/fluentd-kubernetes-daemonset:v1.3.3-debian-kinesis-1.3 Docker image published by fluentd. It runs as a DaemonSet on our k8s nodes and collects logs. Nothing fancy; it's a pretty standard logging DaemonSet, the same as any other fluentd or fluent-bit setup but with a different output.
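For reference, the relevant fluentd output looks something like this (a minimal sketch; the stream name and region are placeholders):

```
<match **>
  @type kinesis_streams
  stream_name my-log-stream   # placeholder
  region eu-west-1            # placeholder
</match>
```

Each fluentd event becomes one Kinesis record, with no gzip compression and no CloudWatch envelope around it.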

> Can you please also help us understand the reason for skipping CloudWatch altogether?

Also quite simple: why would we want our logs in CloudWatch Logs at all? It's just a stepping stone before Kinesis and then Graylog. Our log volume is about 1TB per day, and CloudWatch Logs currently costs us about $18k/month, so that's also a pretty big motivator 😅

srlucken commented 5 years ago

Hello @max-rocket-internet - Have you found a workaround for this? I have many Kinesis streams I want to integrate with Graylog; however, I'm getting the same "Not in GZIP format" error message you mentioned.

max-rocket-internet commented 5 years ago

@srlucken

Nope. I don't think a workaround is possible; it's simply not supported. We've moved to Datadog now.

kroepke commented 5 years ago

> Have you found a workaround for this? I have many Kinesis streams I want to integrate with Graylog; however, I'm getting the same "Not in GZIP format" error message you mentioned.

@srlucken Which formats would you expect to be sending to Graylog this way? The OP was using the fluent plugin, which apparently has a fixed proto2 transport encoding, so that would need to be implemented directly anyway; other formats might be easier to support as we improve our AWS Kinesis support.

Thanks!

srlucken commented 5 years ago

Hello @kroepke - Currently the main format we're sending to Graylog is JSON. Regarding improved AWS Kinesis support, does Graylog have anything currently in development or on the roadmap that might meet this need in the near future?

danotorrey commented 5 years ago

@srlucken Direct Kinesis support for arbitrary/custom log formats is definitely on our radar and will likely be supported in a future release. We are still working out the details for how to handle the various log formats that might be supplied.

Are you writing a distinct JSON document in the data payload of each Kinesis record? The current thinking is that we could extract the payload, convert it to a string, and then either parse the JSON directly and extract distinct fields, or provide some other parsing means. Can you provide a sample of what your JSON payload looks like? This will help us as we continue to investigate.
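That idea can be sketched in a few lines of Python (the helper name is hypothetical; a Kinesis consumer receives each record's data payload as raw bytes):

```python
import json

def record_to_fields(data: bytes) -> dict:
    """Parse one Kinesis record's data payload as a UTF-8 JSON document
    and return its top-level fields. (Hypothetical helper, not Graylog code.)"""
    message = json.loads(data.decode("utf-8"))
    if isinstance(message, dict):
        return message
    # Non-object payloads could fall back to a single catch-all field.
    return {"message": data.decode("utf-8")}

fields = record_to_fields(b'{"source": "app-1", "level": "info"}')
# fields -> {"source": "app-1", "level": "info"}
```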

srlucken commented 5 years ago

@danotorrey Thank you for the response. Here is a sample JSON payload.

```json
{
  "type": "type",
  "auth": {
    "client_token": "clientToken",
    "accessor": "accessor",
    "display_name": "displayName",
    "policies": ["policy", "policy"],
    "token_policies": ["token", "policy"],
    "metadata": { "role_name": "roleName" },
    "entity_id": "entityId",
    "token_type": "tokenType"
  },
  "request": {
    "id": "id",
    "operation": "operation",
    "client_token": "token:token",
    "client_token_accessor": "token:token",
    "namespace": { "id": "id", "path": "path/" },
    "path": "path/path",
    "data": null,
    "policy_override": false,
    "remote_address": "1.1.1.1",
    "wrap_ttl": 0,
    "headers": {}
  },
  "error": ""
}
```
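One common way to turn a nested document like this into the distinct, flat fields mentioned above is to join nested keys with a separator (a sketch only, not Graylog's actual behavior):

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dot-separated field names, e.g.
    {"auth": {"accessor": "a"}} -> {"auth.accessor": "a"}."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:  # recurse into non-empty dicts
            fields.update(flatten(value, name))
        else:
            fields[name] = value  # leaves, lists, and empty dicts kept as-is
    return fields

payload = {"type": "type", "auth": {"metadata": {"role_name": "roleName"}}}
flat = flatten(payload)
# flat -> {"type": "type", "auth.metadata.role_name": "roleName"}
```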