Graylog2 / graylog-plugin-aws

Several bundled Graylog plugins to integrate with different AWS services like CloudTrail and FlowLogs.

KinesisConsumer application name not unique across multiple instances of graylog #97

Open jplewes opened 5 years ago

jplewes commented 5 years ago

There is a problem when attempting to use multiple separate Graylog instances configured to read from the same Kinesis stream. The generated application name for the Kinesis client ends up being identical across instances ("graylog-plugin-aws-#stream-name#").

This may be fine for multiple nodes/workers in the same Graylog cluster, since there you do not want to receive duplicate messages. In my case, however, I have two separate Graylog instances (customer infra and managed support team infra), and both need to receive a copy of every Kinesis message.

A solution is to build a unique application name based on the Graylog clusterId (or the nodeId if clusterId is null).

As a test, I modified KinesisConsumer.java so that the run() method accepts the clusterId as a String, and built the application name from it, falling back to the nodeId if clusterId is null.

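        // Use the first 8 characters of the cluster ID to keep the name short,
        // falling back to the node ID when no cluster ID is available.
        final String instanceId = (clusterId == null)
                ? this.nodeId.toString()
                : clusterId.toLowerCase(Locale.ENGLISH).substring(0, Math.min(8, clusterId.length()));
        // The application name also names the KCL's DynamoDB lease table,
        // so a distinct name per Graylog instance yields a separate table.
        final String applicationName = String.format(Locale.ENGLISH, "gap-%s-%s", instanceId, kinesisStreamName);
        final KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                applicationName, kinesisStreamName, authProvider, workerId);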
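The naming logic in the snippet above can be pulled out into a small standalone helper to show why the current name collides across clusters and the per-instance name does not. This is a minimal sketch, not plugin code: the class and method names are hypothetical and the cluster IDs are made up. It assumes KCL 1.x behaviour, where the DynamoDB lease table name defaults to the application name, so instances sharing a name share lease state and divide the shards between them instead of each receiving every record.

```java
import java.util.Locale;

// Hypothetical helper illustrating the two naming schemes discussed in this issue.
public class KinesisAppNames {

    // Current scheme as reported in the issue: identical on every Graylog instance.
    static String legacyName(String streamName) {
        return "graylog-plugin-aws-" + streamName;
    }

    // Proposed scheme: first 8 chars of the cluster ID, else the node ID.
    static String perInstanceName(String clusterId, String nodeId, String streamName) {
        final String instanceId = (clusterId == null)
                ? nodeId
                : clusterId.toLowerCase(Locale.ENGLISH)
                           .substring(0, Math.min(8, clusterId.length()));
        return String.format(Locale.ENGLISH, "gap-%s-%s", instanceId, streamName);
    }

    public static void main(String[] args) {
        String stream = "flowlogs";
        // Two independent Graylog clusters with made-up cluster IDs.
        String a = perInstanceName("3F2504E0-4F89-11D3-9A0C-0305E82C3301", "node-a", stream);
        String b = perInstanceName("9B1DEB4D-3B7D-4BAD-9BDD-2B0D7B3DCB6D", "node-b", stream);
        System.out.println(legacyName(stream)); // graylog-plugin-aws-flowlogs (same on both clusters)
        System.out.println(a);                  // gap-3f2504e0-flowlogs
        System.out.println(b);                  // gap-9b1deb4d-flowlogs
    }
}
```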

KinesisTransport.java also required a change to pass in the clusterId value that run() now expects.

Previously, DynamoDB in AWS did not show a second table at all; now there is a new table for the second Graylog instance.

Perhaps there is a more elegant way to fix this?

danotorrey commented 5 years ago

@jplewes Thanks for pointing this out. This makes sense and seems totally reasonable to do.

The only challenge I can see is backwards compatibility (since the DynamoDB table is keyed off this name, we will want to make sure that existing setups keep using the old name so they can continue processing data after a Graylog upgrade). I'll investigate the options for this and check back in.

danotorrey commented 5 years ago

@jplewes To address this, we would like to add a new AWS Logs/AWS Flow Logs input configuration field named Kinesis Application Name. The field will be blank by default and also optional. If a value is filled in, then the applicationName supplied to the AWS Kinesis library will be set to the specified value.

This solution allows for backwards compatibility for existing installs (it prevents KinesisConsumer in existing installs from losing its progress during an upgrade because of an applicationName change), and also gives you the flexibility to specify a custom Kinesis application name for each of your clusters. This would be a completely arbitrary name (you could specify anything), but it would avoid the name collision in DynamoDB.
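The optional field could be resolved roughly as follows. This is a hypothetical sketch (the class and method names are illustrative, not the plugin's actual API), assuming that a blank or missing value means "keep the name existing installs already use".

```java
// Hypothetical resolver for the optional "Kinesis Application Name" input field.
public class ApplicationNameResolver {

    // Name existing installs already use; keeping it preserves their DynamoDB
    // lease table (and therefore their checkpoints) across an upgrade.
    static String legacyName(String streamName) {
        return "graylog-plugin-aws-" + streamName;
    }

    // Use the configured value when present; treat null or whitespace as "not set".
    static String resolve(String configuredName, String streamName) {
        if (configuredName == null || configuredName.trim().isEmpty()) {
            return legacyName(streamName);
        }
        return configuredName.trim();
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "flowlogs"));               // graylog-plugin-aws-flowlogs
        System.out.println(resolve("   ", "flowlogs"));              // graylog-plugin-aws-flowlogs
        System.out.println(resolve("customer-graylog", "flowlogs")); // customer-graylog
    }
}
```

Existing inputs leave the field blank and keep their lease table, while each additional cluster sets its own distinct value (for example, one name for the customer cluster and another for the support team cluster).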

Do you think this would work for you?

jplewes commented 5 years ago

@danotorrey Yes this would work, absolutely.

danotorrey commented 5 years ago

Hi @jplewes, another option we are considering for future AWS development is to remove the need for a Kinesis stream when reading CloudWatch and Flow Log messages from AWS. The goal would be to read the messages directly from CloudWatch. The main idea is to reduce complexity and cost. What do you think about this? Would this present any issues for your use case?

jplewes commented 5 years ago

Hi @danotorrey

I think this could work fine as well, as long as multiple Graylog instances can read from the same CloudWatch source and multiple CloudWatch sources can be configured in Graylog.

sean-abbott commented 5 years ago

Also, this doesn't clean up the DynamoDB table when you delete the input, which is confusing if you're using Terraform to test that inputs can be deleted and recreated. :-)

pmvilaca commented 5 years ago

+1

I have exactly the same use-case that was described by @jplewes

Are you planning to implement a fix for this in the near future?

@jplewes - How did you fix this? Forked the repo, made the change, built a custom version of the plugin and installed it on both Graylog clusters?

jplewes commented 5 years ago

@pmvilaca Yes, I simply modified the plugin code based on my initial post, recompiled the plugin and added the jar file to both of my Graylog installations.