Open mikalai-t opened 4 years ago
@waab76 Can you take a look at this? Thank you!
As discussed on the AWS forums here, for every Kinesis consumer application, AWS maintains a DynamoDB table with the application configuration. The DynamoDB table name is the same as the Kinesis consumer application name. When you create an AWS Kinesis/CloudWatch input in Graylog, it uses `graylog-aws-plugin-<input name>` as the Kinesis consumer application name, which therefore also becomes the name of the DynamoDB table.
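For illustration only, here is a minimal sketch (not the plugin's actual code, and assuming the KCL 1.x API) of how the consumer application name determines the DynamoDB lease table; the stream name and worker id are made up:

```java
// Sketch only: in the Kinesis Client Library (KCL 1.x), the consumer application
// name is also the name of the DynamoDB table KCL creates/re-uses to store shard
// leases and checkpoints for that application.
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;

public class KinesisAppNameSketch {
    public static void main(String[] args) {
        String inputName = "my_input";                     // hypothetical Graylog input name
        String applicationName = "graylog-aws-plugin-" + inputName;

        KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                applicationName,
                "my-kinesis-stream",                       // hypothetical stream name
                new DefaultAWSCredentialsProviderChain(),
                "worker-1");                               // hypothetical worker id

        // The DynamoDB lease/checkpoint table KCL manages shares this name.
        System.out.println("Consumer app / DDB table name: " + config.getApplicationName());
    }
}
```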
@mikalai-t Could you provide screenshots showing your input configuration (from the Graylog > System/Inputs > Inputs page)? Also, could you check whether the appropriate DynamoDB tables were created in us-east-1 and eu-west-2?
Hello. Sadly, I can't provide a screenshot because one of the projects was suspended and I removed the Graylog installation to reduce the AWS monthly cost. I checked the DynamoDB tables in both regions and they looked correct (hidden parts contain different project names).
In eu-west-2:
(screenshot of the DynamoDB table)
In us-east-1:
(screenshot of the DynamoDB table)
If this information is not enough, feel free to close the issue. I will re-open it the next time I face this.
Thank you anyway!
The error initially reported typically happens when someone creates a new Kinesis consumer application that has the same name as an existing (or old) Kinesis consumer app and thus the new Kinesis consumer ends up trying to re-use the DynamoDB table that was tracking status for the old Kinesis consumer. Currently, we name the Kinesis consumer app based on what the user named their Input. I don't think we can stop users from re-using an old input name (and thus running into this issue), so I think we have two possible fixes to ensure users don't run into this issue:
1) When a user removes a Kinesis input, we attempt to clean up (remove) the corresponding DynamoDB table. This may not work because the user may not have given us AWS credentials that allow us to delete Dynamo tables. Also, we probably should not be doing things in the user's AWS account without getting explicit permission first.
2) We move the consumer app name generation up from the KinesisConsumer class to the Input class (so the name can be stored with the input configuration) and introduce some randomness to prevent DDB table name collisions if the user creates a new Kinesis input whose name collides with an old input (a rough sketch follows below).
I'm going to dive deeper on the second option to see if it will work as expected.
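For illustration, a minimal sketch of what option 2 might look like; the helper name and suffix scheme here are hypothetical, not the actual plugin change:

```java
// Hypothetical sketch of option 2: generate the consumer application name once,
// with a random suffix, and persist it with the input configuration so that
// re-using an input name no longer collides with an old DynamoDB lease table.
import java.util.UUID;

public class ConsumerAppNameSketch {

    // Would be called once when the Input is created; the result would be stored
    // alongside the rest of the input configuration rather than regenerated.
    static String generateConsumerAppName(String inputName) {
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return "graylog-aws-plugin-" + inputName + "-" + suffix;
    }

    public static void main(String[] args) {
        // Two inputs that re-use the same user-facing name get distinct
        // consumer app names, and therefore distinct DDB tables.
        System.out.println(generateConsumerAppName("my_input"));
        System.out.println(generateConsumerAppName("my_input"));
    }
}
```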
Well, I created most of the resources with Terraform, except the DynamoDB table (I didn't know exactly which stack was required). I think it's acceptable to remove a table used only by this plugin, even from within Graylog. If the required permission isn't granted, let the plugin catch the "Access Denied" exception, show a notification and write a record to the log, for example.
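To illustrate that suggestion, a hedged sketch using the AWS SDK for Java v1 (not the plugin's implementation; the class and method names are made up): delete the table when the input is removed, and degrade gracefully when dynamodb:DeleteTable is not allowed:

```java
// Rough illustration only: attempt to delete the plugin's DynamoDB lease table
// on input removal; if the supplied credentials lack permission, log/notify
// instead of failing the input deletion.
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException;
import com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException;

public class LeaseTableCleanupSketch {

    public static void deleteLeaseTable(String region, String tableName) {
        AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.standard()
                .withRegion(region)
                .build();
        try {
            dynamo.deleteTable(tableName);
            System.out.println("Deleted lease table " + tableName);
        } catch (ResourceNotFoundException e) {
            // Table is already gone - nothing to clean up.
            System.out.println("Lease table " + tableName + " does not exist");
        } catch (AmazonDynamoDBException e) {
            if ("AccessDeniedException".equals(e.getErrorCode())) {
                // Credentials lack dynamodb:DeleteTable - surface a notification
                // and record it in the log rather than aborting.
                System.err.println("No permission to delete " + tableName + ": " + e.getMessage());
            } else {
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        deleteLeaseTable("eu-west-2", "graylog-aws-plugin-my_input");
    }
}
```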
It's still not clear to me which "table name collision" you are talking about; I showed two different tables in different regions...
I think I'll have to restore the Graylog ECS instance to shed light on the Kinesis inputs config :)
The table name collision comes from a situation like this:
1) A user creates a Kinesis input named `my_input`
2) The plugin creates a Kinesis consumer application named `graylog-aws-plugin-my_input` to help manage the Kinesis consumer (this lives inside of Graylog)
3) AWS creates a `graylog-aws-plugin-my_input` DDB table to keep track of checkpointing and such on the Kinesis stream
4) The user deletes the `my_input` input; the plugin removes the `graylog-aws-plugin-my_input` consumer, but AWS leaves the `graylog-aws-plugin-my_input` DDB table intact, just in case the consumer wants to resume processing later
5) The user creates a new input, again named `my_input`, but pointing to a different Kinesis stream
6) The new `graylog-aws-plugin-my_input` consumer re-uses the existing `graylog-aws-plugin-my_input` DDB table, which has checkpointing info from the old stream, resulting in the error reported above

You're right. Indeed, I re-created the second (non-working) input several times because no new messages were arriving in Graylog and I thought I had done something wrong creating the input. So, today I removed both DynamoDB tables, then restored the Graylog ECS application, configured 2 new Kinesis inputs and allowed Graylog to create new DDB tables. Everything is working now!
Thank you very much for your time and clarification!
Reopening so the team can implement better handling of this error case.
Is option 2 still in the works?
We have a different use case for this change. We would like to have two different Graylog instances reading from the same Kinesis data stream in AWS. We tried setting this up but found that the 2nd Graylog instance was not pulling data from Kinesis. I think it is because there is a single DDB table associated with the Kinesis stream: the first instance pulls the data from Kinesis and then updates the DDB table, so the 2nd Graylog instance never thinks there is any new data to pull. Is that the case?
Having a unique DDB table name per Graylog instance would solve this. Option 2 would seem to be the solution.
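A small illustrative sketch of that idea (the helper and the node id are hypothetical, not an existing plugin option): giving each Graylog instance its own consumer application name would give each instance its own DDB lease table, so they checkpoint independently instead of sharing one set of shard leases:

```java
// Illustrative only: when two KCL workers share one application name, they share
// one lease table and split the stream's shard leases between them, so a
// single-shard stream is consumed by only one of them. Distinct application
// names make each instance an independent consumer of the full stream.
public class PerInstanceAppNameSketch {

    // "nodeId" is a hypothetical per-instance identifier (e.g. a Graylog node id).
    static String consumerAppName(String inputName, String nodeId) {
        return "graylog-aws-plugin-" + inputName + "-" + nodeId;
    }

    public static void main(String[] args) {
        // Same input name, different instances -> different lease tables,
        // so neither consumer sees the other's checkpoints.
        System.out.println(consumerAppName("my_input", "node-a"));
        System.out.println(consumerAppName("my_input", "node-b"));
    }
}
```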
Any update on this item? I have the same issue: only the first input is collecting logs, and the node is always overloaded and sometimes gets killed due to lack of memory.
Graylog Version: 5.0.3
Hello.
Description
The first input was configured to fetch logs from the Kinesis stream in the `us-east-1` AWS region and is working now. Then I was able to configure the second input and even received a message to pass the configuration check, but after that no more messages have been received via this input. The second input's configuration is the same as the first one, except it is located in another region, `eu-west-2`.

Steps To Reproduce
1) In the region `us-east-1`:
2) Still in the region `us-east-1`:
3) In the region `eu-west-2`:
4) Log in to Graylog and configure 2 `AWS Kinesis/CloudWatch` inputs from both aforementioned Kinesis streams.

Only the first input will be working. The second throws an exception:
Environment
- Graylog Version: 3.2.2+2f9a628, codename Ethereal Elk
- JVM: PID 19, Oracle Corporation 1.8.0_242 on Linux 4.14.165-133.209.amzn2.x86_64
- Installed plugins: 3.2.2, 3.2.2, 3.2.2, 3.2.2
- Elasticsearch Version: [LTm5Sek] version[6.8.5], pid[1], build[oss/docker/78990e9/2019-11-13T20:04:24.100411Z], OS[Linux/4.14.165-133.209.amzn2.x86_64/amd64], JVM[AdoptOpenJDK/OpenJDK 64-Bit Server VM/13.0.1/13.0.1+9]
- MongoDB Version: 3.6.17
- Browser Version: Google Chrome 79.0.3945.88 (Official Build) (64-bit)
Thank you!