Open wrsuarez opened 8 years ago
Currently, you can only:
But this is just a workaround; your request is a feature request to make this easier.
As noted in the request, just removing the node from the load balancer is not enough. Inputs that make outbound API calls to fetch data (such as the AWS plugin) don't rely on the load balancer to receive messages. If they run as global inputs, a drain request would need to remove the node from the load balancer AND disable all inputs on the node, so that inputs pulling data through outbound requests are turned off as well.
On Oct 31, 2016 3:08 AM, Jan Doberstein notifications@github.com wrote:
Currently, you can only:
But this is just a workaround; your request is a feature request to make this easier.
Has there been any progress on this yet? I am trying to gracefully replace one of the nodes in our Graylog cluster, but there seems to be no option available at the moment.
Expected Behavior
Requesting the ability to drain messages from a Graylog server.
Current Behavior
Currently there are 3 options to effect change on a Graylog server
Possible Solution
Implement code to disable all inputs on machines that have been put into a "drain" state, while still allowing API calls to be made and letting the journal finish processing any messages in the backlog.
Context
The primary reason for this request is Auto Scaling in AWS. We run 3 persistent instances of Graylog, but at times we need to scale up our ingest servers to handle a surge in message load. We intend to use Auto Scaling groups in AWS for this. However, as the load drops and the instances are no longer needed, we need a graceful way to drain a machine during the termination process (using lifecycle hooks and a Lambda-based API call to Graylog). The same Lambda function would first "drain" the node, then make regular calls to the Graylog API to confirm the journal backlog is zero before allowing the lifecycle event to complete, ensuring we don't lose any messages when scaling down the cluster. We already have custom metrics in CloudWatch that monitor the journal backlog across all machines, which we would use to trigger a scale-up event.
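To make the intended flow concrete, here is a rough sketch of the control loop the Lambda function might run. It is only an illustration under stated assumptions: `drain_node`, `get_journal_backlog`, and `complete_lifecycle` are hypothetical callables supplied by the caller. In particular, `drain_node` stands in for the drain API call this issue is asking for, which does not exist today.

```python
import time
from typing import Callable


def drain_and_wait(
    drain_node: Callable[[], None],
    get_journal_backlog: Callable[[], int],
    complete_lifecycle: Callable[[str], None],
    poll_interval_s: float = 10.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Drain a node, poll until its journal backlog is zero, then
    complete the lifecycle action.

    All three callables are hypothetical placeholders:
      drain_node          -- the requested (not yet existing) drain call
      get_journal_backlog -- returns the node's unprocessed journal entries
      complete_lifecycle  -- completes the Auto Scaling lifecycle action
                             with the given result string

    Returns True once the backlog reached zero and the lifecycle action
    was completed. Returns False if the timeout expired first, in which
    case the lifecycle action is left untouched for the caller to handle.
    """
    drain_node()
    deadline = clock() + timeout_s
    while True:
        if get_journal_backlog() == 0:
            # Journal is empty, so it is safe to let termination proceed.
            complete_lifecycle("CONTINUE")
            return True
        if clock() >= deadline:
            # Backlog never emptied; do not release the instance here.
            return False
        sleep(poll_interval_s)
```

The callables are injected so the loop can be exercised without a Graylog node or AWS credentials. In a real Lambda they would wrap HTTP calls to the node's REST API and the Auto Scaling lifecycle API.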
Your Environment