Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Drain messages API call #2982

Open: wrsuarez opened this issue 8 years ago

wrsuarez commented 8 years ago

Expected Behavior

Requesting the ability to drain messages from a Graylog server.

Current Behavior

Currently there are three options for changing the state of a Graylog server:

  1. Pause Processing, which seems to stop all processing immediately but keeps the messages already in the journal.
  2. Mark as dead to the load balancer, which seems to make the load balancer treat the node as offline, although this only appears to affect the web UI. We use a load balancer in front of our Graylog servers, and its health check polls the API's load balancer status endpoint to determine the ELB status. Even though the node is marked as down on the load balancer, I still see messages being appended to the journal. This is possibly because inputs that make outbound REST calls to fetch messages (for example, from AWS) are unaffected when the ELB takes the machine out of the pool, and/or messages are still arriving by some other path.
  3. Graceful shutdown, which seems to stop the Graylog process without draining its messages off to another host.
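For reference, each of the three existing controls can be triggered over the REST API. The sketch below only builds the requests without sending them. The endpoint paths are assumptions about what the Graylog REST API exposes (verify them in your version's API browser), and `CONTROLS` and `build_control_request` are hypothetical names.

```python
import base64
import urllib.request

# ASSUMED endpoint paths for the three controls described above; verify them
# against the API browser of your Graylog version before relying on them.
CONTROLS = {
    "pause_processing": ("PUT", "/system/processing/pause"),
    "mark_lb_dead": ("PUT", "/system/lbstatus/override/dead"),
    "graceful_shutdown": ("POST", "/system/shutdown/shutdown"),
}


def build_control_request(
    api_base: str, control: str, user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for one node control."""
    method, path = CONTROLS[control]
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        api_base.rstrip("/") + path,
        method=method,
        headers={
            "Authorization": f"Basic {token}",
            # Assumption: newer Graylog versions reject non-GET API calls
            # that lack an X-Requested-By header.
            "X-Requested-By": "drain-script",
        },
    )
```

Sending a request is then `urllib.request.urlopen(req)`. As noted in option 2, none of these calls stops inputs that pull data themselves, which is the gap this issue is about.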

Possible Solution

Implement code to disable all inputs on any node that has been put into a "drain" state, while still allowing API calls to be made and letting the journal finish processing any messages in its backlog.

Context

The primary reason for this request is Auto Scaling in AWS. We run three persistent Graylog instances, but there are times when we need to scale up our ingest servers to handle a surge in message load. We intend to use AWS Auto Scaling groups for this. However, as the load drops and the extra instances are no longer needed, we need a graceful way to drain a machine during the termination process (using lifecycle hooks and a Lambda-based API call to Graylog). The same Lambda function would first "drain" the node, then repeatedly call the Graylog API to confirm that the journal backlog is zero, and only then allow the lifecycle event to complete. This ensures we don't lose any messages when scaling down the cluster. We already publish custom CloudWatch metrics for the journal backlog across all machines, which we would use to trigger scale-up events.
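The Lambda-side wait described above (drain, then poll until the journal backlog is zero before completing the lifecycle event) could look roughly like this. It is a minimal sketch: `wait_for_empty_journal` and the `get_backlog` callable are hypothetical. `get_backlog` would be backed by whatever backlog source you trust, such as the CloudWatch backlog metric or the node's journal status from the Graylog API (we believe `GET /api/system/journal` reports an `uncommitted_journal_entries` count, but treat that as an assumption). The timeout keeps the loop bounded, because a single Lambda invocation can run for at most 15 minutes.

```python
import time
from typing import Callable


def wait_for_empty_journal(
    get_backlog: Callable[[], int],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the journal backlog reaches zero or the timeout expires.

    Returns True if the backlog drained and False on timeout. The caller
    should only complete the lifecycle action when this returns True.
    `sleep` and `clock` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while True:
        backlog = get_backlog()  # hypothetical source of the backlog count
        if backlog == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

If it returns `False`, the Lambda could record a lifecycle action heartbeat and check again later instead of completing the action, so the instance is never terminated mid-drain.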

Your Environment

jalogisch commented 8 years ago

Currently, you can only:

But this is just a workaround; your request is a feature request to make this easier.

wrsuarez commented 8 years ago

As noted in the request, just removing the node from the load balancer is not enough. Inputs that make outbound API calls to fetch data (such as the AWS plugin) don't rely on the load balancer to receive messages. If they run as global inputs, this drain request would need to remove the node from the load balancer AND disable all inputs on the node, so that inputs making outbound requests are turned off as well.
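A drain operation as proposed in this issue would combine both steps while leaving message processing running. The sketch below assumes a hypothetical `send(method, path)` transport that calls one node's REST API directly (not through the load balancer), and it assumes that `PUT /system/lbstatus/override/dead` marks the node dead and that `DELETE /system/inputstates/{input_id}` stops an input on that node only. Both paths are assumptions to check against your Graylog version; `drain_node` is a hypothetical name.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical transport: performs an authenticated call against ONE node's
# REST API (bypassing the load balancer).
Send = Callable[[str, str], None]


def drain_node(send: Send, running_input_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Put a node into a 'drain' state and return the calls that were made.

    1. Mark the node dead for the load balancer, so pushed traffic stops.
    2. Stop every input running on this node, including global inputs that
       pull data with outbound API calls, which the load balancer cannot block.
    Processing is deliberately NOT paused, so the node keeps working
    through the messages already in its journal.
    """
    calls = [("PUT", "/system/lbstatus/override/dead")]  # assumed path
    calls += [
        ("DELETE", f"/system/inputstates/{input_id}")  # assumed path
        for input_id in running_input_ids
    ]
    for method, path in calls:
        send(method, path)
    return calls
```

Keeping processing running is the key difference from option 1 (Pause Processing): the inputs stop accepting new data, but the journal backlog still drains, which is what the polling step in the lifecycle hook waits for.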


fshabir commented 5 years ago

Has there been any progress on this yet? I am trying to gracefully replace one of the nodes in our Graylog cluster, but there seems to be no option for this at the moment.