fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Fluentd not starting if backend not reachable #2786

Open sgohl opened 4 years ago

sgohl commented 4 years ago

Perhaps this is connected to https://github.com/fluent/fluentd/issues/1280 and it seems like a general mistake in design. A failed backend connection definitely must not prevent fluentd from starting at all.

/usr/lib/ruby/2.5.0/net/http.rb:939:in `rescue in block in connect': Failed to open TCP connection to couch:5984 (getaddrinfo: Try again) (SocketError)
    from /usr/lib/ruby/2.5.0/net/http.rb:936:in `block in connect'

I am using the official Docker image fluent/fluentd:latest

Fluentd is widely used to fire-and-forget messages, often without really caring if some messages get lost. It is a complete show-stopper if it refuses to start when backends are not available.

There should be a configuration switch to silently ignore outages and just warn on start, at least.

repeatedly commented 4 years ago

> There should be a configuration switch to silently ignore outages and just warn on start, at least.

This seems like an acceptable idea. Some plugins may not work correctly when configure/start raises an error, but if users have reviewed the plugin code and found no problem, this behaviour is useful.

sgohl commented 4 years ago

I'm actually afraid this would result in fluentd never retrying the connection to the backend. If the buffer directive could be enhanced to accept connection errors and a configurable retry interval, that would be the best solution, I think.
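
To make the request concrete, here is a minimal sketch of "retry the connection at a configurable interval". It is plain Ruby, not Fluentd code: `connect_with_retry`, `max_attempts`, `interval`, and `sleeper` are hypothetical names chosen for illustration, and nothing here reflects how Fluentd's buffer is actually implemented.

```ruby
# Hypothetical helper, not part of Fluentd: runs the given block and,
# on a connection-type error, waits `interval` seconds and tries again.
# `sleeper` is injectable so the wait can be replaced in tests.
def connect_with_retry(max_attempts: 5, interval: 1.0, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    # The block performs the real connection attempt and returns the connection.
    yield attempts
  rescue SocketError, SystemCallError => e
    # Give up and re-raise the original error once the attempts are used up.
    raise if attempts >= max_attempts
    warn "attempt #{attempts} failed (#{e.class}: #{e.message}); retrying in #{interval}s"
    sleeper.call(interval)
    retry
  end
end
```

A caller would wrap the connection attempt in the block, e.g. `connect_with_retry(interval: 5) { open_backend_connection }`, where `open_backend_connection` stands in for whatever the plugin does to reach the backend.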

repeatedly commented 4 years ago

> If the buffer directive could be enhanced to accept connection errors and a configurable interval for retrying,

The buffer can't help in this situation, because it is only used after the pipeline has started. Fluentd stops on errors in the configure/start phase because some of them are unrecoverable, e.g. an incorrect host/port/authn/authz parameter. Fluentd can't judge whether an error is a configuration mistake or an expected "backend unreachable" situation. Adding @ignore_startup_error would help in the latter case.