fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Update/Reload without downtime #4622

Open daipom opened 1 month ago


Is your feature request related to a problem? Please describe.

Updating Fluentd or reloading a config causes downtime. Plugins that receive data as a server, such as in_udp, in_tcp, and in_syslog, cannot receive data while Fluentd restarts. Any data a client sends during that window is lost unless the client can resend it. This makes updating Fluentd or reloading a config difficult in some cases.

Describe the solution you'd like

Add a new feature: Update/Reload without downtime.

For example, implement a mechanism similar to nginx's feature for upgrading on the fly.
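
To illustrate the idea behind an on-the-fly upgrade, here is a minimal Ruby sketch showing that a listening socket can outlive the handle that created it. It is not Fluentd or nginx code. The method name `receive_across_handoff` is hypothetical, and duplicating the descriptor inside one process is only a stand-in (an assumption) for passing the socket to a new process.

```ruby
require "socket"

# Hypothetical demo, not a Fluentd API.
# Returns the payload read from the duplicated listener, or nil on timeout.
def receive_across_handoff(payload)
  # "Old" side: bind a UDP listener on an ephemeral loopback port.
  old_listener = UDPSocket.new
  old_listener.bind("127.0.0.1", 0)
  port = old_listener.local_address.ip_port

  # Stand-in for handing the socket to a new process: duplicate the descriptor.
  new_listener = old_listener.dup

  # The old handle goes away; the kernel socket stays open via the duplicate.
  old_listener.close

  # A client sends after the old handle is closed.
  client = UDPSocket.new
  client.send(payload, 0, "127.0.0.1", port)

  # Wait up to one second for the datagram on the surviving handle.
  ready = IO.select([new_listener], nil, nil, 1)
  ready ? new_listener.recvfrom(1024).first : nil
ensure
  [client, new_listener].each { |s| s&.close }
end

puts receive_across_handoff("event during handoff").inspect
```

Because the port is never released, the client does not need to retry. A real implementation would have to share the socket across processes, which this sketch does not attempt.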

The main problem is that two Fluentd instances can't run in parallel with the same config, because they would conflict over shared resources such as buffer files.

Because of this problem, it is very difficult to support all plugins. However, it is possible to support only plugins that can run in parallel.

Based on the above, the following mechanism would be a good way to achieve this.

  1. The current supervisor receives a signal.
  2. The current supervisor sends signals to its workers, and the workers stop all plugins that cannot run in parallel.
  3. The current supervisor starts a new supervisor.
    • => Old processes and new processes run in parallel.
  4. After the new supervisor and its workers start to work, the current supervisor and its workers stop.
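
The four steps above can be modeled roughly as follows. This is a toy simulation, not the PoC: `Plugin`, `ModelSupervisor`, and `simulate_handoff` are hypothetical names, and the plugin label `out_buffered` is made up to represent any plugin that cannot run in parallel. The point is only to show the intended invariant: a parallel-safe input is always running somewhere, while a non-parallel plugin never runs twice.

```ruby
# Hypothetical model of the four steps; these names are not Fluentd APIs.
Plugin = Struct.new(:name, :parallel_safe)

class ModelSupervisor
  attr_reader :running

  def initialize(plugins)
    @plugins = plugins
    @running = []
  end

  def start
    @running = @plugins.dup
  end

  # Step 2: stop every plugin that cannot tolerate a second instance.
  def stop_non_parallel
    @running = @running.select(&:parallel_safe)
  end

  def stop
    @running = []
  end
end

# Runs the handoff and records the active plugin names after each step.
def simulate_handoff(plugins)
  current = ModelSupervisor.new(plugins)
  current.start
  timeline = []
  snapshot = lambda do |step, supervisors|
    timeline << [step, supervisors.flat_map { |s| s.running.map(&:name) }]
  end

  snapshot.call(1, [current])          # 1. signal received, nothing stopped yet
  current.stop_non_parallel
  snapshot.call(2, [current])          # 2. only parallel-safe plugins remain
  fresh = ModelSupervisor.new(plugins)
  fresh.start
  snapshot.call(3, [current, fresh])   # 3. old and new run side by side
  current.stop
  snapshot.call(4, [current, fresh])   # 4. only the new supervisor remains
  timeline
end

plugins = [Plugin.new("in_udp", true), Plugin.new("out_buffered", false)]
simulate_handoff(plugins).each do |step, names|
  puts "step #{step}: #{names.join(', ')}"
end
```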

More specifically, it would be better to run only limited Input plugins in parallel, such as in_tcp, in_udp, and in_syslog. Stop all plugins except those Input plugins, and prepare a dedicated file buffer for Output. After the new workers start, they load the file buffer and route those events to the @ROOT label.
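
As a rough sketch of the dedicated file buffer idea, the old workers could append received events to a handoff file and the new workers could replay them on startup. Everything here is an assumption for illustration: the JSON-lines format, the file name, and the helpers `stage_event` and `replay_staged_events` are hypothetical, and the block passed to the replay helper stands in for routing to the @ROOT label. Fluentd's real buffer format and router API are not used.

```ruby
require "json"
require "tmpdir"

# Old worker side (hypothetical): append one event per line to the handoff file.
def stage_event(path, tag, time, record)
  File.open(path, "a") do |f|
    f.puts(JSON.generate({ "tag" => tag, "time" => time, "record" => record }))
  end
end

# New worker side (hypothetical): read staged events in order, pass each one
# to the block (a stand-in for routing to @ROOT), then remove the file.
# Returns the number of events replayed.
def replay_staged_events(path)
  return 0 unless File.exist?(path)

  count = 0
  File.foreach(path) do |line|
    event = JSON.parse(line)
    yield event["tag"], event["time"], event["record"]
    count += 1
  end
  File.delete(path)
  count
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "handoff.buf")
  stage_event(path, "syslog.auth", 1_700_000_000, { "message" => "login" })
  stage_event(path, "udp.app", 1_700_000_001, { "message" => "ping" })

  replay_staged_events(path) do |tag, time, record|
    puts "#{tag} #{time} #{record}"
  end
end
```

Deleting the file only after a full replay keeps the sketch simple. A real design would also need to decide what happens if the new worker crashes midway through the replay.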

Describe alternatives you've considered

None.

Additional context

I have already started to create a PoC.