Is your feature request related to a problem? Please describe.
Updating Fluentd or reloading a config causes downtime.
Plugins that receive data as a server, such as in_udp, in_tcp, and in_syslog, cannot receive data during this time.
This means that data sent by a client during this window is lost unless the client can resend it.
This makes updating Fluentd or reloading a config difficult in some cases.
Describe the solution you'd like
Add a new feature: Update/Reload without downtime.
For example, implement a mechanism similar to nginx's feature for upgrading on the fly.
The main problem is that two Fluentd instances can't run in parallel with the same config.
(Doing so causes conflicts, for example over buffer files.)
Because of this problem, it is very difficult to support all plugins.
However, it is possible to support only plugins that can run in parallel.
Based on the above, the following mechanism would be a good way to achieve this.
The current supervisor receives a signal.
The current supervisor sends signals to its workers, and the workers stop all plugins that cannot run in parallel.
The current supervisor starts a new supervisor.
=> Old processes and new processes run in parallel.
After the new supervisor and its workers start to work, the current supervisor and its workers stop.
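The steps above can be sketched as a small in-memory model. This is only an illustration of the ordering, not Fluentd code: `PluginState`, `SupervisorModel`, and `zero_downtime_handoff` are hypothetical names, the "signal" is just a method call, and the `parallel_safe` flags are assumed values chosen for the example.

```ruby
# Hypothetical model of the proposed handoff. None of these names are Fluentd APIs.
PluginState = Struct.new(:name, :parallel_safe, :running)

class SupervisorModel
  attr_reader :log

  # plugin_spec maps a plugin name to whether it is assumed safe to run in parallel.
  def initialize(plugin_spec)
    @plugins = plugin_spec.map { |name, safe| PluginState.new(name, safe, true) }
    @log = []
  end

  # Step 2: stop every plugin that cannot run in parallel.
  def stop_non_parallel_plugins
    @plugins.each do |plugin|
      next if plugin.parallel_safe
      plugin.running = false
      @log << "stopped #{plugin.name}"
    end
  end

  # Step 5: stop everything that is still running.
  def stop_all
    @plugins.each { |plugin| plugin.running = false }
  end

  def running_names
    @plugins.select(&:running).map(&:name)
  end
end

# Runs steps 1-5 in order and returns the new supervisor together with the
# plugins that were running on both sides during the overlap (step 4).
def zero_downtime_handoff(old_sup, plugin_spec)
  old_sup.stop_non_parallel_plugins            # steps 1-2: signal received and forwarded
  new_sup = SupervisorModel.new(plugin_spec)   # step 3: start the new supervisor
  overlap = old_sup.running_names & new_sup.running_names
  old_sup.stop_all                             # step 5: old side stops once new side works
  [new_sup, overlap]
end

spec = { "in_tcp" => true, "in_udp" => true, "out_file" => false }
old_sup = SupervisorModel.new(spec)
new_sup, overlap = zero_downtime_handoff(old_sup, spec)
```

In this model `overlap` holds only the parallel-safe inputs, which is the window in which no incoming data should be dropped. How the old and new processes would actually share a listening socket is not covered here.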
More specifically, it would be better to run only limited Input plugins in parallel, such as in_tcp, in_udp, and in_syslog.
Stop all plugins except those Input plugins, and prepare a dedicated file buffer that takes the place of the normal Output for them.
After the new workers start, they load the file buffer and route those events to the @ROOT label.
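A minimal sketch of the dedicated-buffer idea, assuming the old-side inputs append each event to a spool file and the new workers drain it after startup. `HandoffSpool` and its JSON-lines format are hypothetical and are not Fluentd's buffer format. Routing to the @ROOT label is represented by the block passed to `drain`; in a real plugin that would presumably be a call into the event router.

```ruby
require "json"
require "tmpdir"

# Hypothetical spool shared between old and new workers (not a Fluentd class).
class HandoffSpool
  def initialize(path)
    @path = path
  end

  # Old side: append one event per line while normal outputs are stopped.
  def append(tag, time, record)
    File.open(@path, "a") do |f|
      f.flock(File::LOCK_EX) # serialize writers from several workers
      f.puts(JSON.generate([tag, time, record]))
    end
  end

  # New side: yield every spooled event in order, then remove the file.
  # Returns the number of events handed to the block.
  def drain
    return 0 unless File.exist?(@path)
    count = 0
    File.foreach(@path) do |line|
      tag, time, record = JSON.parse(line)
      yield tag, time, record
      count += 1
    end
    File.delete(@path)
    count
  end
end

path = File.join(Dir.mktmpdir, "handoff.spool")
spool = HandoffSpool.new(path)
spool.append("syslog.auth", 1_700_000_000, { "message" => "login" })

routed = []
spool.drain { |tag, time, record| routed << [tag, time, record] }
```

Because the spool lives in its own file, it does not collide with the buffer files of the regular outputs, which is the conflict that prevents running the full config in parallel.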
Describe alternatives you've considered
None.
Additional context
I have already started to create a PoC.