fluent / fluentd-docs

This repository is deprecated. Go to fluentd-docs-gitbook repository.

Transfers made using UDP may incur data loss if the consumer is down #536

Open ameyrk18 opened 6 years ago

ameyrk18 commented 6 years ago

Hi,

It looks like the high-availability article (https://docs.fluentd.org/v1.0/articles/high-availability) should specify, under the failure case scenarios, that if we choose UDP over TCP for the data transfer, data will be lost if the receiver goes down.

For example this issue: https://github.com/emsearcy/fluent-plugin-gelf/issues/7

Thanks, Amey

repeatedly commented 6 years ago

Hmm... I'm not sure about the details of your deployment, but an HA deployment can't recover from UDP's potential data loss.

fujimotos commented 6 years ago

> It looks like the high-availability article (https://docs.fluentd.org/v1.0/articles/high-availability) should specify, under the failure case scenarios, that if we choose UDP over TCP for the data transfer, data will be lost if the receiver goes down.

That pretty much depends on how the particular plugin is designed.

Fluentd core provides a set of APIs to plugins and is capable of handling failure scenarios (e.g. resending records when the destination node is down) for these plugins, if these plugins are willing to communicate with the core well.
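To make the "communicate with the core" point concrete, here is a toy model in Python. It is not Fluentd's actual code or API: `ToyCore`, `ReportingPlugin`, and `SilentPlugin` are hypothetical names invented for this sketch. The only assumption is that the core retries a chunk when `write()` raises and treats a normal return as success.

```python
import time


class ToyCore:
    """Hypothetical buffering core: retries a chunk while write() raises."""

    def __init__(self, plugin, max_retries=3, wait=0.0):
        self.plugin = plugin
        self.max_retries = max_retries
        self.wait = wait

    def flush(self, chunk):
        """Return the attempt number that succeeded, or None if all failed."""
        for attempt in range(1, self.max_retries + 1):
            try:
                self.plugin.write(chunk)
                return attempt
            except ConnectionError:
                time.sleep(self.wait)  # a real core would back off here
        return None


class ReportingPlugin:
    """Raises while the destination is down, so the core can retry."""

    def __init__(self, failures_before_up):
        self.failures_left = failures_before_up
        self.delivered = []

    def write(self, chunk):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("destination is down")
        self.delivered.append(chunk)


class SilentPlugin:
    """Never raises, so the core believes every write succeeded."""

    def __init__(self):
        self.delivered = []

    def write(self, chunk):
        pass  # destination is down, but nothing is reported


reporting = ReportingPlugin(failures_before_up=2)
silent = SilentPlugin()
reporting_attempt = ToyCore(reporting).flush("record-1")
silent_attempt = ToyCore(silent).flush("record-1")
print(reporting_attempt, reporting.delivered)
print(silent_attempt, silent.delivered)
```

In this sketch the reporting plugin gets its record delivered after retries, while the silent plugin's record is dropped even though the core counts the first attempt as a success.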

For example, see how fluent-plugin-gelf implements the write() interface:

https://github.com/emsearcy/fluent-plugin-gelf/blob/master/lib/fluent/plugin/out_gelf.rb#L112

If you choose UDP as the protocol, this method always returns successfully, even if the target node does not exist. This means that Fluentd core never knows whether the data reached the destination node, since the plugin makes no attempt to report the delivery status.
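The underlying socket behavior can be reproduced outside Fluentd. The following is a minimal Python sketch, not the plugin's code: the helper names (`unused_port`, `send_udp`, `send_tcp`) are hypothetical, and it assumes a loopback address with nothing listening on the chosen ports. A single UDP datagram sent from an unconnected socket is handed to the OS without any error, whereas a TCP connection attempt to a closed port is expected to fail.

```python
import socket


def unused_port(kind):
    """Ask the OS for a free port of the given socket type, then release it."""
    with socket.socket(socket.AF_INET, kind) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def send_udp(payload, port):
    """Fire-and-forget: returns the byte count handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(payload, ("127.0.0.1", port))


def send_tcp(payload, port):
    """Needs a listening peer: raises OSError if the connection fails."""
    with socket.create_connection(("127.0.0.1", port), timeout=2) as s:
        s.sendall(payload)
        return len(payload)


# Nobody is listening on either port.
udp_sent = send_udp(b"log line", unused_port(socket.SOCK_DGRAM))

try:
    send_tcp(b"log line", unused_port(socket.SOCK_STREAM))
    tcp_error = None
except OSError as exc:
    tcp_error = exc  # the caller learns the destination is unreachable

print("UDP bytes reported as sent:", udp_sent)
print("TCP error:", tcp_error)
```

With TCP, the failure surfaces as an exception that a plugin could pass on to the core. With UDP, there is nothing to pass on unless the plugin adds its own acknowledgement scheme.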

ameyrk18 commented 6 years ago

@repeatedly my stack has a set of Fluentd instances shipping logs to a log forwarder instance, from there to a load balancer, and from there to Graylog (the consumer). My point was that if we are using aggregators and choose UDP over TCP to transmit messages to the consumer, data will be lost if the consumer is down. The document doesn't mention this failure scenario. I believe this section should include a note about it.

@fujimotos thanks for that.