fluent / fluent-plugin-multiprocess

Multiprocess agent plugin for Fluentd
43 stars 13 forks source link

We're using the fluent-plugin-multiprocess plugin and sometimes td-agent binds to 0.0.0.0 instead of 127.0.0.1 #10

Open jonathanhle opened 8 years ago

jonathanhle commented 8 years ago

Hi-

We're using the fluent-plugin-multiprocess plugin and sometimes td-agent binds to 0.0.0.0 instead of 127.0.0.1, when we have the following specified in our configs:

    ad-events:
      # Defines the content of the source section
      source: {
        type: 'forward',
        port: 24221,
        bind: '127.0.0.1',
        tag: 'gg.adevent'
      }
      # Defines the content of the match section
      match:
        general: {
          type: forward,
          buffer_type: memory,
          buffer_queue_limit: 128,
          send_timeout: 10s,
          recover_wait: 10s,
          heartbeat_interval: 2s,
          phi_threshold: 5,
          hard_timeout: 10s,
          flush_interval: 5s,
          retry_limit: 2
        }
        subsections:
          server: [
            {
              name: "aggregator-{{ fluentd_aggregator_ips[0] }}",
              host: "{{ fluentd_aggregator_ips[0] }}",
              weight: 33,
              port: 24221
            },
            {
              name: "aggregator-{{ fluentd_aggregator_ips[1] }}",
              host: "{{ fluentd_aggregator_ips[1] }}",
              weight: 33,
              port: 24221
            },
            {
              name: "aggregator-{{ fluentd_aggregator_ips[2] }}",
              host: "{{ fluentd_aggregator_ips[2] }}",
              weight: 33,
              port: 24221
            },
            {
              name: "aggregator-{{ fluentd_aggregator_ips[3] }}",
              host: "{{ fluentd_aggregator_ips[3] }}",
              weight: 33,
              port: 24221
            },
            {
              name: "aggregator-{{ fluentd_aggregator_ips[4] }}",
              host: "{{ fluentd_aggregator_ips[4] }}",
              weight: 33,
              port: 24221
            },
            {
              name: "aggregator-{{ fluentd_aggregator_ips[5] }}",
              host: "{{ fluentd_aggregator_ips[5] }}",
              weight: 33,
              port: 24221
            }
          ]
          # Secondary section will forward back to a local backup socket
          # Which will trigger the standalone mode (Collect and S3 upload)
          # On the client
          secondary: [
            {
              type: 'forward',
              host: '127.0.0.1',
              port: 34221
            }
          ]

This prevents things from actually shipping logs for us on that specific port, in those times td-agent binds to 0.0.0.0 instead of localhost 127.0.0.1. The issue recurs often, but is not consistently reproducible for us and the logs just say: image

2016-07-31 12:47:07 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>

The "fix"/remediation for the issue has been to just stop/start td-agent, which then allows td-agent to bind to localhost instead of 0.0.0.0.

The versions from the startup of the agent are:

2016-07-31 12:46:52 -0700 [info]: reading config file path="/etc/td-agent/td-agent.conf" 2016-07-31 12:46:52 -0700 [info]: starting fluentd-0.12.26 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-datacounter' version '0.4.5' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-mongo' version '0.7.13' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-multiprocess' version '0.2.1' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-s3' version '0.6.8' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-scribe' version '0.10.14' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-td' version '0.10.28' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.2' 2016-07-31 12:46:53 -0700 [info]: gem 'fluent-plugin-webhdfs' version '0.4.2' 2016-07-31 12:46:53 -0700 [info]: gem 'fluentd' version '0.12.26' 2016-07-31 12:46:53 -0700 [info]: adding source type="multiprocess" 2016-07-31 12:46:53 -0700 [info]: using configuration file:

If you have and advice on what the issue might be, it'd be much appreciated. Best regards!!!! Jonathan

repeatedly commented 8 years ago

Hmm... where is the logs which td-agent tries to bind 0.0.0.0 instead of 127.0.0.1?

jonathanhle commented 8 years ago

Hi Masahiro, please find attached: td-agent.log.zip

You can grep for EADDRINUSE - it might be something around there.

tagomoris commented 8 years ago

I did grep your log, but there are no log lines with 0.0.0.0.

$ grep EADDRINUSE td-agent.log
2016-07-31 12:47:07 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:09 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:11 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:13 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:15 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:18 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:20 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:22 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:24 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:26 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:28 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:30 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:32 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:34 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:36 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:38 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:40 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:42 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:44 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:46 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:48 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:50 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:52 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:55 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:57 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:47:59 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:01 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:03 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:05 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:07 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:09 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:11 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:13 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:15 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:17 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:20 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:22 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:24 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:26 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:28 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:30 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:32 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:34 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:37 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:39 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:42 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:44 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:46 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:48 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:50 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:52 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:54 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:56 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:48:58 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:00 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:02 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:04 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:06 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:09 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:11 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:13 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:15 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:17 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:19 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:21 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
2016-07-31 12:49:23 -0700 [error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 24226>
repeatedly commented 8 years ago

EADDRINUSE means other process has already used same port. Do you have other error which indicates fluentd tries to bind 0.0.0.0 instead of 127.0.0.1?

jonathanhle commented 8 years ago

Hi - All of our configuration have td-agent bind to something like this in the config: bind: '127.0.0.1'

The times we see it bind to 0.0.0.0 (which we don't want and don't have errors for) is when we get an EADDRINUSE for "127.0.0.1".

ERROR EXAMPLE - td-agent didn't come up on both TCP and UDP on localhost. UDP came up on 0.0.0.0 only. We can see an EADDRINUSE in the logs for 127.0.0.1.
Doing an lsof, we'll get something like this:

on 0.0.0.0 on the port only.
# lsof -i udp:34228
COMMAND  PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ruby    2081 td-agent   11u  IPv4   9505      0t0  UDP *:34228

WORKING EXAMPLE - td-agent comes up on both localhost for both UDP and TCP

COMMAND  PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ruby    5520 td-agent   10u  IPv4  19228      0t0  TCP localhost:34228 (LISTEN)
ruby    5520 td-agent   11u  IPv4  19229      0t0  UDP localhost:34228

When we see td-agent bind to 0.0.0.0 we see the EADDRINUSE for 127.0.0.1. Binding to 0.0.0.0 is an error for us since we have it configured to bind to 127.0.0.1 in our configurations.