fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Unable to Start td-agent #2030

Closed rpn588 closed 6 years ago

rpn588 commented 6 years ago

Environment

Configuration

<source>
  @type syslog
  port 5140
  bind 127.0.0.1
  protocol_type tcp
  tag system
</source>

<match system.**>
 @type stdout
</match>

Description

I am unable to start td-agent 9 times out of 10. On the occasions I am able to start it, it is unclear exactly what the cause and fix were as it seems inconsistent.

[root@ip-[..REDACTED..] ~]# systemctl start td-agent.service
Job for td-agent.service failed because the control process exited with error code. See "systemctl status td-agent.service" and "journalctl -xe" for details.

[root@ip-[..REDACTED..] ~]# systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/usr/lib/systemd/system/td-agent.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2018-06-22 19:42:18 BST; 40s ago
     Docs: https://docs.treasuredata.com/articles/td-agent
  Process: 11748 ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=1/FAILURE)

Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: td-agent.service: control process exited, code=exited status=1
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: Unit td-agent.service entered failed state.
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: td-agent.service failed.
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: td-agent.service holdoff time over, scheduling restart.
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: start request repeated too quickly for td-agent.service
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: Unit td-agent.service entered failed state.
Jun 22 19:42:18 ip-[..REDACTED..] systemd[1]: td-agent.service failed.

Executing the command myself results in an error surrounding config file existence:

[root@ip-[..REDACTED..] ~]# /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/supervisor.rb:760:in `initialize': No such file or directory @ rb_sysopen - /etc/fluent/fluent.conf (Errno::ENOENT)
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/supervisor.rb:760:in `open'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/supervisor.rb:760:in `read_config'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/supervisor.rb:477:in `run_supervisor'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/command/fluentd.rb:310:in `<top (required)>'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/bin/fluentd:8:in `<top (required)>'
    from /opt/td-agent/embedded/bin/fluentd:23:in `load'
    from /opt/td-agent/embedded/bin/fluentd:23:in `<main>'

It's defaulting to /etc/fluent/fluent.conf, which doesn't exist because I am using /etc/td-agent/td-agent.conf. So I pass my config explicitly and hit another issue:

[root@ip-[..REDACTED..] ~]# /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid -c /etc/td-agent/td-agent.conf
Unexpected error No such file or directory @ rb_sysopen - /var/run/td-agent/td-agent.pid
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.0.6/lib/serverengine/daemon.rb:200:in `initialize'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.0.6/lib/serverengine/daemon.rb:200:in `open'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.0.6/lib/serverengine/daemon.rb:200:in `write_pid_file'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.0.6/lib/serverengine/daemon.rb:193:in `daemonize_with_double_fork'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.0.6/lib/serverengine/daemon.rb:107:in `main'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.0.6/lib/serverengine/daemon.rb:68:in `run'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/supervisor.rb:632:in `supervise'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/supervisor.rb:502:in `run_supervisor'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/lib/fluent/command/fluentd.rb:310:in `<top (required)>'
  /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.2.2/bin/fluentd:8:in `<top (required)>'
  /opt/td-agent/embedded/bin/fluentd:23:in `load'
  /opt/td-agent/embedded/bin/fluentd:23:in `<main>'

I can successfully run td-agent like this but undaemonised:

LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so /usr/sbin/td-agent -c /etc/td-agent/td-agent.conf --user td-agent --group td-agent

At this point I'm thinking 'OK, I must have something inherently wrong here'. Figured here would be the best place to get some assistance.

repeatedly commented 6 years ago

Unexpected error No such file or directory @ rb_sysopen - /var/run/td-agent/td-agent.pid

The error says the directory doesn't exist. Fix that and add the --daemon option again.
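The missing directory can be checked before retrying. The following is a minimal sketch, assuming the PID file path from the command above; check_pid_dir is a hypothetical helper name, not part of td-agent. On systemd hosts /var/run is usually a tmpfs, and a RuntimeDirectory=td-agent setting in a unit file (as in the unit files quoted elsewhere in this thread) creates /var/run/td-agent only when the service is started through systemd, which would explain why a manual invocation does not find it.

```shell
#!/bin/sh
# check_pid_dir is a hypothetical helper, not part of td-agent.
# It reports whether the directory that should hold the PID file
# exists and is writable by the user running the check.
check_pid_dir() {
    pid_file=$1
    pid_dir=$(dirname "$pid_file")
    if [ ! -d "$pid_dir" ]; then
        echo "missing: $pid_dir"
        return 1
    fi
    if [ ! -w "$pid_dir" ]; then
        echo "not writable: $pid_dir"
        return 2
    fi
    echo "ok: $pid_dir"
}
```

Run as the td-agent user, check_pid_dir /var/run/td-agent/td-agent.pid shows which of the two cases applies. If the directory is missing, something like install -d -o td-agent -g td-agent /var/run/td-agent (as root) should recreate it for a manual test.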

repeatedly commented 6 years ago

This does not seem to be a fluentd bug. Closing.

deepaksharma17 commented 5 years ago

I'm facing the same issue. Here are my config and the issue details. Can anyone help?

ulimit -n 65536

cat /etc/td-agent/td-agent.conf

type syslog
tag graylog2
type gelf
host 0.0.0.0
port 12201
flush_interval 5s

cat /etc/rsyslog.conf

*.* @127.0.0.1:5140

/usr/sbin/td-agent-gem list --local

fluent-config-regexp-type (1.0.0)
fluent-logger (0.8.0)
fluent-plugin-elasticsearch (3.5.1)
fluent-plugin-kafka (0.9.4)
fluent-plugin-prometheus (1.4.0)
fluent-plugin-record-modifier (2.0.1)
fluent-plugin-rewrite-tag-filter (2.2.0)
fluent-plugin-s3 (1.1.10)
fluent-plugin-td (1.0.0)
fluent-plugin-td-monitoring (0.2.4)
fluent-plugin-webhdfs (1.2.3)
fluentd (1.5.2, 1.4.2)
gelf (3.1.0)

systemctl status td-agent.service

● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/etc/systemd/system/td-agent.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2019-06-21 11:37:24 GMT; 44min ago
     Docs: https://docs.treasuredata.com/articles/td-agent
  Process: 29745 ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=1/FAILURE)

Jun 21 11:37:23 lit-rhel7-test3 systemd[1]: td-agent.service: control process exited, code=exited status=1
Jun 21 11:37:23 lit-rhel7-test3 systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Jun 21 11:37:23 lit-rhel7-test3 systemd[1]: Unit td-agent.service entered failed state.
Jun 21 11:37:23 lit-rhel7-test3 systemd[1]: td-agent.service failed.
Jun 21 11:37:24 lit-rhel7-test3 systemd[1]: td-agent.service holdoff time over, scheduling restart.
Jun 21 11:37:24 lit-rhel7-test3 systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
Jun 21 11:37:24 lit-rhel7-test3 systemd[1]: start request repeated too quickly for td-agent.service
Jun 21 11:37:24 lit-rhel7-test3 systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Jun 21 11:37:24 lit-rhel7-test3 systemd[1]: Unit td-agent.service entered failed state.
Jun 21 11:37:24 lit-rhel7-test3 systemd[1]: td-agent.service failed.

Reiner030 commented 5 years ago

@repeatedly Same here. I found this issue while searching for the failure message after setting up a new instance with SaltStack. The setup had been unmodified for some months because no server changes were needed (so no modifications on my side, only td-agent updates on "your" side).

This may be "not a fluentd bug", but it is at least a combined td-agent/fluentd packaging or usage bug, in the package offered by the official td-agent repository.

Judging from its release history, the other repository, https://github.com/treasure-data/omnibus-td-agent/releases, doesn't match the offered packages:

# apt-cache policy td-agent
td-agent:
  Installed: 3.4.0-0
  Candidate: 3.4.0-0
  Version table:
 *** 3.4.0-0 900
        900 http://packages.treasuredata.com/3/debian/stretch stretch/contrib amd64 Packages
        100 /var/lib/dpkg/status
     3.3.0-1 900
        900 http://packages.treasuredata.com/3/debian/stretch stretch/contrib amd64 Packages
     3.3.0-0 900
        900 http://packages.treasuredata.com/3/debian/stretch stretch/contrib amd64 Packages
     3.2.1-0 900
        900 http://packages.treasuredata.com/3/debian/stretch stretch/contrib amd64 Packages
     3.2.0-0 900
        900 http://packages.treasuredata.com/3/debian/stretch stretch/contrib amd64 Packages
     3.1.1-0 900
        900 http://packages.treasuredata.com/3/debian/stretch stretch/contrib amd64 Packages
     3.1.0-0 900
        900 http://packages.treasuredata.com/3/debian/stretch stretch/contrib amd64 Packages

But maybe it is nevertheless the right repository for this issue. In that case, please say so rather than just closing the issue here as "not our bug/problem".

In the systemd unit file, the right config path is set via an environment variable but is not used:

# systemctl cat td-agent.service
# /lib/systemd/system/td-agent.service
[Unit]
Description=td-agent: Fluentd based data collector for Treasure Data
Documentation=https://docs.treasuredata.com/articles/td-agent
After=network-online.target
Wants=network-online.target

[Service]
User=td-agent
Group=td-agent
LimitNOFILE=65536
Environment=LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so
Environment=GEM_HOME=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=GEM_PATH=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=FLUENT_CONF=/etc/td-agent/td-agent.conf
Environment=FLUENT_PLUGIN=/etc/td-agent/plugin
Environment=FLUENT_SOCKET=/var/run/td-agent/td-agent.sock
Environment=TD_AGENT_OPTIONS=
EnvironmentFile=-/etc/sysconfig/td-agent
PIDFile=/var/run/td-agent/td-agent.pid
RuntimeDirectory=td-agent
Type=forking
ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
ExecStop=/bin/kill -TERM ${MAINPID}
ExecReload=/bin/kill -HUP ${MAINPID}
Restart=always
TimeoutStopSec=120

[Install]
WantedBy=multi-user.target

where it

Reiner030 commented 5 years ago

By the way, it seems that the "pre-packaged" gems and Ruby installation in the td-agent package are not self-contained enough, so newer system packages can break the service in the way described above.

One workaround I found was to remove the debug agent source that is set up by default:

## live debugging agent
<source>
        @type debug_agent
        bind 127.0.0.1
        port 24230
</source>

But it was not very useful, because it only got rid of the "port already bound" error and left further errors unresolved.

Earlier I had installed the latest rkhunter from Debian testing with a pin priority below 100, so no testing packages should have been installed unless explicitly requested. But somehow all of its dependencies were upgraded to testing packages as well; here are the ones that aren't Perl based:

 libc6
 libc6-dev
 libc-bin
 libc-dev-bin
 libc-l10n
 libgnutls30
 libhogweed4
 libidn2-0
 libnettle6
 libp11-kit0
 libssl1.1
 libssl-dev
 libtasn1-6
 locales
 locales-all
 python3-boto
 python-boto

After downgrading them back to the Debian Stretch versions, td-agent/fluentd works fine again. I hope this hint helps other people fix similar errors.
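To confirm that such a downgrade took effect, the installed and candidate versions can be read from apt-cache policy, in the same format as the td-agent output quoted earlier. This is a small sketch only; policy_versions is a hypothetical helper that parses that text format and nothing more.

```shell
#!/bin/sh
# policy_versions is a hypothetical helper. It reads the output of
# `apt-cache policy <package>` on stdin and prints
# "<installed> <candidate>" on one line.
policy_versions() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END { print installed, candidate }
    '
}
```

For example, apt-cache policy libc6 | policy_versions prints both versions for libc6. If the two differ, the installed package is not at the version apt would currently pick under the configured pins.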

blc16 commented 5 years ago

I am running into the same issue:

/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:769:in `initialize': No such file or directory @ rb_sysopen - /etc/fluent/fluent.conf (Errno::ENOENT)
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:769:in `open'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:769:in `read_config'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:479:in `run_supervisor'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/command/fluentd.rb:314:in `<top (required)>'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/bin/fluentd:8:in `<top (required)>'
    from /opt/td-agent/embedded/bin/fluentd:23:in `load'
    from /opt/td-agent/embedded/bin/fluentd:23:in `<main>'

This happens when I install a prepackaged plugin gem and try to start up the service. Is there any way to fix this, or am I unable to use prepackaged gems? When I edit /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/env.rb to include the correct path to the td-agent config, I run into:

Unexpected error No such file or directory @ rb_sysopen - /var/run/td-agent/td-agent.pid
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.1.1/lib/serverengine/daemon.rb:200:in `initialize'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.1.1/lib/serverengine/daemon.rb:200:in `open'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.1.1/lib/serverengine/daemon.rb:200:in `write_pid_file'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.1.1/lib/serverengine/daemon.rb:193:in `daemonize_with_double_fork'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.1.1/lib/serverengine/daemon.rb:107:in `main'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/serverengine-2.1.1/lib/serverengine/daemon.rb:68:in `run'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:635:in `supervise'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:504:in `run_supervisor'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/command/fluentd.rb:314:in `<top (required)>'
  /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/bin/fluentd:8:in `<top (required)>'
  /opt/td-agent/embedded/bin/fluentd:23:in `load'
  /opt/td-agent/embedded/bin/fluentd:23:in `<main>'

This only happens to me when I have a prepackaged gem installed. It seems there is an issue with prepackaged gems and td-agent. I am using the latest version (3.5.0), which includes the fix to the FLUENT_CONF environment variable, and I am still running into this issue. Does anyone know how to fix this and still use a prepackaged gem, or an alternative to using a prepackaged gem?

jroberts07 commented 4 years ago

I too am facing the exact issue @rpn588 originally posted. I'm unsure whether there have been any fixes or workarounds for this. The Red Hat installation went fine. Then:

[root@REDACTED]# systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/usr/lib/systemd/system/td-agent.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2019-10-16 15:32:22 BST; 13min ago
     Docs: https://docs.treasuredata.com/articles/td-agent
  Process: 27990 ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=1/FAILURE)

Oct 16 15:32:22 REDACTED systemd[1]: td-agent.service: control process exited, code=exited status=1
Oct 16 15:32:22 REDACTED systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Oct 16 15:32:22 REDACTED systemd[1]: Unit td-agent.service entered failed state.
Oct 16 15:32:22 REDACTED systemd[1]: td-agent.service failed.
Oct 16 15:32:22 REDACTED systemd[1]: td-agent.service holdoff time over, scheduling restart.
Oct 16 15:32:22 REDACTED systemd[1]: start request repeated too quickly for td-agent.service
Oct 16 15:32:22 REDACTED systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Oct 16 15:32:22 REDACTED systemd[1]: Unit td-agent.service entered failed state.
Oct 16 15:32:22 REDACTED systemd[1]: td-agent.service failed.

And followed the same next step by running the bad command:

[root@REDACTED]# /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:769:in `initialize': No such file or directory @ rb_sysopen - /etc/fluent/fluent.conf (Errno::ENOENT)
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:769:in `open'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:769:in `read_config'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/supervisor.rb:479:in `run_supervisor'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/lib/fluent/command/fluentd.rb:314:in `<top (required)>'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.0/bin/fluentd:8:in `<top (required)>'
    from /opt/td-agent/embedded/bin/fluentd:23:in `load'
    from /opt/td-agent/embedded/bin/fluentd:23:in `<main>'

Why is it looking for config here /etc/fluent/fluent.conf when the installation guide https://docs.fluentd.org/installation/install-by-rpm says:

Please make sure your configuration file is located at /etc/td-agent/td-agent.conf.

repeatedly commented 4 years ago

Why is it looking for config here /etc/fluent/fluent.conf when the installation guide

Because the systemd unit file overrides this value. See the td-agent systemd unit file: https://github.com/treasure-data/omnibus-td-agent/blob/d2f46d1d1f24f34b92b13fd26c3883b3be63ca94/templates/etc/systemd/td-agent.service.erb#L14
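For context, as far as I know Fluentd takes its default config path from the FLUENT_CONF environment variable and falls back to /etc/fluent/fluent.conf when that variable is unset, unless -c is given on the command line. The sketch below mirrors that lookup in shell so the two invocations can be compared; resolve_fluent_conf is a hypothetical name used for illustration only and is not something fluentd ships.

```shell
#!/bin/sh
# resolve_fluent_conf is a hypothetical illustration of the lookup order:
#   1. an explicit -c argument (passed here as $1)
#   2. the FLUENT_CONF environment variable, if it is set
#   3. the built-in default /etc/fluent/fluent.conf
resolve_fluent_conf() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    else
        # ${VAR-default} only falls back when VAR is unset.
        echo "${FLUENT_CONF-/etc/fluent/fluent.conf}"
    fi
}
```

In an interactive root shell FLUENT_CONF is normally not set, so a manual run lands on /etc/fluent/fluent.conf, while the unit file sets FLUENT_CONF=/etc/td-agent/td-agent.conf for the service. Prefixing the manual command with FLUENT_CONF=/etc/td-agent/td-agent.conf, or passing -c, should make the two behave alike in this respect.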

jroberts07 commented 4 years ago

@repeatedly Okay interesting. My unit file looks like this:

[root@REDACTED]# cat td-agent.service
[Unit]
Description=td-agent: Fluentd based data collector for Treasure Data
Documentation=https://docs.treasuredata.com/articles/td-agent
After=network-online.target
Wants=network-online.target

[Service]
User=td-agent
Group=td-agent
LimitNOFILE=65536
Environment=LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so
Environment=GEM_HOME=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=GEM_PATH=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=FLUENT_CONF=/etc/td-agent/td-agent.conf
Environment=FLUENT_PLUGIN=/etc/td-agent/plugin
Environment=FLUENT_SOCKET=/var/run/td-agent/td-agent.sock
Environment=TD_AGENT_OPTIONS=
EnvironmentFile=-/etc/sysconfig/td-agent
PIDFile=/var/run/td-agent/td-agent.pid
RuntimeDirectory=td-agent
Type=forking
ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
ExecStop=/bin/kill -TERM ${MAINPID}
ExecReload=/bin/kill -HUP ${MAINPID}
Restart=always
TimeoutStopSec=120

[Install]
WantedBy=multi-user.target

repeatedly commented 4 years ago

My comment is about the error from /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS. Your error with systemctl has a different cause.

ganmacs commented 4 years ago

@jroberts07 To begin with, your reproduction step seems wrong. As @repeatedly said, running td-agent via systemd is completely different from running it with your command (/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS). systemd sets environment variables for td-agent. No such file or directory @ rb_sysopen - /etc/fluent/fluent.conf has nothing to do with your systemd error.

Also, please give us other logs, such as journalctl's. The systemctl status output alone does not help, as you know.
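One way to gather the logs requested here is sketched below. It assumes the unit name td-agent.service and the log path from the unit files quoted in this thread; collect_td_agent_logs and run are hypothetical helpers. systemctl reset-failed clears the failed state so the next start attempt is not rejected with "start request repeated too quickly".

```shell
#!/bin/sh
# run executes a command, or only prints it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# collect_td_agent_logs is a hypothetical helper; it needs root for the
# systemctl calls. Both arguments are optional.
collect_td_agent_logs() {
    unit=${1:-td-agent.service}
    log_file=${2:-/var/log/td-agent/td-agent.log}
    # Clear the start-limit state, then try one fresh start.
    run systemctl reset-failed "$unit"
    run systemctl start "$unit" || true
    # Messages for this unit in the journal, including what the failed
    # ExecStart process printed before it exited.
    run journalctl -u "$unit" --no-pager -n 50
    # Fluentd's own log, if it got far enough to write one.
    if [ -r "$log_file" ]; then
        run tail -n 50 "$log_file"
    fi
}
```

Calling DRY_RUN=1 collect_td_agent_logs first prints the commands without running them, which makes it easy to review what would be executed.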

jroberts07 commented 4 years ago

In the end I solved it by creating a symlink /etc/fluent/fluent.conf that points to td-agent.conf, and by changing the permissions of the /var/run folder so the td-agent user can read it.
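The symlink part of this workaround could look roughly like the sketch below. link_default_conf is a hypothetical helper, and the paths in the comments are the ones quoted in this thread. Passing -c /etc/td-agent/td-agent.conf or setting FLUENT_CONF is arguably less invasive, since it leaves /etc/fluent untouched.

```shell
#!/bin/sh
# link_default_conf is a hypothetical helper. It makes the fallback
# config path point at the real td-agent config.
link_default_conf() {
    target=$1   # real config, e.g. /etc/td-agent/td-agent.conf
    link=$2     # fallback path, e.g. /etc/fluent/fluent.conf
    if [ ! -f "$target" ]; then
        echo "no such config: $target" >&2
        return 1
    fi
    # Create the parent directory of the link if it does not exist yet.
    mkdir -p "$(dirname "$link")"
    # -f replaces an existing link, -n avoids following an existing one.
    ln -sfn "$target" "$link"
}
```

As root, link_default_conf /etc/td-agent/td-agent.conf /etc/fluent/fluent.conf would reproduce the symlink described above.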

deepaksharma17 commented 4 years ago

I am still facing the same issue. I'm pasting all my configs below. Please help me with a solid fix.

cat /etc/td-agent/td-agent.conf

# Fluentd GELF output
<source>
  @type syslog
  tag graylog2
</source>

<match graylog2.**>
  @type gelf
  host <graylog-server-fqdn>
  port 12201
  <buffer>
    flush_interval 5s
  </buffer>
</match>

cat /etc/rsyslog.conf

# Fluentd info
*.* @127.0.0.1:5140

/usr/sbin/td-agent-gem list --local

fluent-config-regexp-type (1.0.0)
fluent-logger (0.8.2)
fluent-plugin-elasticsearch (4.0.9)
fluent-plugin-kafka (0.13.0)
fluent-plugin-prometheus (1.8.0)
fluent-plugin-prometheus_pushgateway (0.0.2)
fluent-plugin-record-modifier (2.1.0)
fluent-plugin-rewrite-tag-filter (2.3.0)
fluent-plugin-s3 (1.3.2)
fluent-plugin-systemd (1.0.2)
fluent-plugin-td (1.1.0)
fluent-plugin-td-monitoring (0.2.4)
fluent-plugin-webhdfs (1.2.5)
fluentd (1.11.1)
gelf (3.1.0)

cat /lib/systemd/system/td-agent.service

[Unit]
Description=td-agent: Fluentd based data collector for Treasure Data
Documentation=https://docs.treasuredata.com/articles/td-agent
After=network-online.target
Wants=network-online.target

[Service]
User=td-agent
Group=td-agent
LimitNOFILE=65536
Environment=LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so
Environment=GEM_HOME=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=GEM_PATH=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=FLUENT_CONF=/etc/td-agent/td-agent.conf
Environment=FLUENT_PLUGIN=/etc/td-agent/plugin
Environment=FLUENT_SOCKET=/var/run/td-agent/td-agent.sock
Environment=TD_AGENT_LOG_FILE=/var/log/td-agent/td-agent.log
Environment=TD_AGENT_OPTIONS=
EnvironmentFile=-/etc/sysconfig/td-agent
PIDFile=/var/run/td-agent/td-agent.pid
RuntimeDirectory=td-agent
Type=forking
ExecStart=/opt/td-agent/embedded/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
ExecStop=/bin/kill -TERM ${MAINPID}
ExecReload=/bin/kill -HUP ${MAINPID}
Restart=always
TimeoutStopSec=120

[Install]
WantedBy=multi-user.target

stat /lib/systemd/system/td-agent.service

  File: ‘/lib/systemd/system/td-agent.service’
  Size: 1087            Blocks: 8          IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 17060040    Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (  991/td-agent)   Gid: (  988/td-agent)
Access: 2020-09-08 06:47:36.856881024 +0000
Modify: 2020-09-02 11:18:57.800655295 +0000
Change: 2020-09-08 07:07:44.017793124 +0000
 Birth: -

cat td-agent.log

2020-09-08 03:12:01 +0000 [info]: #0 flushing all buffer forcedly
2020-09-08 06:40:27 +0000 [info]: Received graceful stop
2020-09-08 06:40:27 +0000 [info]: Received graceful stop
2020-09-08 06:40:27 +0000 [info]: #0 fluentd worker is now stopping worker=0
2020-09-08 06:40:27 +0000 [info]: #0 shutting down fluentd worker worker=0
2020-09-08 06:40:27 +0000 [info]: #0 shutting down input plugin type=:forward plugin_id="input_forward"
2020-09-08 06:40:27 +0000 [info]: #0 shutting down input plugin type=:http plugin_id="input_http"
2020-09-08 06:40:27 +0000 [info]: #0 shutting down input plugin type=:debug_agent plugin_id="input_debug_agent"
2020-09-08 06:40:27 +0000 [info]: #0 shutting down output plugin type=:tdlog plugin_id="output_td"
2020-09-08 06:40:27 +0000 [info]: #0 shutting down output plugin type=:stdout plugin_id="output_stdout"
2020-09-08 06:40:27 +0000 [info]: Worker 0 finished with status 0

systemctl restart td-agent

Job for td-agent.service failed because the control process exited with error code. See "systemctl status td-agent.service" and "journalctl -xe" for details.

systemctl status td-agent.service

● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/usr/lib/systemd/system/td-agent.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Tue 2020-09-08 07:15:35 GMT; 4s ago
     Docs: https://docs.treasuredata.com/articles/td-agent
  Process: 38074 ExecStart=/opt/td-agent/embedded/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 3438 (code=exited, status=0/SUCCESS)

Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service: control process exited, code=exited status=1
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Unit td-agent.service entered failed state.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service failed.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service holdoff time over, scheduling restart.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: start request repeated too quickly for td-agent.service
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Unit td-agent.service entered failed state.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service failed.

journalctl -xe

Sep 08 07:15:35 lit-vatam-t009 fluentd[38074]: from /opt/td-agent/embedded/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:5
Sep 08 07:15:35 lit-vatam-t009 fluentd[38074]: from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.11.1/bin/fluentd:
Sep 08 07:15:35 lit-vatam-t009 fluentd[38074]: from /opt/td-agent/embedded/bin/fluentd:23:in `load'
Sep 08 07:15:35 lit-vatam-t009 fluentd[38074]: from /opt/td-agent/embedded/bin/fluentd:23:in `<main>'
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service: control process exited, code=exited status=1
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
-- Subject: Unit td-agent.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit td-agent.service has failed.
--
-- The result is failed.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Unit td-agent.service entered failed state.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service failed.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service holdoff time over, scheduling restart.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
-- Subject: Unit td-agent.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit td-agent.service has finished shutting down.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: start request repeated too quickly for td-agent.service
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
-- Subject: Unit td-agent.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit td-agent.service has failed.
--
-- The result is failed.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: Unit td-agent.service entered failed state.
Sep 08 07:15:35 lit-vatam-t009 systemd[1]: td-agent.service failed.
Sep 08 07:15:40 lit-vatam-t009 systemd[1]: Configuration file /usr/lib/systemd/system/td-agent.service is marked executable. Pl

If any more info is needed please let me know
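One detail in the output above: the journal notes that the unit file is marked executable, and stat shows mode 0755. That is normally only a warning and probably not the cause of the failed start, but it is easy to rule out. Below is a minimal sketch, assuming GNU stat as in the output above; unit_file_mode_ok is a hypothetical helper.

```shell
#!/bin/sh
# unit_file_mode_ok is a hypothetical helper. It returns 0 when the
# file's octal mode contains no odd digit, 1 when it does, and 2 when
# stat fails. An odd digit means an execute (or sticky) bit is set.
unit_file_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 2
    case $mode in
        *[1357]*)
            echo "unexpected mode $mode on $1" >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}
```

If unit_file_mode_ok /lib/systemd/system/td-agent.service fails, chmod 644 on that file followed by systemctl daemon-reload should silence the warning.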