fangli / fluent-plugin-influxdb

A buffered output plugin for fluentd and InfluxDB
MIT License

Errors transporting Collectd metrics into Influxdb via fluentbit #105

Open noahbailey opened 4 years ago

noahbailey commented 4 years ago

When transporting collectd metrics to InfluxDB, fluentd fails to insert the data into the InfluxDB database and logs multiple errors.

My pipeline looks like this:

collectd -> fluentbit ==forward==> fluentd -> influxdb

Error Messages:

  1. An error is reported when fluentd attempts to insert the events into InfluxDB:
2020-09-11 14:45:26 -0400 [warn]: #0 fluent/log.rb:348:warn: Skip record '{"type"=>"queue_length", "type_instance"=>"", "time"=>1599849947.5258746, "interval"=>10.0, "plugin"=>"network", "plugin_instance"=>"", "host"=>"agent", "value"=>0.0}' in 'metrics', because either record has no value or at least a value is 'nil' or empty string inside the record.
  2. Then, shortly after, the stack trace is logged:
2020-09-11 14:45:25 -0400 [debug]: #0 fluent/log.rb:306:debug: taking back chunk for errors. chunk="5af0e12612733dc648753b23e8df102f"
2020-09-11 14:45:25 -0400 [warn]: #0 fluent/log.rb:348:warn: failed to flush the buffer. retry_time=4 next_retry_seconds=2020-09-11 14:45:33 15583002651185026533/274877906944000000000 -0400 chunk="5af0e12612733dc648753b23e8df102f" error_class=InfluxDB::Error error="{\"error\":\"unable to parse 'metrics type=\\\"if_packets\\\",interval=10.0,plugin=\\\"network\\\",host=\\\"agent\\\",rx=0i,tx=31i 1599849877.5258396': bad timestamp\\nunable to parse 'metrics type=\\\"total_values\\\",type_instance=\\\"dispatch-accepted\\\",interval=10.0,plugin=\\\"network\\\",host=\\\"agent\\\",value=0i 1599849877.52585': bad timestamp\\nunable to parse 'metrics type=\\\"total_values\\\",type_instance=\\\"dispatch-rejected\\\",interval=10.0,plugin=\\\"network\\\",host=\\\"agent\\\",value=0i 1599849877.5258567': bad timestamp\\nunable to parse 'metrics type=\\\"total_values\\\",type_instance=\\\"send-accepted\\\",interval=10.0,plugin=\\\"network\\\",host=\\\"agent\\\",value=959i 1599849877.5258632': bad timestamp"}\n"
2020-09-11 14:45:25 -0400 [warn]: #0 plugin/output.rb:1189:rescue in try_flush: suppressed same stacktrace
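For reference, the "bad timestamp" part is reproducible outside the pipeline: InfluxDB's line protocol only accepts integer timestamps, while the rejected points above carry float epoch seconds (e.g. 1599849877.5258396) straight from collectd. A minimal sketch of the mismatch (plain Python, not the plugin's actual code):

```python
# Sketch (not the plugin's code): why InfluxDB rejects these points.
# collectd/fluent-bit event times arrive as float epoch seconds:
event_time = 1599849877.5258396

# InfluxDB line protocol only accepts an integer timestamp, so the float
# above is rejected with "bad timestamp". With the default nanosecond
# precision, the value would need to be scaled and truncated:
ts_ns = int(event_time * 1e9)

# With second precision on the output, whole seconds suffice:
ts_s = int(event_time)

# A point InfluxDB would accept (tag/field names from the log above):
line = f"metrics,host=agent,plugin=network tx=31i {ts_s}"
print(ts_s)   # 1599849877
print(line)   # metrics,host=agent,plugin=network tx=31i 1599849877
```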

Config

Relevant config pieces:

  1. Collectd:
Interval 10.0
LoadPlugin cpu
LoadPlugin load
LoadPlugin network
LoadPlugin memory
<Plugin network>
  Server "127.0.0.1" "25826"
  ReportStats true
</Plugin>
...
  2. Fluent Bit (td-agent-bit):
[INPUT]
    Name        collectd
    Tag         metrics
    Listen      127.0.0.1
    Port        25826
    TypesDB     /usr/share/collectd/types.db
...
[OUTPUT]
    Name        forward
    Match       *
    Host        192.168.xx.yy
    Port        24224
  3. Fluentd server (Docker)

It's a custom container based on Ruby 2.7, and the services are installed in the Dockerfile with:

gem install fluentd -v 1.11.2
gem install fluent-plugin-influxdb -v 2.0.0

Then, the config that's mounted in the container looks like this:

<source>
  @type forward
  bind 0.0.0.0
  port 24224
</source>
...
<match metrics>
  @type influxdb
  host influxdb 
  port 8086
  dbname metrics
</match>
  4. InfluxDB server (docker-compose)

InfluxDB itself is very basic, with no auth or anything.

services:
  influxdb: 
    image: influxdb:1.7
    container_name: influxdb
    environment: 
      - INFLUXDB_DB=metrics
    volumes: 
      - influxdb_data:/var/lib/influxdb
    ports: 
      - 8086:8086
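Not a confirmed fix, but both errors look addressable in the fluentd config: the "Skip record" warning comes from fields like type_instance arriving as empty strings, and the "bad timestamp" from float epoch seconds apparently being written as-is from the record's time field. A sketch of what could be tried, assuming second precision is acceptable (time_precision is a documented option of fluent-plugin-influxdb, and record_transformer is a core fluentd filter):

```
# Hypothetical filter: truncate the float time field to whole seconds and
# drop the two fields that showed up empty in the skipped record. A more
# careful filter would remove them only when they are actually empty.
<filter metrics>
  @type record_transformer
  enable_ruby true
  remove_keys type_instance,plugin_instance
  <record>
    time ${record["time"].to_i}
  </record>
</filter>

<match metrics>
  @type influxdb
  host influxdb
  port 8086
  dbname metrics
  time_precision s  # line protocol then expects integer seconds
</match>
```

I haven't verified this against this exact pipeline, so treat it as a starting point rather than a known fix.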

Testing

To check my config, I switched the output from InfluxDB to Elasticsearch, and it worked as expected. I don't really want to store all my metrics in ES, though; I'd prefer them in InfluxDB.

My config involves many more data types, and the other pipelines are all working great; unfortunately, it's just this one that's causing issues.

Does anybody know if this is a fluentd bug, or a mistake in my setup or methodology?

Thanks!