fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

incoming chunk is broken (on in_forward of fluentd) #415

Closed epcim closed 6 years ago

epcim commented 6 years ago

On Fluent Bit 0.12.4; configuration: td-agent-bit -> td-agent

In the lab environment I run, I see high load with "incoming chunk is broken" warnings on the TD-AGENT (fluentd) backend.

(Can you please describe what the individual msg records are, and whether it is expected to receive such small chunks? Would it be possible to log more clearly where/what is affected?)

Also, what happens to this log record? Is it skipped entirely, or is it partially received and processed?
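To make the question about individual msg records concrete: under the Fluentd forward protocol (v1), each object the receiver decodes is expected to be an array such as [tag, time, record] (Message Mode), [tag, [[time, record], ...]] (Forward Mode) or [tag, packed_entries] (PackedForward Mode). The sketch below is a simplified, hypothetical shape check (the name classify_forward_message is made up for illustration; it is not fluentd's actual in_forward code). It only shows why bare values such as 49 or "pid" cannot be treated as messages. It ignores option maps and the EventTime extension type.

```python
from typing import Any


def classify_forward_message(msg: Any) -> str:
    """Classify a decoded object by forward-protocol mode (hypothetical sketch).

    Checks only the outer shape; option maps, EventTime ext values and the
    contents of packed entries are not validated.
    """
    # Every mode is an array whose first element is a string tag.
    if not isinstance(msg, list) or len(msg) < 2 or not isinstance(msg[0], str):
        return "broken"
    second = msg[1]
    # Message Mode: [tag, time, record(, option)]
    if len(msg) >= 3 and isinstance(second, (int, float)) and isinstance(msg[2], dict):
        return "message"
    # Forward Mode: [tag, [[time, record], ...](, option)]
    if isinstance(second, list) and all(
        isinstance(e, list) and len(e) == 2 and isinstance(e[1], dict) for e in second
    ):
        return "forward"
    # PackedForward Mode: [tag, <msgpack event stream as bin or str>(, option)]
    if isinstance(second, (bytes, str)):
        return "packed_forward"
    return "broken"


# Illustrative inputs shaped like the values in the warnings below.
print(classify_forward_message(49))
print(classify_forward_message({"@timestamp": "2017-11-03T09:41:21.741Z"}))
print(classify_forward_message(["postgresql.statements", 1509702081, {"severity": "INFO"}]))
```

In this sketch the first two calls return "broken" and the third returns "message"; whether fluentd drops or partially keeps a rejected object is the open question above and is not settled by this example.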

Here in particular, the msg content seems to be mixed with the fluentd key attributes:

```
2017-11-03 09:41:26 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg={"@timestamp"=>"2017-11-03T09:41\x86\xAA@times", 116=>97, 109=>112, "2017-11-03T09:41:21.741Z"=>"_tag", "postgresql.statements"=>"path", "/var/lib/pgsql/data/pg_log/postgresql.log-201711030000"=>"error"}
```
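One possible reading of the garbled "@timestamp" value, offered as an assumption rather than a diagnosis, is that the bytes \x86\xAA are MessagePack type bytes. Per the MessagePack specification, 0x80 to 0x8f is a fixmap whose low four bits give the entry count, and 0xa0 to 0xbf is a fixstr whose low five bits give the byte length. The hypothetical helper below decodes a subset of type bytes:

```python
def describe_msgpack_header(byte: int) -> str:
    """Describe a single MessagePack type byte (subset of the spec only)."""
    if byte <= 0x7F:
        return f"positive fixint {byte}"
    if 0x80 <= byte <= 0x8F:
        return f"fixmap with {byte & 0x0F} entries"
    if 0x90 <= byte <= 0x9F:
        return f"fixarray with {byte & 0x0F} elements"
    if 0xA0 <= byte <= 0xBF:
        return f"fixstr of length {byte & 0x1F}"
    if byte >= 0xE0:
        return f"negative fixint {byte - 0x100}"
    return f"other type byte 0x{byte:02X}"


# The two non-ASCII bytes inside the broken "@timestamp" value.
for b in b"\x86\xAA":
    print(describe_msgpack_header(b))
```

This prints a 6-entry fixmap and a 10-byte fixstr. Because "@timestamp" is exactly 10 characters, the value looks like the start of another record spliced into the middle of this record's timestamp string. The thread does not confirm this.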

Full logs:

```
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=49
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=58
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=48
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=54
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=46
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=52
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=49
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg=57
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg="pid"
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg="26246"
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg="severity"
2017-11-03 09:41:16 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg="INFO"
2017-11-03 09:41:26 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg={"@timestamp"=>"2017-11-03T09:41\x86\xAA@times", 116=>97, 109=>112, "2017-11-03T09:41:21.741Z"=>"_tag", "postgresql.statements"=>"path", "/var/lib/pgsql/data/pg_log/postgresql.log-201711030000"=>"error"}
2017-11-03 09:41:26 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg="duplicate key value violates unique constraint \"ml2_vxlan_endpoints_pkey\""
2017-11-03 09:41:26 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg="error"
2017-11-03 09:41:26 +0000 [warn]: #0 [input_forward_24224] incoming chunk is broken: host="192.168.237.64" msg="duplicate key value violates unique constraint \"ml2_vxlan_endpoints_pkey\""
```
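The integer msg values in the first eight warnings all fall in the printable ASCII range, and MessagePack encodes bytes 0x00 to 0x7F as positive fixints. Assuming these integers are really string bytes that were parsed one at a time, a minimal sketch (the helper name ints_to_text is hypothetical) turns them back into characters:

```python
def ints_to_text(values: list[int]) -> str:
    """Interpret each integer as one ASCII byte and join them into a string."""
    return bytes(values).decode("ascii")


# Integer msg values from the first eight warnings at 09:41:16, in log order.
print(ints_to_text([49, 58, 48, 54, 46, 52, 49, 57]))  # 1:06.419

# The 116=>97, 109=>112 pairs from the broken map.
print(ints_to_text([116, 97, 109, 112]))  # tamp
```

The first burst spells "1:06.419", which looks like a fragment of a timestamp. The map pairs spell "tamp", the remainder of the "@times" that was cut short. This is consistent with the receiver losing its position in the byte stream, but it is an interpretation, not a confirmed root cause.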

epcim commented 6 years ago

I see the same with Fluent Bit 0.12.7.

edsiper commented 6 years ago

Let's dig into it. Is there a way to reproduce the problem?

epcim commented 6 years ago

I can provide the log and the configuration (regexp). In the longer term, I may try to extend our fluentbit/fluentd formulas (https://github.com/salt-formulas/salt-formula-fluentbit, https://github.com/salt-formulas/salt-formula-fluentd (second major upgrade coming)) to process some mock files during CI.

Is there a way to record/dump the Fluent Bit output (out_stdout, for example)? That could tell us something.

edsiper commented 6 years ago

@epcim you can record outgoing records with out_file, which defaults to JSON.

epcim commented 6 years ago

OK, I will post it here. Since some plugins I use have new versions, I will first upgrade all components and retest.

epcim commented 6 years ago

@edsiper commented on Nov 6, 2017, 6:49 PM GMT+1:

@epcim you can record outgoing records with out file which defaults to JSON

I recorded a couple of output files during another session, but I don't see anything broken. Do you think this may be a network issue? Or a fluentd issue?

I deployed another environment, and there these warnings make up 90% of the errors in the fluentd log. It seems that many records are affected.

epcim commented 6 years ago

It appeared in the lab environment, but not in production.

Setting the buffer may have helped. As of now I have no verbose/debug output of the problem: I no longer see these issues on the latest versions of fluentbit/fluentd and the plugins.