fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0
12.91k stars 1.34k forks source link

Error event logs should include plugin id #4567

Open DavidKnacker opened 3 months ago

DavidKnacker commented 3 months ago

Describe the bug

The @id plugin parameter is used to give each plugin its unique id. This id should be included in logs generated by said plugin. This seems to work inconsistent between different plugin types. I tested the json filter plugin (for which the id is not shown) and the loki output plugin (which shows it)

working example:

<source>
  @type sample
  @id id_sample_too_long
  sample {"data":"this log is way to long to fit into one chunk"}
  tag sample.long
</source>

<match **>
  @type loki
  @id id_too_long_message
  url ...
  ...
<buffer>
  @type file
  chunk_limit_size 5
  ...
</buffer>
</match>

results in logs:

2024-07-26 13:28:37 +0000 [warn]: #0 [id_too_long_message] chunk bytes limit exceeds for an emitted event stream: 64bytes
2024-07-26 13:28:37.073654307 +0000 fluent.warn: {"message":"[id_too_long_message] chunk bytes limit exceeds for an emitted event stream: 64bytes"}

Both contain the "id_too_long_message" id of the plugin the error occured in.

But for the filter plugin:

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

The error:

2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

does not contain the id "id_not_json" of the filter plugin that caused the error.

To Reproduce

The config:

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

results in:

2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

Expected behavior

The @id fiel, if set, shows up in logs prodcued by the plugin. In this example I would expect:

2024-07-26 13:21:57 +0000 [warn]: #0 [id_not_json] dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"# [id_not_json] <Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

Your Environment

- Fluentd version: 1.17.0
- Package version:
- Operating system: Garden Linux 1443.9
- Kernel version: 6.6.35-cloud-arm64

Your Configuration

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

### Your Error Log

```shell
2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}


### Additional context

_No response_
Athishpranav2003 commented 3 months ago

@DavidKnacker is the id missing even for logs generated by input plugin?

DavidKnacker commented 3 months ago

@Athishpranav2003 I haven't checked it for error logs thrown by an input plugin yet. But am I correct to assume that using @id to track errors should work for all kind of plugins? (or at least the ones that are shipped by default with fluentd?)

Athishpranav2003 commented 3 months ago

Ideally yes but since it's not behaving in that way I was checking what's the difference in underlying code Will check it meanwhile

Just ping me later with the case of input plugin

DavidKnacker commented 3 months ago

👍 will check the behavior for the input plugin tomorrow and report back

kenhys commented 3 months ago

another scope of Idea: track plugin which emit error event in router side, then emit more user friendly error message.

daipom commented 3 months ago

Thanks for your report. This is Fluentd's specification.

This dump an error event log is generated by RootAgent (Fluentd's common logic). The filter_parser plugin just emits the error event to the router. So, this log does not have its id.

https://github.com/fluent/fluentd/blob/e763c0761c44d9734b6aa374371387a2e8406522/lib/fluent/root_agent.rb#L344-L354

Although this is the current specification, it looks like not so good. Maybe it could be improved somehow.

daipom commented 3 months ago

Although this is the current specification, it looks like not so good. Maybe it could be improved somehow.

It would be better to include the id of the plugin that emits the error event.

It may be easy to manually include the id of filter_parser in the error info that it emits.

https://github.com/fluent/fluentd/blob/e763c0761c44d9734b6aa374371387a2e8406522/lib/fluent/plugin/filter_parser.rb#L101-L103

However, it is not so good because every plugin that emits error events needs to consider this issue.

As @kenhys says in https://github.com/fluent/fluentd/issues/4567#issuecomment-2259468182, it would be great if we could create a common mechanism for this.

another scope of Idea: track plugin which emit error event in router side, then emit more user friendly error message.

Athishpranav2003 commented 3 months ago

@daipom so basically we need to avoid change in function signature but capture the caller plugin id?

daipom commented 3 months ago

so basically we need to avoid change in function signature but capture the caller plugin id?

Yes. We can't change the signature of emit_error_event() because it is related to Event Emitter plugin helper API. We need to keep compatibility for the plugins that use this function.

Athishpranav2003 commented 3 months ago

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

DavidKnacker commented 3 months ago

@Athishpranav2003 to come back to your question, I do see the @id field for logs generated by the input plugin.

Athishpranav2003 commented 3 months ago

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

@daipom what about this?

DavidKnacker commented 2 months ago

Are there any updates on this?

daipom commented 2 months ago

Sorry for my late response.

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

Filter plugin should use emit_error_event(). It outputs events as @ERROR label. It is intended to rescue collected data that could not be processed.

On the other hand, the events from Fluentd logger mean logs of Fluentd, not collected data by input plugins. Fluentd logger outputs events as @FLUENT_LOG label.

Using both would be a possible solution. However, it would be better if it could be resolved through a common mechanism.

daipom commented 2 months ago

I made a PR for this issue.

However, I'm concerned about some race conditions of the related existing logic...

DavidKnacker commented 2 months ago

I cannot say anything about potential race conditions, but I just tested the PR in my cluster and it seems to exactly achieve what I hoped for. Thanks @daipom! :) Would be great to have this in.

daipom commented 2 months ago

@DavidKnacker Thanks! I will check the race condition!

DavidKnacker commented 1 week ago

Hey @daipom, are there any updates on the ticket?

daipom commented 4 days ago

Sorry, we need to consider the race condition issue more. I wish we could get this into 1.18.0, but I am not sure...