fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0
12.82k stars 1.34k forks source link

JSON parser plugin can sometimes output records as strings rather than `Hash` #4100

Open jwreford99 opened 1 year ago

jwreford99 commented 1 year ago

Describe the bug

The JSON parser plugin can sometimes emit records as a string rather than as a Hash. Similar to https://github.com/fluent/fluentd/issues/1279, the expectation of plugins further down the line is that the record it receives is always a Hash, but this contract is broken in some circumstances by the JSON plugin leading to some very difficult to diagnose bugs.

This happens because a string is considered valid JSON (SO) and the JSON parser (Oj at least) treats it as such and parses it successfully. But the thing it outputs from it's parsing is not a Hash type, it is a string, and at no point in the JSON parser does it throw if the record it is returning isn't a Hash object.

To Reproduce

If you setup a fluentd configmap which a super simple JSON filter setup:

    <filter gearset.*>
      @type parser
      <parse>
        @type json
        json_parser oj
      </parse>
      replace_invalid_sequence true
      key_name message
      emit_invalid_record_to_error false
    </filter>

When presented with a record which has {"message": "\"HELLO\""} it will parse it so that the resulting record is "HELLO" (i.e. not a Hash type, but just a string. This then causes errors in plugins further down the line (for example in my example this was followed by the Prometheus plugin which threw an error undefined methodmap' for #\nDid you mean? tap` which was rather hard to debug).

Expected behavior

I would of expected the parser plugin to call error out if the thing it was parsing is a string (i.e. the same behaviour you see if the parser failed to parse the JSON). Potentially I would just expect us to check if values is a Hash on https://github.com/fluent/fluentd/blob/d5df992d844af38233c4d3788de714d644ddb6c4/lib/fluent/plugin/filter_parser.rb#L74

But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question what led me to doing an Issue rather than a PR

Your Environment

- Fluentd version: `fluentd 1.14.6`
- Operating system: `PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"`
- Kernel version: `5.10.102.1-microsoft-standard-WSL2`

Your Configuration

<source>
      @type tail
      read_from_head true
      path /var/log/containers/__.log
      pos_file /fluentd/pos/__.log.pos
      tag gearset.x
      <parse>
        @type cri
      </parse>
    </source>

    <filter gearset.*>
      @type parser
      <parse>
        @type json
        json_parser oj
      </parse>
      replace_invalid_sequence true
      key_name message
      emit_invalid_record_to_error false
    </filter>

    <filter gearset.*>
      @type prometheus
      <metric>
        name na
        type counter
        desc NA
        <labels>
          tag ${tag}
        </labels>
      </metric>
    </filter>

Note the plugin which causes the error is the JSON plugin because it is emitting records of string type, but it is the Prometheus plugin which is throwing the error because it is the next thing handling the records (and is not designed to be able to)

Your Error Log

dump an error event: error_class=NoMethodError error="undefined method `map' for #<String:0x00007fdbb6b32308>\nDid you mean?  tap" location="/usr/local/bundle/gems/fluent-plugin-prometheus-2.0.3/lib/fluent/plugin/prometheus.rb:90:in `stringify_keys'" tag="gearset.x" time=2023-03-14 12:48:13.381801632 +0000 record="HELLO"

Additional context

No response

daipom commented 1 year ago

Thanks for reporting! We can reproduce this issue as follows.

<source>
  @type sample
  tag test.hoge
  sample {"message": "{\"k\":\"v\"}"}
</source>

<source>
  @type sample
  tag test.fuga
  sample {"message": "\"HELLO\""}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Output

2023-03-17 11:18:09 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.fuga" time=2023-03-17 11:18:09.063985763 +0900 record="HELLO"
2023-03-17 11:18:09.064692451 +0900 test.hoge: {"k":"v","class":"Hash"}
2023-03-17 11:18:10 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.fuga" time=2023-03-17 11:18:10.066951425 +0900 record="HELLO"
2023-03-17 11:18:10.067684144 +0900 test.hoge: {"k":"v","class":"Hash"}
daipom commented 1 year ago

I think this is probably a bug.

Potentially I would just expect us to check if values is a Hash on

https://github.com/fluent/fluentd/blob/d5df992d844af38233c4d3788de714d644ddb6c4/lib/fluent/plugin/filter_parser.rb#L74

But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question what led me to doing an Issue rather than a PR

Thanks for your consideration on this point! I will figure out how we can fix this.

jwreford99 commented 1 year ago

Thanks for looking at this so quickly! Sounds like you can reproduce it alright, but let me know if there is anything I can do to help 😄 (also sorry, hadn't realised you could use the sample source to be able to output messages like that, would of made the config easier to show as an example!)

daipom commented 1 year ago

Sorry for the delay!

Potentially I would just expect us to check if values is a Hash on https://github.com/fluent/fluentd/blob/d5df992d844af38233c4d3788de714d644ddb6c4/lib/fluent/plugin/filter_parser.rb#L74

But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question what led me to doing an Issue rather than a PR

Thanks for your consideration on this point! I will figure out how we can fix this.

This is not only an issue for filter_parser, but for all plugins that can use parser_json.

So we should fix parser_json.

https://github.com/fluent/fluentd/blob/d5df992d844af38233c4d3788de714d644ddb6c4/lib/fluent/plugin/parser_json.rb#L72-L81

This should return nil when the parsed result is not a Hash.

The root cause of this is that sometimes the Ruby library does not parse a given string as Hash.

require "json"
JSON.load('"hoge"')
=> "hoge"

I don't understand the reason for this Ruby specification. Anyway, we would have expected this to be a JSON::ParserError. The same can be said for oj and yajl.

daipom commented 1 year ago

I have fixed this in

daipom commented 1 year ago

The root cause of this is that sometimes the Ruby library does not parse a given string as Hash.

require "json"
JSON.load('"hoge"')
=> "hoge"

I don't understand the reason for this Ruby specification.

This is a more complicated issue than I initially thought...

It is also necessary to take into account that the value is parsed as an Array and that the JSON specification allows the value to be neither a Hash nor an Array.

The current implementation of JSONParser does not appear to have sufficiently considered this.

daipom commented 1 year ago

We need to consider carefully the specification of the case where raw data can not be parsed as Hash.

I think the currently intended usecases are Hash or Array of Hash.

I think we should consider only these 2 patterns as parsable for now.

daipom commented 1 year ago
<source>
  @type sample
  tag test.hash
  sample {"message": "{\"k\":\"v\"}"}
</source>

<source>
  @type sample
  tag test.string
  sample {"message": "\"HELLO\""}
</source>

<source>
  @type sample
  tag test.array
  sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>
2023-03-21 19:12:34.005516812 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 19:12:35 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.array" time=2023-03-21 19:12:35.006256894 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]
2023-03-21 19:12:35 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.string" time=2023-03-21 19:12:35.007100208 +0900 record="HELLO"
stevehipwell commented 1 year ago

Has there been any progress on this?

daipom commented 1 year ago

This will be fixed in #4106. There are still some points that need to be done in #4106. I will fix it so that it can be included in the next or the next after the next release.

ashie commented 4 months ago

4474 partially fixes this issue but not yet complete for filter_parser.

jwreford99 commented 4 months ago

Thanks folks!

daipom commented 3 weeks ago

The remaining problem is filter_parser can't return multiple parsed results since #4474.

From #4478

The following problem of filter_parser remains in #4474 :

However, we have a remaining problem with filter_parser. filter_parser could return Array based on this wrong behavior. This PR disables it, so it can not return multiple parsed results anymore. Even though it was wrong to begin with, it's possible that this change in specifications would be unacceptable. We need to consider this change carefully.

Athishpranav2003 commented 2 weeks ago

@daipom maybe i can check this out but i am not getting the context of problem fully Correct me if i am wrong. We have a case where when multiple group of elements match the parser we create an array of hash for each match. But in current implementation Since filter emits one event we have only handled the first hash from this array?.

The ask in this problem is we need to facilitate multiple hashes being emitted ?

daipom commented 2 weeks ago

@daipom maybe i can check this out but i am not getting the context of problem fully Correct me if i am wrong. We have a case where when multiple group of elements match the parser we create an array of hash for each match. But in current implementation Since filter emits one event we have only handled the first hash from this array?.

The ask in this problem is we need to facilitate multiple hashes being emitted ?

@Athishpranav2003 Thanks!! Since this is a significant problem with Fluentd, I would be very happy if it could be fixed!

That's right! We can reproduce the problem as the following comment:

https://github.com/fluent/fluentd/pull/4478#issuecomment-2084425518

Currently, Fluentd outputs the warning message. We need to make the behavior such that this warning message is unnecessary. In other words, we need to ensure that both {"k":"v"} and {"k2":"v2"} are emitted.

This will not be just a filter_parser problem. The whole structure of the Filter plugin may need to be reviewed:

https://github.com/fluent/fluentd/blob/4a942710f99f232a385d11cf6721dddb9ac94a32/lib/fluent/plugin/filter.rb#L82-L104

We need to be careful not to affect other filter plugins or break the compatibility.

Athishpranav2003 commented 2 weeks ago

Correct me if I am wrong Currently filter structure is a function with one event input and output. Now we are trying to have one event input to multiple event output?

daipom commented 2 weeks ago

Correct me if I am wrong Currently filter structure is a function with one event input and output. Now we are trying to have one event input to multiple event output?

https://docs.fluentd.org/plugin-development/api-plugin-filter#methods

Filter plugins can have 3 methods:

Currently, A and B should receive and return one record of Hash. (Strictly, B should return a pair of a time and a record.) On the other hand, C can handle multiple records as a EventStream.

Only a few Filter plugins, such as filter_record_transformer, would implement C.

I said the following,

The whole structure of the Filter plugin may need to be reviewed:

but given this point, I am beginning to think that it would be wrong for filter_parser to be using B , and it is just a matter of the design of filter_parser.

filter_parser performs the parsing process internally, so there is a possibility that a single record becomes multiple records. Then, it seems filter_parser should implement C instead of B.

Athishpranav2003 commented 1 week ago

@daipom makes sense. I guess the change is specific to filter_parser and not filter base itself. I will try to work on it in my freetime(so many stale PRs in many repos i have :( )