Open jwreford99 opened 1 year ago
Thanks for reporting! We can reproduce this issue as follows.
<source>
@type sample
tag test.hoge
sample {"message": "{\"k\":\"v\"}"}
</source>
<source>
@type sample
tag test.fuga
sample {"message": "\"HELLO\""}
</source>
<filter test.**>
@type parser
key_name message
<parse>
@type json
</parse>
</filter>
<filter test.**>
@type record_transformer
enable_ruby true
<record>
class ${record.class}
</record>
</filter>
<match test.**>
@type stdout
</match>
Output
2023-03-17 11:18:09 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.fuga" time=2023-03-17 11:18:09.063985763 +0900 record="HELLO"
2023-03-17 11:18:09.064692451 +0900 test.hoge: {"k":"v","class":"Hash"}
2023-03-17 11:18:10 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.fuga" time=2023-03-17 11:18:10.066951425 +0900 record="HELLO"
2023-03-17 11:18:10.067684144 +0900 test.hoge: {"k":"v","class":"Hash"}
I think this is probably a bug.

> Potentially I would just expect us to check if `values` is a `Hash` on https://github.com/fluent/fluentd/blob/d5df992d844af38233c4d3788de714d644ddb6c4/lib/fluent/plugin/filter_parser.rb#L74 But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question that led me to opening an Issue rather than a PR.

Thanks for your consideration on this point! I will figure out how we can fix this.
Thanks for looking at this so quickly! Sounds like you can reproduce it alright, but let me know if there is anything I can do to help 😄 (also sorry, I hadn't realised you could use the `sample` source to output messages like that; it would have made the config easier to show as an example!)
Sorry for the delay!

> Potentially I would just expect us to check if `values` is a `Hash` on https://github.com/fluent/fluentd/blob/d5df992d844af38233c4d3788de714d644ddb6c4/lib/fluent/plugin/filter_parser.rb#L74 But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question that led me to opening an Issue rather than a PR.

Thanks for your consideration on this point! I will figure out how we can fix this.
This is not only an issue for `filter_parser`, but for all plugins that can use `parser_json`. So we should fix `parser_json`: it should return `nil` when the parsed result is not a `Hash`.

The root cause is that the Ruby library sometimes does not parse a given string as a `Hash`:
require "json"
JSON.load('"hoge"')
=> "hoge"
I don't understand the reason for this Ruby specification. Anyway, we would have expected this to raise a `JSON::ParserError`. The same can be said for `oj` and `yajl`.
I have fixed this in
> The root cause of this is that sometimes the Ruby library does not parse a given string as `Hash`.
>
> require "json"
> JSON.load('"hoge"')
> => "hoge"
>
> I don't understand the reason for this Ruby specification.
This is a more complicated issue than I initially thought...

It is also necessary to take into account that the value may be parsed as an `Array`, and that the JSON specification allows the value to be neither a `Hash` nor an `Array`. The current implementation of `JSONParser` does not appear to have sufficiently considered this.

We need to consider carefully the specification for the case where raw data can not be parsed as a `Hash`. I think the currently intended use cases are a `Hash` or an `Array` of `Hash`es, so I think we should consider only these 2 patterns as parsable for now.
<source>
@type sample
tag test.hash
sample {"message": "{\"k\":\"v\"}"}
</source>
<source>
@type sample
tag test.string
sample {"message": "\"HELLO\""}
</source>
<source>
@type sample
tag test.array
sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>
<filter test.**>
@type parser
key_name message
<parse>
@type json
</parse>
</filter>
<filter test.**>
@type record_transformer
enable_ruby true
<record>
class ${record.class}
</record>
</filter>
<match test.**>
@type stdout
</match>
2023-03-21 19:12:34.005516812 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 19:12:35 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.array" time=2023-03-21 19:12:35.006256894 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]
2023-03-21 19:12:35 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.string" time=2023-03-21 19:12:35.007100208 +0900 record="HELLO"
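As a concrete illustration of the "only these 2 patterns" rule, here is a minimal Ruby sketch (a hypothetical helper, not the actual `parser_json` patch): only a `Hash`, or an `Array` whose elements are all `Hash`es, counts as a successful parse, and everything else becomes `nil`, just as a failed parse would.

```ruby
require "json"

# Hypothetical helper sketching the proposed rule for parser_json:
# a Hash, or an Array of Hashes, is a valid parse result; any other
# outcome (String, Integer, mixed Array, invalid JSON) yields nil,
# mimicking a parse failure.
def parse_as_record(text)
  result = JSON.load(text)
  case result
  when Hash
    result
  when Array
    result.all? { |e| e.is_a?(Hash) } ? result : nil
  else
    nil
  end
rescue JSON::ParserError
  nil
end
```

With this rule, the `test.string` sample above would be rejected instead of flowing downstream as a bare `"HELLO"` string.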
Has there been any progress on this?
This will be fixed in #4106. There are still some points that need to be done in #4106 for `filter_parser`. I will fix it so that it can be included in the next or the next after the next release.

Thanks folks!
The remaining problem is that `filter_parser` can't return multiple parsed results since #4474.

From #4478, the following problem of `filter_parser` remains in #4474:

> However, we have a remaining problem with `filter_parser`. `filter_parser` could return an Array based on this wrong behavior. This PR disables it, so it can not return multiple parsed results anymore. Even though it was wrong to begin with, it's possible that this change in specifications would be unacceptable. We need to consider this change carefully.
@daipom maybe I can check this out, but I am not fully getting the context of the problem. Correct me if I am wrong: we have a case where, when multiple groups of elements match the parser, we create an array with a hash for each match. But in the current implementation, since a filter emits one event, we have only handled the first hash from this array?

The ask in this problem is that we need to facilitate multiple hashes being emitted?
@Athishpranav2003 Thanks!! Since this is a significant problem with Fluentd, I would be very happy if it could be fixed!
That's right! We can reproduce the problem as the following comment:
https://github.com/fluent/fluentd/pull/4478#issuecomment-2084425518
Currently, Fluentd outputs the warning message. We need to make the behavior such that this warning message is unnecessary. In other words, we need to ensure that both `{"k":"v"}` and `{"k2":"v2"}` are emitted.

This will not be just a `filter_parser` problem.
The whole structure of the Filter plugin may need to be reviewed:
We need to be careful not to affect other filter plugins or break the compatibility.
Correct me if I am wrong: currently the filter structure is a function with one event input and one event output. Now we are trying to have one event input with multiple event outputs?
https://docs.fluentd.org/plugin-development/api-plugin-filter#methods
Filter plugins can have 3 methods:

- A: `#filter(tag, time, record)`
- B: `#filter_with_time(tag, time, record)`
- C: `#filter_stream(tag, es)`

Currently, A and B should receive and return one record as a `Hash`. (Strictly, B should return a pair of a time and a record.) On the other hand, C can handle multiple records as an `EventStream`. Only a few Filter plugins, such as `filter_record_transformer`, would implement C.
I said the following:

> The whole structure of the Filter plugin may need to be reviewed:

but given this point, I am beginning to think that it would be wrong for `filter_parser` to be using B, and that it is just a matter of the design of `filter_parser`. `filter_parser` performs the parsing process internally, so a single record may become multiple records. It seems, then, that `filter_parser` should implement C instead of B.
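To illustrate the difference between the two shapes, here is a self-contained Ruby sketch (using a stand-in `EventStream` class rather than fluentd's real `Fluent::MultiEventStream`, and hypothetical top-level methods rather than the actual plugin classes): a B-style method is locked into one record in, one record out, while a C-style method can emit one event per parsed `Hash`.

```ruby
require "json"

# Stand-in for fluentd's event stream: an ordered list of [time, record]
# pairs. A real plugin would use Fluent::MultiEventStream.
class EventStream
  include Enumerable

  def initialize
    @events = []
  end

  def add(time, record)
    @events << [time, record]
  end

  def each(&block)
    @events.each(&block)
  end
end

# B-style (#filter_with_time): must return exactly one (time, record)
# pair, so an Array of parsed Hashes cannot be represented; here it is
# simply rejected.
def filter_with_time(time, record)
  parsed = JSON.load(record["message"])
  parsed.is_a?(Hash) ? [time, parsed] : nil
end

# C-style (#filter_stream): receives a stream and returns a stream, so
# one incoming record may become zero, one, or many outgoing events.
def filter_stream(es)
  out = EventStream.new
  es.each do |time, record|
    parsed = JSON.load(record["message"])
    case parsed
    when Hash
      out.add(time, parsed)
    when Array
      parsed.grep(Hash).each { |r| out.add(time, r) }
    end
  end
  out
end
```

With the C-style shape, the `test.array` sample from the reproduction above would come out as two events, `{"k":"v"}` and `{"k2":"v2"}`, instead of one broken record.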
@daipom makes sense. I guess the change is specific to `filter_parser` and not the filter base itself. I will try to work on it in my free time (so many stale PRs in many repos I have :( )
Describe the bug
The JSON parser plugin can sometimes emit records as a string rather than as a `Hash`. Similar to https://github.com/fluent/fluentd/issues/1279, the expectation of plugins further down the line is that the record they receive is always a `Hash`, but this contract is broken in some circumstances by the JSON plugin, leading to some very difficult to diagnose bugs.

This happens because a string is considered valid JSON (SO) and the JSON parser (Oj at least) treats it as such and parses it successfully. But the thing it outputs from its parsing is not a `Hash` type, it is a string, and at no point does the JSON parser throw if the record it is returning isn't a `Hash` object.

To Reproduce
If you set up a fluentd configmap with a super simple JSON filter setup, then when presented with a record of `{"message": "\"HELLO\""}` it will parse it so that the resulting record is `"HELLO"` (i.e. not a `Hash` type, but just a string). This then causes errors in plugins further down the line (for example, in my case this was followed by the Prometheus plugin, which threw the error "undefined method `map' for # ... Did you mean? tap", which was rather hard to debug).

Expected behavior
I would have expected the parser plugin to error out if the thing it was parsing is a string (i.e. the same behaviour you see if the parser failed to parse the JSON). Potentially I would just expect us to check if `values` is a `Hash` on https://github.com/fluent/fluentd/blob/d5df992d844af38233c4d3788de714d644ddb6c4/lib/fluent/plugin/filter_parser.rb#L74 But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question that led me to opening an Issue rather than a PR.
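The contract break is easy to see outside fluentd. This small reproduction (plain Ruby, not fluentd code) shows the parse succeeding and a downstream `Hash`-style call failing with the same kind of `NoMethodError` seen in the logs:

```ruby
require "json"

record = JSON.load('"HELLO"')  # a valid JSON document...
record.class                   # ...but a String, not a Hash

# Downstream plugins assume a Hash and call Hash methods on the record,
# which is what produces NoMethodError failures like the ones logged above.
begin
  record.merge!("class" => record.class.to_s)
rescue NoMethodError => e
  e  # undefined method `merge!' for a String
end
```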
Your Environment
Your Configuration
Note: the plugin which causes the error is the JSON plugin, because it is emitting records of string type, but it is the Prometheus plugin which throws the error, because it is the next thing handling the records (and is not designed to handle strings).
Your Error Log
Additional context
No response