fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Max nesting level for json parser #3311

Open bazzilio opened 3 years ago

bazzilio commented 3 years ago

Is your feature request related to a problem? Please describe. I want an option in the json parser plugin to limit the nesting level during parsing. My developers send huge metadata JSON, and after parsing it "eats" Elasticsearch fields.

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep metadata | awk -F. '{print NF}' | sort -n | wc -l 
733

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep context | awk -F. '{print NF}' | sort -n | wc -l 
218

But if I could limit the nesting level for parsing, it would dramatically decrease the field count:

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep metadata | awk -F. 'NF>5 {print NF}' | sort -n | wc -l 
101

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep context | awk -F. 'NF>5 {print NF}' | sort -n | wc -l 
25
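For readers who find the pipelines above hard to follow: each one lists the field names, keeps those containing a given substring, and counts them, and the `NF>5` variant counts only names with more than five dot-separated segments. A minimal Ruby sketch of the same counting follows. The field names and the `count_matching` helper are made up for illustration and are not taken from the index above.

```ruby
# Hypothetical field names standing in for the keys of `.fields`
# in a _field_caps response.
field_names = [
  "metadata.user.id",
  "metadata.request.headers.accept.encoding.value",
  "metadata.a.b.c.d.e",
  "context.trace.id",
]

# Number of dot-separated segments, like awk's NF with -F.
def segments(name)
  name.split(".").size
end

# Count names containing `substring` (like grep) that have more than
# `min_segments` segments (like awk 'NF>5' when min_segments is 5).
def count_matching(names, substring, min_segments: 0)
  names.count { |n| n.include?(substring) && segments(n) > min_segments }
end

total = count_matching(field_names, "metadata")
deep  = count_matching(field_names, "metadata", min_segments: 5)
puts "metadata fields: #{total}, deeper than 5 segments: #{deep}"
```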

Describe the solution you'd like Add a parameter to the json parser section: max_nesting (int). The parser would then leave the JSON unparsed once that nesting level is reached.
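To make the requested behavior concrete, here is a rough sketch in plain Ruby. It is not Fluentd code: `limit_nesting` is a hypothetical helper, and it parses the whole document first and then re-serializes anything deeper than the limit as a JSON string, which is only an approximation of a parser-level option.

```ruby
require "json"

# Hypothetical helper, not part of Fluentd: keep `max_nesting` levels of
# structure and turn anything deeper back into a JSON string.
def limit_nesting(value, max_nesting, depth = 1)
  case value
  when Hash
    return JSON.generate(value) if depth > max_nesting
    value.transform_values { |v| limit_nesting(v, max_nesting, depth + 1) }
  when Array
    return JSON.generate(value) if depth > max_nesting
    value.map { |v| limit_nesting(v, max_nesting, depth + 1) }
  else
    value # scalars are kept as they are
  end
end

record = JSON.parse('{"metadata":{"user":{"id":1,"tags":["a","b"]}},"msg":"hi"}')
flat = limit_nesting(record, 2)
puts flat.inspect
```

With a limit of 2, `metadata.user` stays a single string field instead of expanding into `metadata.user.id` and `metadata.user.tags`.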

Describe alternatives you've considered

Additional context As far as I can see, this parameter is supported by the main Ruby JSON libraries:
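As one example, Ruby's standard json library accepts a `max_nesting` option on `JSON.parse`. Note that it raises `JSON::NestingError` when the limit is exceeded rather than leaving the deeper part unparsed, so it is not quite the behavior requested above. A small sketch with made-up input:

```ruby
require "json"

shallow = '{"a":{"b":1}}'        # two levels of nesting
deep    = '{"a":{"b":{"c":1}}}'  # three levels of nesting

# Within the limit: parses normally.
parsed = JSON.parse(shallow, max_nesting: 2)

# Over the limit: the standard library raises instead of truncating.
error = begin
  JSON.parse(deep, max_nesting: 2)
  nil
rescue JSON::NestingError => e
  e
end

puts parsed.inspect
puts error.class
```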

bazzilio commented 3 years ago

One more question: is there a way to change the DEFAULT_OJ_OPTIONS variable? If I understand the logic in the sources correctly, oj looks like the default parser. But as far as I can see, Fluentd uses yajl for the parse_io method, so I am confused about which parser is used by default.

ashie commented 3 years ago

Is there a way to change the DEFAULT_OJ_OPTIONS variable?

It seems there is no way to do it (a pull request is welcome :smile:)

If I understand the logic in the sources correctly, oj looks like the default parser. But as far as I can see, Fluentd uses yajl for the parse_io method, so I am confused about which parser is used by default.

It seems that oj is optional: Fluentd uses oj if it's available, but oj is not a required dependency. yajl, on the other hand, is a required dependency. If oj isn't installed, Fluentd falls back to yajl.

https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/fluentd.gemspec#L23
https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/fluentd.gemspec#L52
https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/lib/fluent/plugin/parser_json.rb#L61-L71
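The optional-dependency pattern described here can be sketched roughly as follows. This is not the code from parser_json.rb: `pick_json_parser` is a hypothetical helper, and the final fallback to Ruby's standard json library is an addition so the sketch runs even where neither gem is installed.

```ruby
require "json"

# Hypothetical helper: return the name of the first JSON library that
# loads, together with a lambda that parses a string with it.
def pick_json_parser
  begin
    require "oj" # optional gem
    return [:oj, ->(text) { Oj.load(text, mode: :strict) }]
  rescue LoadError
    # oj is not installed; try yajl next
  end
  begin
    require "yajl" # provided by the yajl-ruby gem
    return [:yajl, ->(text) { Yajl.load(text) }]
  rescue LoadError
    # yajl is not installed either
  end
  # Assumption for this sketch only: fall back to the standard library.
  [:json, ->(text) { JSON.parse(text) }]
end

name, parse = pick_json_parser
record = parse.call('{"level":"info","n":1}')
puts "#{name}: #{record.inspect}"
```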

In addition, there is the following description about yajl in the document of this plugin:

yajl: Mainly for stream parsing

ashie commented 3 years ago

It seems that oj is optional: Fluentd uses oj if it's installed, but oj is not a required dependency. yajl, on the other hand, is a required dependency. If oj isn't installed, Fluentd falls back to yajl.

However, it is surely confusing. Because it's not documented, users can't understand this behavior. We should update the document: https://github.com/fluent/fluentd-docs-gitbook/blob/1.0/parser/json.md

ashie commented 3 years ago

We should update the document: https://github.com/fluent/fluentd-docs-gitbook/blob/1.0/parser/json.md

https://github.com/fluent/fluentd-docs-gitbook/pull/298

ashie commented 3 years ago

Fixed by #3315. You can use FLUENT_OJ_OPTION_MAX_NESTING for it.

ashie commented 2 years ago

Now I've noticed that Oj.default_options doesn't accept :max_nesting: https://www.rubydoc.info/github/ohler55/oj/Oj.default_options

It's reported at https://app.slack.com/client/T0CSKNZLK/C0CTT63EE/thread/C0CTT63EE-1631532462.067500

We should consider another way to apply it.

vishalmamidi1 commented 2 years ago

Does FLUENT_OJ_OPTION_MAX_NESTING still not work?

ashie commented 2 years ago

Does FLUENT_OJ_OPTION_MAX_NESTING still not work?

Yes, it still doesn't work. Since I've now noticed that Oj.default_options doesn't support it, I'll remove it. Instead, I'm considering adding a max_nesting parameter to parser_json.

ashie commented 1 year ago

The implementation of Oj:

max_nesting isn't supported by Oj.default_options.