Closed chikinchoi closed 4 years ago
This parameter should be added to the filter parser plugin configuration, not to the filter concat plugin.
https://docs.fluentd.org/filter/parser#replace_invalid_sequence
Setting replace_invalid_sequence to true should handle invalid byte sequences in UTF-8 or other encodings.
Hi @cosmo0920 ,
I understand that replace_invalid_sequence should be added to the filter parser plugin. I saw that there are several parser plugins, e.g. "json", "csv", "multiline". However, I don't need to parse the data into another format in the concat filter. May I know how to use replace_invalid_sequence with the concat filter?
Thank you.
<filter **firelens**>
@type concat
key log
multiline_start_regexp '^\{\\"@timestamp'
multiline_end_regexp '/\}/'
separator ""
flush_interval 1
timeout_label @NORMAL
</filter>
Hi @cosmo0920 ,
I think there is a mutual exclusion in this case. I have considered the solution below to fix both the "docker has split over multiple lines due to its 16KB line limit" issue and the "invalid byte sequence in UTF-8" issue.
According to [1], an event passes through the filters in the order they appear in the configuration, from top to bottom. Therefore, if I place the concat filter first, it triggers the "invalid byte sequence in UTF-8" issue, because "replace_invalid_sequence" is only applied later, in the parser filter. If I place the parser filter first, it triggers the "docker has split over multiple lines due to its 16KB line limit" issue, because the "key" field in some logs is not a complete log when it has been split across multiple lines.
Could you please add a new feature, a replace_invalid_sequence parameter for the concat plugin, or suggest another solution to fix this mutual exclusion? Thank you very much!
<filter **firelens**>
@type concat
key log
multiline_start_regexp '^\{\\"@timestamp'
multiline_end_regexp '/\}/'
separator ""
flush_interval 1
timeout_label @NORMAL
</filter>
<filter **firelens**>
@type parser
key_name log
reserve_data true
replace_invalid_sequence true
emit_invalid_record_to_error false
<parse>
@type json
</parse>
</filter>
Could you please add a new feature, a replace_invalid_sequence parameter for the concat plugin, or suggest another solution to fix this mutual exclusion? Thank you very much!
We won't add replace_invalid_sequence to the filter concat plugin.
In the Fluentd world, one plugin should have one functionality.
A monolithic plugin does not follow the Fluentd design concept.
Instead, how about using fluent-plugin-string-scrub to scrub invalid byte sequences?
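The ordering implied by this suggestion (scrub first, then concatenate, then parse) can be illustrated in plain Ruby. This is only a sketch under stated assumptions, not the plugins' actual code: `scrub_concat_parse` is a hypothetical helper, the two chunks are made-up sample data, and `String#scrub` stands in for the scrubbing step.

```ruby
require "json"

# Hypothetical helper: mimics the order scrub -> concat -> parse
# on an array of partial log chunks. It does not call any Fluentd API.
def scrub_concat_parse(chunks, replace_char = "?")
  # 1. Scrub every chunk so later string operations never see broken bytes.
  cleaned = chunks.map { |chunk| chunk.scrub(replace_char) }
  # 2. Join the chunks with an empty separator, like `separator ""`.
  joined = cleaned.join("")
  # 3. Parse the reassembled line as JSON.
  JSON.parse(joined)
end

# Made-up sample: one JSON log line split in two, with a stray \xFF byte.
chunks = [
  "{\"@timestamp\":\"2020-07-01T00:00:00Z\",\"log\":\"first half ",
  "second \xFF half\"}"
]

record = scrub_concat_parse(chunks)
puts record["log"]   # first half second ? half
```

Because the scrub step runs before anything tries to match or parse the text, neither of the later steps has to know about encoding problems.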
Hi @cosmo0920 ,
Thank you for your suggestion. I added the string_scrub filter with the config below, and the invalid byte sequence issue is gone.
<filter **>
@type string_scrub
replace_char ?
</filter>
However, I don't really understand the string_scrub plugin. May I know what replace_char is used for?
Could I have some example input and the output after the filter is applied? Thank you very much!!
replace_char is used in String#scrub!: https://ruby-doc.org/core-2.4.0/String.html#method-i-scrub-21 . The invalid byte sequence issue is solved, so I'm closing this.
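As a rough example of the input and output asked about above, here is what Ruby's `String#scrub` does with a replacement character. This is a minimal sketch: `scrub_field` is a hypothetical helper and the sample string is invented. It shows only the Ruby method the link points to, not the plugin's internals.

```ruby
# Hypothetical helper: scrub one field value the way a replacement
# character such as "?" would be applied by String#scrub.
def scrub_field(value, replace_char = nil)
  # With no argument, String#scrub substitutes U+FFFD for UTF-8 strings.
  replace_char ? value.scrub(replace_char) : value.scrub
end

raw = "user=\xFF id=1"                  # \xFF never occurs in valid UTF-8

puts raw.valid_encoding?                # false
puts scrub_field(raw, "?")              # user=? id=1
puts scrub_field(raw).valid_encoding?   # true
```

Strings that are already valid come back unchanged, so applying the scrub to every record should only alter the broken ones.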
Problem
Hi Team,
I have applied the fluent-plugin-concat in order to join logs that docker has split over multiple lines due to its 16KB line limit. However, I found an error "dump an error event: error_class=ArgumentError error="invalid byte sequence in UTF-8" location="/usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-concat-2.4.0/lib/fluent/plugin/filter_concat.rb:291:in `match'" recently. I have added the replace_invalid_sequence but no luck. Please advise. Thank you!!
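For context, this kind of error can be reproduced in plain Ruby, outside Fluentd, by matching a regular expression against a string that contains an invalid byte. The sketch below is an illustration only: `match_error` is a hypothetical helper, the regexp is a simplified stand-in for `multiline_start_regexp`, and the sample line is invented.

```ruby
# Hypothetical helper: returns the exception raised while matching,
# or nil when the match completes without raising.
def match_error(line, regexp)
  line.match(regexp)
  nil
rescue ArgumentError => e
  e
end

# Simplified stand-in for the start pattern (assumption, not the real config value).
start_regexp = /^\{"@timestamp/

broken = "payload \xFF tail"            # contains a byte that is invalid in UTF-8

puts broken.valid_encoding?             # false
puts match_error(broken, start_regexp).class
puts match_error(broken.scrub("?"), start_regexp).inspect   # nil
```

The first match raises because Ruby refuses to run a regexp over a string with a broken encoding, while the scrubbed copy simply fails to match and returns nil.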
Steps to replicate
I cannot reproduce the error because so many logs are sent to this fluentd. Below is my filter config in fluentd:
Your environment
fluentd version 1.11.1
fluent-plugin-concat version 2.4.0