Open opencmit2 opened 1 month ago
Only the byte slice [208 161 195 247] can be successfully transcoded from GBK to UTF-8.
First of all, GBK is compatible with UTF-8 only in the ASCII range, just like cp932. GBK-encoded logs are therefore not compatible with the Wasm mechanism, which assumes UTF-8.
To create the msgpack payload, we process the record as-is and only add the additional metadata. To create the JSON payload, we escape the record so that it is valid JSON.
I also confirmed that msgpack handling is not affected by the encoding. Currently, we don't support non-UTF-8 encodings. In the meantime, could you use the msgpack format for processing your non-UTF-8 payloads, if possible?
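As a rough illustration of the compatibility point, the following Python sketch checks that GBK and UTF-8 agree on ASCII text but not on the sample bytes from the report. The names `ASCII_SAMPLE`, `GBK_SAMPLE`, and `is_valid_utf8` are made up for this example and are not part of Fluent Bit.

```python
# Hypothetical names, for illustration only; nothing here is Fluent Bit code.
ASCII_SAMPLE = "level=info"
GBK_SAMPLE = bytes([208, 161, 195, 247])  # byte slice quoted in the report


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII text encodes to the same bytes in GBK and in UTF-8.
print(ASCII_SAMPLE.encode("gbk") == ASCII_SAMPLE.encode("utf-8"))

# The sample decodes cleanly as GBK ...
print(GBK_SAMPLE.decode("gbk"))

# ... but is not well-formed UTF-8, so a UTF-8-only code path cannot pass it through unchanged.
print(is_valid_utf8(GBK_SAMPLE))
```

The first two bytes (208 161) happen to be a valid UTF-8 sequence on their own, while the last two (195 247) are not, which is why only part of the payload is affected.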
Hi @cosmo0920, is there any plan to add an encoding/decoding function to the INPUT plugin so that non-UTF-8 logs can be converted beforehand? Many of our applications support only GB-2312, and their log files have to be converted to UTF-8 before Fluent Bit can process them.
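For reference, the kind of pre-conversion step described here can be sketched in Python as follows. This is a workaround that runs outside Fluent Bit, not a Fluent Bit feature; the function name `convert_to_utf8` and the default `gb2312` source encoding are assumptions made for the example.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "gb2312") -> None:
    """Re-encode a log file line by line from source_encoding to UTF-8.

    Hypothetical helper for illustration. Bytes that cannot be decoded are
    replaced with U+FFFD instead of aborting the whole conversion.
    """
    with src.open("r", encoding=source_encoding, errors="replace", newline="") as fin, \
         dst.open("w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
```

Calling `convert_to_utf8(Path("app.log"), Path("app.utf8.log"))` would write a UTF-8 copy that a UTF-8-only pipeline can read; both file names are placeholders.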
I'm still considering this type of encoding conversion. My Windows environment mostly uses Shift-JIS (cp932), so I'm hitting this issue as well, and it is one of the reasons the replacement of Fluentd with Fluent Bit has not progressed much here. Fluentd provides a convenient way to convert from non-ASCII encodings to UTF-8. It has now become clear that this issue is considerably larger than we expected.
Thanks @cosmo0920 for the reply.
Yes, I found this function is supported in Fluentd as well and that's the reason why I asked if it is possible to migrate it here. Good to know that it is not "abandoned" yet :D
Bug Report
Describe the bug
Issue Content:
GBK Test Data Source
Fluent Bit Configuration File
Event_Format Set to JSON Handling Code and Corresponding Output
Fluent Bit Configuration File
Event_Format Set to msgpack Handling Code and Corresponding Output
When Event_Format is set to JSON, the byte slice is [208 161 238 131 131].
When Event_Format is set to MessagePack, the byte slice is [208 161 195 247].
Only the byte slice [208 161 195 247] can be successfully transcoded from GBK to UTF-8. I suspect that Fluent Bit might be performing additional processing when Event_Format is set to JSON.
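The difference between the two byte slices can be reproduced outside Fluent Bit with a short Python sketch. The helper `try_decode` and the constant names are made up for this example; the byte values are the ones reported above.

```python
# Byte slices as reported above; the names are hypothetical.
JSON_BYTES = bytes([208, 161, 238, 131, 131])
MSGPACK_BYTES = bytes([208, 161, 195, 247])


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for label, data in (("json", JSON_BYTES), ("msgpack", MSGPACK_BYTES)):
    print(
        label,
        data.hex(" "),
        "gbk:", repr(try_decode(data, "gbk")),
        "utf-8:", repr(try_decode(data, "utf-8")),
    )
```

The msgpack bytes decode as GBK but not as UTF-8. The JSON bytes are the opposite: they are well-formed UTF-8 in which the first two bytes are unchanged and the last two have been replaced by a three-byte sequence for U+E0C3, a private-use code point. That is consistent with the suspicion that the JSON path rewrites bytes that are not valid UTF-8, although it does not show where in Fluent Bit this happens.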
Expected behavior
Screenshots
Event_Format set to JSON
Event_Format set to MessagePack
Your Environment
Additional context