arunz87 opened this issue 2 years ago
So if I understand correctly, your input logs are lines that have a series of space delimited fields:
field1 field2 field3 field4 field5 field6 field7
Right now, you are parsing this log line, which means you get a JSON log like:
{
  "field1": "val",
  "field2": "val",
  ...
}
However, then you remove some of the keys. (Why? Why not just take all the data to CW?)
And you want the output in CW not to be JSON, but just the selected original fields in a line again?
field1 field2 field3 field4
This is what you want, right? And so you came to the log_key option, since it takes a JSON log and sends just the string value of a single key.
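For reference, a minimal cloudwatch_logs output using log_key might look like this (group and stream names here are placeholders); it selects a single key from the record and forwards only that key's string value:

[OUTPUT]
    Name cloudwatch_logs
    Match *
    region us-east-1
    log_group_name my_log_group
    log_stream_name my_log_stream
    log_key log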
So in this case... we don't support what you want in the CW plugin... I also kind of feel like maybe this is a more generic FB use case. You want to 'un-parse' your logs and take them from JSON back to just a string, which FB doesn't support.
Oh wait... I just realized, the Kinesis and Firehose Go plugins do have this feature; they call it data_keys: https://github.com/aws/amazon-kinesis-streams-for-fluent-bit
Under the hood it is a small pass over each record that keeps only the listed keys.
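Roughly, the behavior is the following (a simplified sketch, not the plugin's actual code; the function name is illustrative):

package main

import (
	"fmt"
	"strings"
)

// dataKeysFilter keeps only the keys named in the comma-separated
// data_keys option value, dropping everything else from the record.
func dataKeysFilter(record map[string]interface{}, dataKeys string) map[string]interface{} {
	filtered := make(map[string]interface{})
	for _, key := range strings.Split(dataKeys, ",") {
		key = strings.TrimSpace(key)
		if val, ok := record[key]; ok {
			filtered[key] = val
		}
	}
	return filtered
}

func main() {
	record := map[string]interface{}{
		"field1": "a", "field2": "b", "field3": "c",
	}
	fmt.Println(dataKeysFilter(record, "field1,field2")) // map[field1:a field2:b]
}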
So the easiest solution here would be to add the data_keys feature to the CloudWatch Go plugin: https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit
This is kind of a niche request... given the other things I have on my plate, I can't prioritize it right now. However, it would be very easy for you to build this feature yourself in the Go plugin and send us a PR. I recommend doing that.
https://github.com/aws/aws-for-fluent-bit#developing-features-in-the-aws-plugins
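If someone did add it, the configuration might look something like this (purely hypothetical at the time of this thread; the Go plugin's output name is cloudwatch, and it has no data_keys option yet):

[OUTPUT]
    Name cloudwatch
    Match foo
    region us-east-1
    log_group_name my_log_group
    log_stream_name my_log_stream
    data_keys field1,field2,field3,field4,field6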
Thanks @PettitWesley for your prompt response.
data_keys is close, but not quite what I was looking for. I don't want the keys and values to be sent to Kinesis (or CloudWatch in my use case), rather just the values, so that I get space-delimited fields as output.
The reason why I cannot send the whole record to CW is the sheer size of the log line, which exceeds the allowable limit (256 KB per event) in CW. So the line gets truncated by CW, and I need to trim it before analyzing it further.
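In other words, the desired transformation is roughly the following (a sketch only; the key list and ordering are illustrative, and the order must be supplied explicitly since Go maps are unordered):

package main

import (
	"fmt"
	"strings"
)

// joinValues emits only the values of the selected keys, space delimited,
// in the given order, turning the parsed JSON record back into a plain line.
func joinValues(record map[string]interface{}, keys []string) string {
	values := make([]string, 0, len(keys))
	for _, key := range keys {
		if val, ok := record[key]; ok {
			values = append(values, fmt.Sprintf("%v", val))
		}
	}
	return strings.Join(values, " ")
}

func main() {
	record := map[string]interface{}{
		"field1": "a", "field2": "b", "field3": "c", "field4": "d", "field6": "e",
	}
	fmt.Println(joinValues(record, []string{"field1", "field2", "field3", "field4", "field6"}))
	// prints: a b c d e
}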
@arunz87 I see, that makes sense.
So we have two options here:
Describe the question/issue
@PettitWesley - I have a use case wherein my Python code expects a custom log format. The code worked fine until the record size in CloudWatch started hitting the hard limit of 256 KB per event record. This led me on a quest to reduce the record size by trimming those fields which are unnecessarily large and not required for analysis. After some reading, I have this neat little configuration that does it, except that the cloudwatch_logs plugin doesn't seem to offer a way to send the fields in the expected format. Earlier, log_key was taking the default 'log' value, which worked well. Now that I have trimmed the fields using a parser, I am unable to regenerate the log format in CloudWatch. Is there a way to work around this?
Configuration

#########parser.conf############
[PARSER]
    Name myparser
    Format regex
    Regex ^(?<field1>[^ ]*) (?<field2>[^ ]*) (?<field3>[^ ]*) (?<field4>[^ ]*) (?<field5>(")) (?<field6>[^"]*) (?<field7>("))
PS: field5 and field7 correspond to quotes which are later removed in the filter
#########fluentbit.conf##########
[SERVICE]
    Flush 5
    Daemon off
    Parsers_file parser.conf
    Log_Level debug

[INPUT]
    Name tail
    Tag foo
    Path /var/tmp/foo.log
    Path_Key filename
    Skip_Long_Lines off

[FILTER]
    Name parser
    Match foo
    Key_Name log
    Parser myparser

[FILTER]
    Name modify
    Match foo
    Remove field5
    Remove field7
    Set field6 ""

[OUTPUT]
    Name cloudwatch_logs
    Match foo
    region us-east-1
    log_group_name my_log_group
    log_stream_name my_log_stream
    log_key field1 field2 field3 field4 field6
    log_format json/emf
Fluent Bit Log Output
The error observed is:

[2022/02/14 05:30:41] [error] [output:cloudwatch_logs:cloudwatch_logs.1] Could not find log_key 'field1 field2 field3 field4 field5' in record
Fluent Bit Version Info
Which AWS for Fluent Bit Versions have you tried? 2.10.1