Closed davidmccormick closed 5 years ago
Hmm, looking at the code and adding some more logging, it looks like it is the logging function that is truncating the line and not the actual regex pattern itself. So the issue looks to be that rubular.com is happy with the regex but fluent-bit is not.
More testing has shown it is the dashes in some of the capture group names which fluent-bit does not like!
Is there any fix for this bug? I found that I cannot collect nginx logs with the pattern below:
^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
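One way to narrow this down is to check the pattern outside fluent-bit first. The sketch below does that with Python's `re` module, which is a different regex engine from the one fluent-bit uses, so a match here only rules out a malformed pattern; it does not prove that fluent-bit will accept it. The helper names and the sample log line are made up for illustration.

```python
import re

# The nginx pattern from the comment above, unchanged.
NGINX_PATTERN = r'^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$'


def to_python_syntax(pattern):
    """Convert (?<name>...) groups to Python's (?P<name>...) spelling.

    Lookbehinds such as (?<=...) are untouched because the name must
    start with a letter or underscore.
    """
    return re.sub(r"\(\?<([A-Za-z_][A-Za-z0-9_]*)>", r"(?P<\1>", pattern)


def parse_line(pattern, line):
    """Return the named fields for a line, or None when the pattern does not match."""
    match = re.match(to_python_syntax(pattern), line)
    return match.groupdict() if match else None


# Hypothetical sample line, invented for this example (not from the thread).
SAMPLE = '192.168.0.1 - alice [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.68.0"'

fields = parse_line(NGINX_PATTERN, SAMPLE)
print(fields)
```

If this prints a dictionary of fields rather than `None`, the pattern itself is probably fine, and the parser config or the capture group names are the next things to check.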
@davidmccormick Just wow! Removing dashes ( - ) from the field names in my regex solved my problems.
For example, change
^(?<remote>[^ ]*) (?<host-name>[^ ]*)
to
^(?<remote>[^ ]*) (?<hostName>[^ ]*)
Just remove the dashes ( - ). That's it. People like @davidmccormick make this community awesome. Thanks a lot!
I found that _ and @ also cause the same issue.
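For anyone with a lot of parsers to check, here is a minimal sketch of a lint step that flags capture group names containing anything other than ASCII letters and digits, and rewrites them to camelCase. It is a hypothetical helper written in Python, not part of fluent-bit, and it assumes the reports in this thread (dashes, _ and @ in names) are the characters to avoid.

```python
import re

# Matches the name inside a (?<name>...) capture group. Lookbehind
# assertions such as (?<=...) and (?<!...) are skipped because '=' and '!'
# are excluded as the first character of the name.
GROUP_NAME = re.compile(r"\(\?<([^=!>][^>]*)>")


def risky_group_names(pattern):
    """Return group names containing anything but ASCII letters and digits."""
    return [name for name in GROUP_NAME.findall(pattern)
            if not re.fullmatch(r"[A-Za-z0-9]+", name)]


def to_camel_case(name):
    """Drop separator characters and capitalise the letter after each one."""
    parts = [part for part in re.split(r"[^A-Za-z0-9]+", name) if part]
    if not parts:
        # Nothing usable is left (e.g. the name was only "@"); leave it alone.
        return name
    return parts[0] + "".join(part[0].upper() + part[1:] for part in parts[1:])


def sanitize_group_names(pattern):
    """Rewrite every capture group name in the pattern to camelCase."""
    return GROUP_NAME.sub(
        lambda m: "(?<" + to_camel_case(m.group(1)) + ">", pattern)


pattern = r"^(?<remote>[^ ]*) (?<host-name>[^ ]*)"
print(risky_group_names(pattern))     # ['host-name']
print(sanitize_group_names(pattern))  # ^(?<remote>[^ ]*) (?<hostName>[^ ]*)
```

Remember that renaming a group also renames the field in the parsed record, so anything downstream that reads the old field name needs the same change.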
I’m glad I found this! It saved me a lot of trouble. (edit: though I see a warning is listed at the bottom of https://docs.fluentbit.io/manual/pipeline/parsers/regular-expression )
Bug Report
Describe the bug I'm getting an error message when loading a parser with a long regular expression: -
It looks as though the original expression has been truncated, causing the syntax error. When I shorten the regex, the compile error goes away (unless I have just removed the bit it does not like).
To Reproduce
Using a config: -
With the parser: -
With example data in /logs/tomcat_access.log
Expected behavior Regex will compile and log elements get correctly parsed into their constituent fields.
Additional context We are using fluent-bit to collect kubernetes, systemd and container logs, and as a side-car for applications to capture file-based logs. We forward all of the logs to a fluent-bit instance running as a forwarder with the splunk output plugin.
This issue affects log collection for our tomcat applications. Because we have a lot of them with differently formatted logs, we want a flexible regex that can account for these differences.