Open MetalRex101 opened 2 years ago
Is there any specification regarding LTSV's double-quote escapes? In particular, I'm concerned about:
\ and ' ?)
That said, I'm personally unimpressed with this format extension, especially because LTSV was initially pitched as being super easy & fast to parse ("hey, you can just split the string with \t and parsing is done!").
Now it's slowly falling into the same rut as CSV, requiring a state machine to parse it properly. I have a vague feeling of history repeating itself here!
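For reference, the "just split on \t" pitch can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's code: the function name `parse_ltsv` and the sample line are made up for the example, and it assumes the usual LTSV layout of tab-separated `label:value` fields.

```python
def parse_ltsv(line: str) -> dict[str, str]:
    """Parse one LTSV line by splitting on tabs, then on the first colon."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Only the first ":" separates label and value, so values may
        # themselves contain colons (timestamps, for example).
        label, _, value = field.partition(":")
        record[label] = value
    return record


record = parse_ltsv("host:127.0.0.1\ttime:[10/Oct/2000:13:55:36 -0700]\tstatus:200")
```

No state is carried between characters here, which is exactly the property that quoted values would take away.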
Absolutely agree with you here. Maybe an application should not write logs in LTSV format if that makes them complicated to parse. I think only values should be quoted. Any other double quote inside a value that does not mark the end of the value should certainly be escaped, and that is the application's responsibility. A log parser should not have to solve every problem, but I think we can make a small improvement: parse the value from the first delimiter symbol to the second. We could also consider allowing the default delimiter (for example, double quotes) to be replaced with other symbol sequences, to add a bit more flexibility.
I found this delimiter expression:
/\s(?=(?:[^"]*"[^"]*")*[^"]*$)/
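A quick way to see what this expression does is to try it in Python. The sketch below assumes the leading `s` is meant to be `\s` (whitespace); the function name `split_outside_quotes` and the shortened sample line are made up for illustration. The lookahead only accepts a whitespace character when an even number of double quotes follows it up to the end of the line, i.e. when the whitespace is not inside a quoted value.

```python
import re

# Whitespace followed by an even number of double quotes up to end of line,
# which means the whitespace itself sits outside any quoted value.
DELIMITER = re.compile(r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(line: str) -> list[str]:
    """Split a line on whitespace that is not enclosed in double quotes."""
    return DELIMITER.split(line)


line = 'level=warn caller=operator.go:516 msg="field spec.baseImage is deprecated, key=a/b"'
fields = split_outside_quotes(line)
```

Note that this approach does not handle escaped quotes inside a value, and the lookahead rescans the rest of the line for every whitespace character, so it may be slow on long lines.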
Even better - use the logfmt parser :)
Is your feature request related to a problem? Please describe.
The LTSV plugin can't understand quoted values. Example: given this log line:
level=warn ts=2021-12-20T05:56:00.397096942Z caller=operator.go:516 component=alertmanageroperator msg="alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"
using the ltsv plugin with the following settings:
expected json:
actual result:
The msg field is truncated, and a spurious key field is added to the resulting JSON. The line is parsed incorrectly because the plugin doesn't understand the double quotes that enclose the entire msg value.
Describe the solution you'd like
The LTSV plugin should understand that the value after label_delimiter may be enclosed in some symbol sequence, for example double or single quotes. That would solve cases like this one.
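To make the request concrete, here is a rough Python sketch of the behaviour I have in mind. It is not the plugin's implementation: the function `parse_quoted_pairs` and its parameters are hypothetical, and treating a backslash as an escape character inside quotes is my own assumption, since no specification covers it.

```python
def parse_quoted_pairs(line, delimiter=" ", label_delimiter="=", quotes="\"'"):
    """Parse label/value pairs where a value may be wrapped in quotes.

    Hypothetical sketch: a backslash inside a quoted value is assumed to
    escape the next character.
    """
    record = {}
    i, n = 0, len(line)
    while i < n:
        if line.startswith(delimiter, i):  # skip delimiters between pairs
            i += len(delimiter)
            continue
        sep = line.find(label_delimiter, i)
        if sep == -1:
            break  # trailing text without a label delimiter is ignored
        label = line[i:sep]
        i = sep + len(label_delimiter)
        if i < n and line[i] in quotes:
            quote = line[i]  # the value ends at the matching quote character
            i += 1
            chars = []
            while i < n and line[i] != quote:
                if line[i] == "\\" and i + 1 < n:
                    i += 1  # assumed escape: take the next character literally
                chars.append(line[i])
                i += 1
            i += 1  # step over the closing quote
            value = "".join(chars)
        else:
            end = line.find(delimiter, i)
            end = n if end == -1 else end
            value = line[i:end]
            i = end
        record[label] = value
    return record


line = (
    "level=warn ts=2021-12-20T05:56:00.397096942Z caller=operator.go:516 "
    "component=alertmanageroperator "
    'msg="alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, '
    "field spec.baseImage is deprecated, 'spec.image' field should be used instead\""
)
record = parse_quoted_pairs(line)
```

With the log line from the example above, the whole quoted message ends up in `record["msg"]` and no extra `key` field appears. The cost is the small state machine discussed in the comments: the parser has to remember whether it is currently inside a quoted value.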
Describe alternatives you've considered
This forced me to use the regexp parser type and capture everything after the ts field as the message, with the following expression:
/level=(?<level>.*)\sts=(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3,}[A-Z]+) (?<message>.*)/
which is a much worse solution than what a proper ltsv plugin implementation could offer.
Additional context
No response