Open jcarnu opened 3 years ago
Thanks @jcarnu for the test to reproduce. Actually I would prefer to force the output to utf8 using:
use utf8;
binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';
the test reports the right characters whatever is the locale:
$ echo "2021-06-08 13:51:04.966 UTC [22436-2] postgres@postgres LOG: duration: 5025.921 ms statement: select ' € € ', pg_sleep(5);"|perl pgbadger -x json -f stderr -p '%m [%p-%l] %q%u@%d ' -o /tmp/test.json - && cat /tmp/test.json | jq . |grep sleep
[+] Parsed 125 bytes, queries: 1, events: 0
LOG: Ok, generating json report...
"select ' € € ', pg_sleep(5);",
"select ?, pg_sleep(?);": {
"query": "select ' € € ', pg_sleep(5);",
The problem with use open pragma
is that is don't cover all cases. For example set your locale as follow:
$ export LANG=C
$ locale
LANG=C
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
Then with your patch execute the test and you will have this output:
$ echo "2021-06-08 13:51:04.966 UTC [22436-2] postgres@postgres LOG: duration: 5025test.json - && cat /tmp/test.json | jq . |grep sleeprl pgbadger -x json -f stderr -p '%m [%p-%l] %q%u@%d ' -o /tmp/t
[+] Parsed 147 bytes, queries: 1, events: 0
LOG: Ok, generating json report...
"select ' \\xE2\\x82\\xAC \\xE2\\x82\\xAC ', pg_sleep(5);",
now replace the pragma with :utf8 instead of :localeusing use open ':locale';
, you will now have the following output:
$ echo "2021-06-08 13:51:04.966 UTC [22436-2] postgres@postgres LOG: duration: 5025test.json - && cat /tmp/test.json | jq . |grep sleeprl pgbadger -x json -f stderr -p '%m [%p-%l] %q%u@%d ' -o /tmp/t
[+] Parsed 129 bytes, queries: 1, events: 0
LOG: Ok, generating json report...
"select ?, pg_sleep(?);": {
"query": "select ' ⬠⬠', pg_sleep(5);"
If you use my patch above that use binmod on all standard I/O , you will have the right output:
$ echo "2021-06-08 13:51:04.966 UTC [22436-2] postgres@postgres LOG: duration: 5025test.json - && cat /tmp/test.json | jq . |grep sleeprl pgbadger -x json -f stderr -p '%m [%p-%l] %q%u@%d ' -o /tmp/t
[+] Parsed 125 bytes, queries: 1, events: 0
LOG: Ok, generating json report...
"select ?, pg_sleep(?);": {
"query": "select ' € € ', pg_sleep(5);",
"select ' € € ', pg_sleep(5);",
I have noted the Illegal division by zero issue on tsung output. I can fix that but I'm not sure that the tsung output is used and maybe it can be removed to simplify the code.
Hello Gilles,
Thanks for your reply.
My current follow-up is only to improve or broaden the discussion on input/output encodings.
I agree use utf8;
may be a good choice. It may handle a large number of use-cases.
I do not have your "users' experience feedback/background" so I was wondering on how things should work in case of "queries with others encoding not handled by UTF8" (Asian languages in particular).
My point was to say : if we know there's a specific locale into logged queries, there should be a «nice» way to handle them through the :locale
thing. The main point of :locale
thing is to use the "encoding" part of it. But I agree this should be a side use-case.
OTOH we could think of a feature to force input encoding (not locale) like a specific switch...
Thanks.
If logs contains UTF-8 characters JSON output embeds "badly converted" characters :
This patch produces a correct query output with the given locale (locale should be adapted to input file locale) :
I didn't made extended tests on other output formats though it would be useful to be sure there won't be any regression.
Here are the simple tests I've made :
and outputs
It's quite basic (rough) but it shows up "Wide Character in print" warnings :
I'd like to get rid of such warning messages though could you give me more kind guidelines to finish this PR without warning (your thoughts). If not possible I'll try to find a correct way by myself.
Please take note of tsung output with illegal division by zero on line
where
$total
seems to be zero. Others output treats it like this(+$total||1)
If this fix is correct, I'll include it in another PR.
Thanks for creating pgbadger, it's an everyday friend for me !