Closed wsdookadr closed 3 years ago
If you only want to trace back some long queries that can be executed from multiple places in the client code, you could add different /* ... */
style comments to distinguish call sites. Those will be removed from the normalized queries, but would be preserved in the examples. I think the examples are the top X slow running queries, so you'll be able to track them down.
If you want to fully distinguish workload of different part of your application, that isn't something easy to do in general. You could try to dynamically change the application_name (e.g. xxx vY module Z
) and report it in your log_line_prefix, so you can filter it as you need.
you could add different
/* ... */
style comments to distinguish call sites.
Yes this was the very first thing that came to mind, back then I did it like this
Those will be removed from the normalized queries
Yup, that will make them all be counted as the same. Looking at pgbadger, it creates the pgFormatter::Beautify
object like this:
my $sql_prettified = pgFormatter::Beautify->new('colorize' => 1, 'format' => 'html', 'uc_keywords' => 0);
and no_comments
is zero by default, so it would seem like comments shouldn't be stripped out, but they are(?)
You could try to dynamically change the application_name (e.g.
xxx vY module Z
) and report it in your log_line_prefix
This is new to me and actually it sounds much cleaner. I will keep this one in mind, thanks!
and no_comments is zero by default, so it would seem like comments shouldn't be stripped out, but they are(?)
The normalization happens in the parsing, otherwise you wouldn't get correct stats:
# Normalize SQL queries by removing parameters
sub normalize_query
{
my $orig_query = shift;
return if (!$orig_query);
# Remove comments
$orig_query =~ s/\/\*(.*?)\*\///gs;
[...]
But the example queries (-t | --top number
) are kept as-is, so as long as you need the longest running ones, that should work.
@rjuju points out the right behavior of pgbadger. Stripping comments in the beautifier doesn't affect the queries normalization but just the formatted query. About stripping comments at query normalization I have pushed commit 9e120b3 that adds a new command line option --keep-comments to not remove comments from normalized queries. Claiming to be useful if you want to distinguish between same normalized queries.
That new switch is spot on, thank you!
A long time ago I remember I was looking at finding some queries that were taking a long time in a codebase written by a team. There were multiple locations in the application code that were executing the same query (the query was not static, it did have parameters).
I remember the question I was asking myself back then was how to differentiate between these multiple instances of the same query, because after the log analysis stage would be done, I wanted to be able to point back to the application code, and to say "This line in the application code has executed this amount of queries, and it took this amount of time to run them".
pgbadger back then grouped together all those queries, and all the bookkeeping that followed identified all those queries to be the same. And that's fine, except in my case I needed to differentiate between these queries, so a way to preserve the application code location of the query would be needed somehow.
Looking back on this case, I am writing to ask for recommendations and guidelines, or potential workarounds and viable approaches
Note: This is more or less a corner-case