darold / pgbadger

A fast PostgreSQL Log Analyzer
http://pgbadger.darold.net/
PostgreSQL License
3.45k stars 348 forks source link

pgBadger 12.2 - Normalization does not handle properly balanced single-quoted strings along with escaped quotes inside #804

Open bbourgier opened 9 months ago

bbourgier commented 9 months ago

Hi,

Alongside with my previous bug report I saw that some of my Windows Network Paths were showing up in the "Time consuming queries (N)" part where only normalized (with '?' as parameters) queries are supposed to show up!! So I did some debug and the "remove string content" of the "normalize_query" proc is not doing proper handling for cases where "\'' (backslash + quote) are inside and at the end of the string: my case for network paths.

So I fixed this in Pull Request #803 (sorry, double commit because I had to revise very slightly the regexp in order to properly handle cases 7 and 8 below)

I tested then the following test cases and this looks good. (although most cases are not real case as string as parameters should always end up with a quote (or double-quote))

### Test case #01:
###   call tools.dataimport(3, '\\some.\'site.com\folder\', 1000);
###   >>
###   call tools.dataimport(3, ?, 1000);
### Test case #02:
###   call tools.dataimport(3, '\'\\some.\'site.com\folder\'\\', 1000);]
###   >>
###   call tools.dataimport(3, ?, 1000);        
### Test case #03:
###   call tools.dataimport(3, XXX'\\some.\'site.\'subsite.\'com\folder1\''folder2\'YYY, 1000);
###   >>
###   call tools.dataimport(3, XXX?folder2\'YYY, 1000);
### Test case #04:
###   call tools.dataimport(3, xxx'\\some.\'site.\'subsite.\'com\folder1\''folder2'\folder3\'yyy'zzz\'www, 1000);
###   >>
###   call tools.dataimport(3, xxx?folder2?zzz\'www, 1000);
### Test case #05:
###   select * from tools.getusers('1', 'Y', '', '0', 'DOMAIN\user', '', '', '', '', '')
###   >>
###   select * from tools.getusers(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
### Test case #06:
###   call tools.dataimport(3, "\\some.site.com\folder\", 1000);
###   >>
###   call tools.dataimport(3, ?, 1000);
### Test case #07:
###   CALL tools.dataimport('\\some.site.com\folder\','C012152');
###   >>
###   CALL tools.dataimport(?,?);
### Test case #08:
###   CALL tools.dataimport('\\some.site.com\folder\'   ,  'C012152');
###   >>
###   CALL tools.dataimport(? , ?);

And now I only see '?' in the HTML output which is good.

                <tr>
                <td>1</td>
                <td>20
                    <p><a href="#Amost_frequent_queries_details_1" class="btn btn-default btn-xs" data-toggle="collapse">Details</a></p>
                </td>
                <td>4m33s</td>
                <td>5s90ms</td>
                <td>47s939ms</td>
                <td>13s678ms</td>
                <td id="most-frequent-queries-examples-details-rank-1">
                    <div id="query-f-1" class="sql sql-mediumsize"><i class="glyphicon icon-copy" title="Click to select query"></i><span class="kw1">call</span> tools.dataimport <span class="br0">(</span>?<span class="sy0">,</span> ?<span class="sy0">,</span> ?<span class="br0">)</span>;

</div>
                        <!-- Details collapse -->

Thanks in advance.

Bertrand

bbourgier commented 8 months ago

Hi,

original 12.2 normalize_query logic does not work well with single quoted strings ending with \' for instance:

call tools.dataimport(3, '\\some.\'site.com\folder\', 1000);
CALL tools.dataimport('\\some.site.com\folder\','C012152');
CALL tools.dataimport('\\some.site.com\folder\'   ,  'C012152');

I thought my code proposal worked ok but now I still have cases where regexp does not match properly and ends up breaking the query string while replacing wrong parts with question marks (?).

For instance:

query part:
                '' || ' / (' || plrow.fromquantity || '-' || coalesce(plrow.toquantity::text, 'infinite') || ') '
            end pricingline,
            mm.name as kind,

            --'ass'as kind

will become:

                ? || ? || plrow.fromquantity || ? || coalesce(plrow.toquantity::text, ?) || ') ?ass'as kind

whereas it should be:

                ? || ? || plrow.fromquantity || ? || coalesce(plrow.toquantity::text, ?) || ?
            end pricingline,
            mm.name as kind,

            --?as kind

So in the end, I still believe something should be done to the "remove string content" part of "sub normalize_query" but my fix proposal is not foolproof.

So I keep pullout request #803 as draft at the moment.

Sorry.

Bertrand