darold / pgbadger

A fast PostgreSQL Log Analyzer
http://pgbadger.darold.net/
PostgreSQL License

Trying to understand long report generation #711

Open hevisko opened 2 years ago

hevisko commented 2 years ago

Currently I have multiple "small" 100MB log files sent to my pgBadger "processor", where the files are processed with -I -J <cores> every hour as they arrive. This generates a bunch of ___.bin files, and it then seems to take quite a while to generate the HTML reports.
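
A minimal sketch of that hourly invocation, assuming placeholder paths, output directory, and core count; the documented --noreport option can additionally defer HTML generation to a later pass:

# incremental mode (-I) keeps per-day binary state under the -O directory;
# -J parses several log files in parallel
# optionally add --noreport to only write the *.bin files now and build the HTML later
pgbadger -I -J 8 -O /var/lib/pgbadger/incremental /incoming/postgresql-*.log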

So my questions:

1) Would the HTML report generation be faster if the *.bin files were merged into a single file for the day/week/month? If so, is that something that is possible to achieve (even if in phased processing)?

2) Would it be possible to generate a report not just per day/week/month (thanks for those), but for a specific time window in which we had a problem to investigate?

darold commented 2 years ago

If you want to generate a report for a specific day, you just have to pass that day's binary files as input, for example:

pgbadger -o myreportdir/ data/2022/02/02/*.bin

You will then have a report for that specific day.
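
For the narrower window in question 2, pgbadger also documents -b/--begin and -e/--end datetime filters; whether they apply when reading *.bin inputs rather than raw logs is an assumption here, worth verifying:

# hedged sketch: restrict the report to a two-hour incident window
# (assumes -b/-e filtering also works on binary inputs)
pgbadger -b "2022-02-02 09:00:00" -e "2022-02-02 11:00:00" -o incident.html data/2022/02/02/*.bin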

Normally multiprocessing should also be used to build the report, but let me check whether -J is taken into account; it is possible that I have only taken care of -j.
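
For context, the documented distinction between the two flags (as I understand it, worth verifying against the version in use): -j splits the parsing of a single log file across several processes, while -J parses several log files at once. Paths below are placeholders:

# -J 4: four files parsed in parallel, one process each
pgbadger -I -J 4 -O /var/lib/pgbadger/incremental /logs/postgresql-*.log
# -j 4: each file split across four parser processes
pgbadger -I -j 4 -O /var/lib/pgbadger/incremental /logs/postgresql-big.log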

darold commented 2 years ago

Despite what I thought, there is no multiprocessing used for report generation. This could be an improvement.

hevisko commented 2 years ago

There is another issue that might be part of this: the memory grows and grows during this process (in my one very busy DB's case >32GB, and currently ~50GB while busy catching up), which begs the question: how can RAM consumption be limited during report generation?

darold commented 2 years ago

You can limit the memory by reducing the number of top elements stored and the maximum query length, for example: -t 15 -m 2048
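
A sketch of a full invocation with those limits (output path and input files are placeholders; -t/--top caps how many queries are kept per report section, -m/--maxlength truncates stored query text):

# keep only the top 15 queries and truncate each query at 2048 characters
pgbadger -t 15 -m 2048 -o myreportdir/ data/2022/02/02/*.bin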

hvisage commented 2 years ago

My trouble when I cut the statement size is that, for this specific instance, the devilish details are in the query text that gets "cut", and the problem statements are exactly the ones sitting in the top 10-20 elements :(

I will have to handle the RAM spikes in this processing; I was hoping for a "swap to disk" option, but I'm an outlier - yet again :D

darold commented 2 years ago

If you can send the *.bin files required to reproduce your issue to my private email, I will try to see what could be improved in pgbadger; otherwise I'm afraid I can't do much more for this issue.