darold / pgbadger

A fast PostgreSQL Log Analyzer
http://pgbadger.darold.net/
PostgreSQL License
3.49k stars 349 forks source link

'--retention' parameter - 'outdir' folder tree is cleaned up BUT not 'html-outdir' folder tree #776

Closed bbourgier closed 1 year ago

bbourgier commented 1 year ago
          Re-

In fact I think I understand. In many places in the pgBadger perl code, the HTML generated files' destination folder is specified as:

my $dest_dir = $html_outdir || $outdir;

which basically means: if html-outdir is specified in params then it takes this folder as destination for HTML files and if it is not defined then it takes outdir.

The problem (and definitely a BUG) with the '--retention' folder/file deletion code is that it does not handle BOTH outdir and html-outdir. The code only manages outdir.

In your case (your command line) you only specify outdir and NOT html-outdir. This means your HTML files are in the outdir folder tree as well as the binary files. And this is why: 1/ your HTML files gets deleted 2/ and because you specify '--noclean' then the binary files do not get deleted --> this means that in your case, simply removing the '--noclean' param will solve your issue: you won't need the extra schedule task to delete the .bin files AND the HTML files gets deleted

In my case (with outdir AND html-outdir params) the current code is missing the part that processes the html-outdir folder tree exactly the same way as the outdir folder tree. I'll declare a bug.

Hope this explanation helps. And hopefully @darold will confirm this.

Regards.

Originally posted by @bbourgier in https://github.com/darold/pgbadger/issues/749#issuecomment-1528084370

bbourgier commented 1 year ago

Hi, Just to tell you I have made a simple test.

I have simply duplicated this part:

# Clear storage when a retention is specified in incremental mode
if ( $outdir && $retention && ($saved_last_line{datetime} || $pgb_saved_last_line{datetime}) )
{
...
}

and replaced all the $outdir occurences by $html_outdir

This works well at cleaning the HTML reports part. EXCEPT ONE THING It does not clean-up Monthly Reports. This is the only thing slightly different to add to the $html_outdir clean-up code.

See screenshot after adding the code: capture_20230429_103954_01

Regards.

darold commented 1 year ago

Good catch, commit 0e5c7d5 should solve this issue. Please give it a try and let me know.

bbourgier commented 1 year ago

Good catch, commit 0e5c7d5 should solve this issue. Please give it a try and let me know.

Hi,

1/ The fix you did changes the clean-up code to clean the $html_outdir folder tree if it is specified in the command line INSTEAD OF the $outdir.

At the moment, with your fix, the html-outdir is cleaned-up BUT not the outdir anymore.

From what I understand: The '--retention' parameter purpose is to trigger clean-up on HTML files (html-outdir) AND Binary files (outdir) EXCEPT if the --noclean param is specified in the command line

Am I wrong?

This is why I was inferring that the code shall do the clean-up on BOTH folder trees (outdir AND html-outdir).

2/ And an additional issue I was mentionning in my comment: The code you posted does the clean-up on the HTML folder tree (html-outdir) BUT I still have the MONTHLY REPORT files remaining (not deleted) and the year/month folder where they stand are not deleted either. For instance: capture_20230430_152814_01 I still have: 2022 -- 10 -- index.html -- 11 -- index.html -- 12 -- index.html whereas they should have been deleted as well.

Regards

darold commented 1 year ago

For 1/ you are right, commit a1bb778 will process both output directories.

About 2/ yes, this is a separate fix that I will be done later today.

bbourgier commented 1 year ago

Hi @darold

Starting from commit a1bb778 I have modified the cleanup code so that it works well on outdir and html-outdir AND it is simplified AND it works for monthly reports.

The logic I applied is similar to the one before EXCEPT for the case where there is NO 'noclean' and NO 'retention' flags specified. In this case, the current code was doing BIN cleanup for the previous month and older, even if it was run on the 1st of the month. I have put in place a more coherent strategy (as far as I believe it is): do BIN cleanup by simulating a retention period of 5 weeks, which is about 1 month sliding window.

So the sum-up of the strategy is:

###
### Explanation:
###
### Logic for the BINARY storage ($outdir) SHOULD be:
### If ('noclean')
###   do nothing (keep everything)
### If (NO 'noclean') and (NO 'retention'):
###   use an arbitrary retention duration of: 5 weeks
###   remove BINARY files older than LAST_PARSED_MONTH-retention
###   DO NOT CHECK for HTML file existence as OLD HTML files may be deleted by external tools
###     which may lead to BINARY files NEVER deleted
### If (NO 'noclean') and ('retention'):
###   remove BINARY files older than LAST_PARSED_MONTH-retention
###   DO NOT CHECK for HTML file existence as OLD HTML files may be deleted by external tools
###     which may lead to BINARY files NEVER deleted
###
### Logic for the HTML storage ($html_outdir || $outdir) SHOULD be:
### DO NOT check 'noclean' as this flag is dedicated to BINARY files
### If (NO 'retention'):
###   do nothing (keep everything)
### If ('retention'):
###   remove HTML folders/files older than LAST_PARSED_MONTH-retention (= y/m/d):
###   days older than d in y/m/d
###   months older than m in y/m/d
###   years older than y in y/m/d
###

And I have done extensive tests on the following cases: (NO 'noclean') and (NO 'retention') and (outdir+html-outdir) (NO 'noclean') and (NO 'retention') and (outdir only) (NO 'noclean') and ('retention') and (outdir+html-outdir) (NO 'noclean') and ('retention') and (outdir only) ('noclean') and (NO 'retention') and (outdir+html-outdir) ('noclean') and (NO 'retention') and (outdir only) ('noclean') and ('retention') and (outdir+html-outdir) ('noclean') and ('retention') and (outdir only)

ALL the tests passed OK

I can provide you with command lines and debug outputs.

So my question now is: how can I commit (or provide you) with my changes? (again this is a modification of: a1bb778 and the modifications are limited to the cleanup code)

Thks in advance.