darold / pgbadger

A fast PostgreSQL Log Analyzer
http://pgbadger.darold.net/
PostgreSQL License
3.51k stars 349 forks source link

incremental mode with ssh does not work until log file rotates #723

Closed fluca1978 closed 2 years ago

fluca1978 commented 2 years ago

Hi all, I'm observing a strange, at least to me, problem with pgbadger 11.7 and SSH connection.

$ pgbadger ssh://luca@miguel//var/log/postgresql/* -I -O /var/www/html/pgbadger/miguel/ -X --start-monday -v
DEBUG: Looking for remote filename using command: ssh luca@miguel "ls '/var/log/postgresql/'postgresql-*.log"
DEBUG: pgBadger version 11.7.
DEBUG: Output 'html' reports will be written to out.html
DEBUG: Found log offset 252798162 in file /var/www/html/pgbadger/miguel//LAST_PARSED
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Fri.log'" | awk '{print $5}'
DEBUG: Remote file size: 243912552
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Mon.log'" | awk '{print $5}'
DEBUG: Remote file size: 286059956
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Sat.log'" | awk '{print $5}'
DEBUG: Remote file size: 138413444
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Sun.log'" | awk '{print $5}'
DEBUG: Remote file size: 117596970
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Thu.log'" | awk '{print $5}'
DEBUG: Remote file size: 235139815
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Thu_old.log'" | awk '{print $5}'
DEBUG: Remote file size: 152461209
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Tue.log'" | awk '{print $5}'
DEBUG: Remote file size: 294148883
DEBUG: Looking for file size using command: ssh luca@miguel "ls -l '/var/log/postgresql/postgresql-Wed.log'" | awk '{print $5}'
DEBUG: Remote file size: 264510327
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Fri.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Fri.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Fri.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Fri.log, there no new lines in this file
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Mon.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Mon.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Mon.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Mon.log, there no new lines in this file
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Sat.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Sat.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Sat.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Sat.log, there no new lines in this file
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Sun.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Sun.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Sun.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Sun.log, there no new lines in this file
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Thu.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Thu.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Thu.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Thu.log, there no new lines in this file
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Thu_old.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Thu_old.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Thu_old.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Thu_old.log, there no new lines in this file
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Tue.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Tue.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Tue.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Tue.log, there no new lines in this file
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Wed.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql-Wed.log
DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Wed.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql-Wed.log, there no new lines in this file
DEBUG: Starting progressbar writer process
DEBUG: Reading temporary last parsed line from /tmp/tmp_LAST_PARSED
DEBUG: the log statistics gathering took: 5 wallclock secs ( 0.02 usr  0.02 sys +  0.29 cusr  0.13 csys =  0.46 CPU)
DEBUG: no new entries in your log(s) since last run.
DEBUG: building reports took:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
DEBUG: the total execution time took:  5 wallclock secs ( 0.02 usr  0.02 sys +  0.29 cusr  0.13 csys =  0.46 CPU)

if I keep executing the pgbadger no new logs entries are found, while the log file (in this case postgresql-Wed.log) is increasing. I've already checkd that the ssh luca@miguel "cat '/var/log/postgresql/postgresql-Wed.log'" command is working and is outputting all the entries in the log file, so it seems there is no problem in retrieving the log content. The LAST_PARSED file is fine and contains:

$ cat /var/www/html/pgbadger/miguel/LAST_PARSED
2022-03-23 17:26:43     252798162       2022-03-23 17:26:43 ...

If I rsync locally the file then pgbadger builds incrementally on the very same file, while via SSH keeps stating that the file has been already parsed.

fluca1978 commented 2 years ago

After digging, I found that the problem seems to be in the check_file_changed routine at line 3786: it seems the routine inspects the first line of the file, that clearly has a date less than the last saved one, and so it exits, reporting no change at all. Apparently, this seems to be a problem related to ssh URI, but it seems to work the following not-URI invocation:

$ pgbadger --ssh-user luca -r miguel  /var/log/postgresql/* -I -O /var/www/html/pgbadger/miguel/ -X --start-monday -v

It could be that piping ssh into cat has some differences. What I see, from the debug output, is that when using the ssh URI the command used to retrieve the log is:

DEBUG: Retrieving log entries using command: ssh luca@miguel "cat '/var/log/postgresql/postgresql-Thu.log'" |

while when using -r is

DEBUG: Retrieving log entries using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel " cat  '/var/log/postgresql/postgresql-Thu.log''' |

The only difference, to me, are the connection options that should not cause the problem.

fluca1978 commented 2 years ago

Apparently the key-changing fact is the -r option, since with such specified it seems to work even if an ssh URI is specified:

$ pgbadger  -r miguel  'ssh://luca@miguel//var/log/postgresql/*' -I -O /var/www/html/pgbadger/miguel/ -X --start-monday -v
darold commented 2 years ago

Hi Luca,

Thanks for the report, commit 093e233 adds the ssh options to the command. Please let us know if that's solve your issue.

fluca1978 commented 2 years ago

Uhm, it seems not working to me without the -r option. I see that the ssh options are now there in the URI only invocation, but the file is not detected as "new", while it is:

% git log -n 3
commit 3b03a6cc529ae8cc9c74c8bf007571f252a2eee0
commit 0ac9817f406361fd2900abe92fdbc9f9f731b470
commit 093e2339513a0c0239af3877cde74993f6c3c8ef

% ./pgbadger 'ssh://luca@miguel//var/log/postgresql/*.log'   -I -O ~/tmp -v 
DEBUG: Looking for remote filename using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "ls '/var/log/postgresql'*.log"
DEBUG: pgBadger version 11.7.
DEBUG: Output 'html' reports will be written to out.html
DEBUG: Found log offset 227658987 in file /home/luca/tmp/LAST_PARSED
DEBUG: Looking for file size using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "ls -l '/var/log/postgresql/postgresql.log'" | awk '{print $5}'
DEBUG: Remote file size: 182512807
DEBUG: Retrieving log entries using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "cat '/var/log/postgresql/postgresql.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql.log
DEBUG: Retrieving log entries using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "cat '/var/log/postgresql/postgresql.log'" |
DEBUG: this file has already been parsed: ssh://luca@miguel//var/log/postgresql/postgresql.log, there are no new lines in this file
DEBUG: Reading temporary last parsed line from /tmp/tmp_LAST_PARSED
DEBUG: the log statistics gathering took: 1 wallclock secs ( 0.01 usr  0.01 sys +  0.05 cusr  0.02 csys =  0.09 CPU)
DEBUG: no new entries in your log(s) since last run.
DEBUG: building reports took:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
DEBUG: the total execution time took:  1 wallclock secs ( 0.01 usr  0.01 sys +  0.05 cusr  0.02 csys =  0.09 CPU)

while if, immediatly after, I do the same with -r

% ./pgbadger 'ssh://luca@miguel//var/log/postgresql//*.log'   -I -O ~/tmp -v -r miguel
DEBUG: Looking for remote filename using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "ls '/var/log/postgresql//'*.log"
DEBUG: pgBadger version 11.7.
DEBUG: Output 'html' reports will be written to out.html
DEBUG: Found log offset 227658987 in file /home/luca/tmp/LAST_PARSED
DEBUG: Looking for file size using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "ls -l '/var/log/postgresql/postgresql.log'" | awk '{print $5}'
DEBUG: Remote file size: 182512807
DEBUG: Starting progressbar writer process
DEBUG: Retrieving log sample using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "tail -n 100 '/var/log/postgresql/postgresql.log'" |
DEBUG: Autodetected log format 'default' from ssh://luca@miguel//var/log/postgresql/postgresql.log
DEBUG: pgBadger will use log format default to parse ssh://luca@miguel//var/log/postgresql/postgresql.log.
DEBUG: timezone not specified, using -3600 seconds
DEBUG: Processing remote log file: miguel:ssh://luca@miguel//var/log/postgresql/postgresql.log
DEBUG: Retrieving log entries using command: ssh -o ConnectTimeout=10 -o PreferredAuthentications=hostbased,publickey luca@miguel "cat '/var/log/postgresql/postgresql.log'" |
DEBUG: Starting offset 227658987 is greater than total size 182512807 for file ssh://luca@miguel//var/log/postgresql/postgresql.log
DEBUG: Reverting start offset 227658987 to 0 for file ssh://luca@miguel//var/log/postgresql/postgresql.log, stoppping offset is 182512807
DEBUG: Starting reading file "miguel:ssh://luca@miguel//var/log/postgresql/postgresql.log"...
DEBUG: Start parsing postgresql log at offset 0 of file "ssh://luca@miguel//var/log/postgresql/postgresql.log" to 182512807

DEBUG: Unknown stderr log line format:  ORDER BY 1,2
DEBUG: setting current position in log to 182512807of 182512807 (99.60%), queries: 422990, events: 3s: 72992, events: 2
DEBUG: Saving last parsed line into /tmp/tmp_LAST_PARSED (2022-03-25 19:25:45eri182512807), events: 3

DEBUG: Reading temporary last parsed line from /tmp/tmp_LAST_PARSED
DEBUG: Saving last parsed line into /home/luca/tmp/LAST_PARSED
DEBUG: the log statistics gathering took:79 wallclock secs ( 0.00 usr  0.01 sys + 78.64 cusr  2.78 csys = 81.43 CPU)
DEBUG: Building incremental report for 2022-03-25
LOG: Ok, generating HTML daily report into /home/luca/tmp/2022/03/25/...
LOG: Ok, generating HTML weekly report into /home/luca/tmp/2022/week-13/...
LOG: Ok, generating global index to access incremental reports...
DEBUG: writing global index into /home/luca/tmp/index.html
DEBUG: building reports took:  5 wallclock secs ( 4.67 usr +  0.02 sys =  4.69 CPU)
DEBUG: the total execution time took: 84 wallclock secs ( 4.67 usr  0.03 sys + 78.64 cusr  2.78 csys = 86.12 CPU)

so the r option still triggers something different in the behavior.

darold commented 2 years ago

Ok, I was able to reproduce, the problem was that I was testing change with a new log file not with new entries in the last log. Commit 6a4750b fixes this issue.

fluca1978 commented 2 years ago

Great job! It works for me now.

darold commented 2 years ago

@fluca1978 please give a try to latest commit 44f0be5. I have done a full review of the log file detection so please confirm that is not break your reports. It should not impact remote log file but just to be sure.

fluca1978 commented 2 years ago

It is working for me. I've tested against v11.8 at 996c199f50932ce25c05926c37e1468cdad9d6a5.

I see messages in the debug output that state what is happening, I don't remember having seen such messages before:

DEBUG: This file should be parsed from the beginning: ssh://luca@venkman//var/postgresql//log/postgresql.log, the file is new, the entire file must be parsed
DEBUG: Reverting start offset 9002266 to 0 for file ssh://luca@venkman//var/postgresql//log/postgresql.log, stopping offset is 15627864