darold / pgbadger

A fast PostgreSQL Log Analyzer
http://pgbadger.darold.net/
PostgreSQL License

process hangs indefinitely at the end of the parsing #643

Closed fpietka closed 3 years ago

fpietka commented 3 years ago

Hi, I'm having trouble parsing big (7-8 GB+) log files.

pgbadger: 11.5 postgres: 12.6 (hosted on Heroku)

command line used:

pgbadger -j 8 -f logplex --dbname=mydbname --exclude-query="^(COPY|COMMIT|BEGIN|SELECT 1|pg_start_backup|pg_stop_backup|pg_catalog|pg_stat_statements|pg_catalog.pg_stat_archiver)" -o $date.html $date.log

I'm consolidating logs for a whole day to be parsed on an EC2 instance with 8 vCPUs and 32 GB of RAM.

First issue: 16 GB was not enough, and since I don't use swap, pgbadger hung before completion, dropped the parsing, and generated an empty HTML report. Since it still exits with a success code, there was no way to notice until we opened the HTML.
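
As a workaround I now sanity-check the report in the wrapper script instead of trusting the exit code alone (the size threshold below is just a guess on my side):

# Run pgbadger as before, then refuse to trust the exit code alone.
pgbadger -j 8 -f logplex --dbname=mydbname -o "$date.html" "$date.log"
# Treat a missing or suspiciously small report as a failure (threshold is a guess).
if [ ! -s "$date.html" ] || [ "$(wc -c < "$date.html")" -lt 100000 ]; then
    echo "pgbadger produced an empty-looking report for $date" >&2
    exit 1
fi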

Second issue: it parses the whole 7-8 GB file in about 6 minutes (sometimes the number of parallel processes drops from 8 to 5 in the middle of the parsing), then hangs at 99.99% for more than 3 hours with one vCPU at 100%.

It ran fine once at first, but since then I can't seem to get a successfully built report.

Maybe I'm doing something wrong. If you need any more information, please tell me.

darold commented 3 years ago

Can you post one or two lines of your log so I can check whether they correspond to the logplex format defined in pgbadger?

fpietka commented 3 years ago
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49951-1]  sql_error_code = 00000 LOG:  duration: 0.086 ms  parse <unnamed>: SELECT "booking_operationals".* FROM "booking_operationals" WHERE "booking_operationals"."listing_id" = $1 ORDER BY "booking_operationals"."start_date" ASC 
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49952-1]  sql_error_code = 00000 LOG:  duration: 0.340 ms  bind <unnamed>: SELECT "booking_operationals".* FROM "booking_operationals" WHERE "booking_operationals"."listing_id" = $1 ORDER BY "booking_operationals"."start_date" ASC 
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49952-2]  sql_error_code = 00000 DETAIL:  parameters: $1 = '665'
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49953-1]  sql_error_code = 00000 LOG:  duration: 0.187 ms  execute <unnamed>: SELECT "booking_operationals".* FROM "booking_operationals" WHERE "booking_operationals"."listing_id" = $1 ORDER BY "booking_operationals"."start_date" ASC 
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49953-2]  sql_error_code = 00000 DETAIL:  parameters: $1 = '665'
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49954-1]  sql_error_code = 00000 LOG:  duration: 0.071 ms  parse <unnamed>: SELECT "booking_operationals".* FROM "booking_operationals" WHERE "booking_operationals"."listing_id" = $1 ORDER BY "booking_operationals"."start_date" ASC 
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49955-1]  sql_error_code = 00000 LOG:  duration: 0.293 ms  bind <unnamed>: SELECT "booking_operationals".* FROM "booking_operationals" WHERE "booking_operationals"."listing_id" = $1 ORDER BY "booking_operationals"."start_date" ASC 
2021-03-08T20:00:26+00:00 app[postgres.31309]: [DATABASE] [49955-2]  sql_error_code = 00000 DETAIL:  parameters: $1 = '666'

I just removed the comments at the end of the queries, which point to the code lines we use to track down the culprit.

Full disclosure: I'm reformatting the lines from Heroku a bit, as I had no luck with the raw ones.
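
Roughly what the reformatting does, in case it matters (the raw syslog prefix here is my assumption of what the drain emits, not a verified format):

# Hypothetical sketch: strip an RFC 5424-style syslog prefix from the raw
# drain lines and rebuild the "timestamp app[postgres.NNN]:" shape shown above.
sed -E 's/^<[0-9]+>1 ([^ ]+) [^ ]+ app (postgres\.[0-9]+) - /\1 app[\2]: /' raw.log > "$date.log"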

fpietka commented 3 years ago

Also, I ran some tests, and I did succeed (and am still succeeding) at parsing the file if I split it into pieces of around 3-4 GB.
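
In case it helps reproduce, a sketch of the splitting with GNU split, keeping lines intact (sizes and names are just what I use):

# Cut the day's log into ~4 GB pieces without splitting a line in half.
split -C 4G -d --additional-suffix=.log "$date.log" "${date}_part_"
# pgbadger accepts several log files on the command line.
pgbadger -j 8 -f logplex --dbname=mydbname -o "$date.html" "${date}_part_"*.log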

darold commented 3 years ago

The log format is OK. Can you try using -m 1024 to see whether this is a problem with very long queries?
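
For example, added to your command line:

pgbadger -j 8 -f logplex -m 1024 --dbname=mydbname --exclude-query="^(COPY|COMMIT|BEGIN|SELECT 1|pg_start_backup|pg_stop_backup|pg_catalog|pg_stat_statements|pg_catalog.pg_stat_archiver)" -o $date.html $date.log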

fpietka commented 3 years ago

I figured it out (kinda)! You put me on the right track regarding the log format. It turns out my logs were not sorted, because I process them before sending them to pgbadger and didn't bother to sort them. It also turns out I had changed my Heroku log drain, and since I had removed the drain name from the log lines, the older drain was never cleaned up and was still emitting lines in the wrong format.

I saw this when I ran in verbose mode...

My question is: why doesn't pgbadger just discard them? Is it an issue with the amount of badly formatted lines? Since fixing this I have had no more trouble parsing the logs, so apart from that question (out of curiosity), you can consider this issue closed.

Thanks for your time.
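
For anyone else hitting this: the timestamps are ISO 8601 at the start of each line, so a plain lexical sort was enough in my case to put the consolidated log back in chronological order before parsing:

# ISO 8601 timestamps at line start sort chronologically as plain text.
sort "$date.log" > "$date.sorted.log"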

darold commented 3 years ago

It doesn't discard them because their log format matches the searched patterns; only lines that do not match are discarded.