darold / pgbadger

A fast PostgreSQL Log Analyzer
http://pgbadger.darold.net/
PostgreSQL License

pgbadger being killed by OOM #794

Closed: ajtwatching closed this 9 months ago

ajtwatching commented 10 months ago

Hi,

Running pgbadger (v12.0) daily to process PostgreSQL logs from a bunch of servers. About once a week the process starts consuming a large amount of memory and is subsequently killed by the OOM killer. I then manually re-run it and most of the time it completes without issue (until the next week).

It always seems to be the Friday/Sunday runs...

[Fri Sep  8 04:36:22 2023] Out of memory: Killed process 1228260 (pgbadger) total-vm:17543696kB, anon-rss:15700548kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:34368kB oom_score_adj:0
[Sun Sep 10 18:38:41 2023] Out of memory: Killed process 1347264 (pgbadger) total-vm:17516908kB, anon-rss:15673628kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:34308kB oom_score_adj:0
[Fri Sep 15 04:34:19 2023] Out of memory: Killed process 1570350 (pgbadger) total-vm:17530968kB, anon-rss:15688556kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:34340kB oom_score_adj:0
[Sun Sep 17 17:56:23 2023] Out of memory: Killed process 1688063 (pgbadger) total-vm:17515552kB, anon-rss:15671564kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:34312kB oom_score_adj:0

Versions:

$ pgbadger --version
pgBadger version 12.0

This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi

The pgbadger command is always the same; I just cycle sequentially through a bunch of hosts.

INFO -- : Running pgbadger /usr/bin/pgbadger -j4 --prefix '%t [%p-%l] %h %a %q%u@%d ' --exclude-appname 'pg_dump' --retention 24 -I -q prddb30/postgresql*log.gz -O /var/www/pgbadger/reports/prddb30

The VM has 16 cores and 16GB of memory. I was running with 12 threads, but dropped it back to 4 in case that was the issue (it still happens with 4).

node_exporter graph of the memory increasing to the limit. Most of the time the box is lucky to use 3GB of memory.

[Screenshot: node_exporter memory graph, 2023-09-11 9:49 am]

Any additional debugging I can perform?

Thanks.

darold commented 9 months ago

pgbadger can use a lot of memory depending on the log size and the length of the queries. You should tune your kernel to allow Perl to use more memory (vm.overcommit_ratio), and you can also disable the OOM killer by setting vm.overcommit_memory to 2. If that is not enough, you might want to increase the memory size.
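
For reference, a minimal sketch of how those kernel settings could be applied with sysctl; the specific values below are illustrative assumptions, not recommendations taken from this thread:

# apply at runtime (requires root)
sudo sysctl -w vm.overcommit_memory=2   # strict overcommit accounting: allocations fail instead of triggering the OOM killer later
sudo sysctl -w vm.overcommit_ratio=90   # commit limit = swap + 90% of RAM (illustrative value)
# to persist across reboots, put the same keys in a file under /etc/sysctl.d/ and reload:
sudo sysctl --system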

ajtwatching commented 9 months ago

Thanks for the input. I've bumped the memory on the VM to see how it goes. I'm still suspicious, though, as my daily log volume is pretty consistent.

postgresql-20230908: 1012M      total
postgresql-20230909: 144M       total
postgresql-20230910: 304M       total
postgresql-20230911: 1.1G       total
postgresql-20230912: 1.1G       total
postgresql-20230913: 1.2G       total
postgresql-20230914: 1.1G       total
postgresql-20230915: 959M       total
postgresql-20230916: 163M       total
postgresql-20230917: 255M       total
postgresql-20230918: 924M       total
postgresql-20230919: 805M       total
postgresql-20230920: 1.1G       total
postgresql-20230921: 1.4G       total
postgresql-20230922: 1.3G       total
postgresql-20230924: 274M       total
postgresql-20230925: 1.4G       total
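
Per-day totals like the above can be gathered with something along these lines; the exact command isn't shown in the thread, and the directory and filename pattern are assumptions based on the pgbadger invocation earlier:

for d in 202309{08..25}; do
    echo -n "postgresql-$d: "                               # day label
    du -ch prddb30/postgresql-$d*.log.gz 2>/dev/null | tail -n 1   # "total" line from du
done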

I'll keep an eye on it towards the end of this week.

Last night's run peaked at around 9GB of memory.

[Screenshot: memory usage graph, 2023-09-27 11:28 am]

ajtwatching commented 9 months ago

Yeah, so it consumed all the additional memory again.

I, [2023-09-29T04:15:01.515640 #2265845]  INFO -- : Running pgbadger /usr/bin/pgbadger -j4 --prefix '%t [%p-%l] %h %a %q%u@%d '  --exclude-appname 'pg_dump' --retention 24 -I -q prddb21/postgresql*log.gz -O /var/www/pgbadger/reports/prddb21
E, [2023-09-29T04:40:46.241583 #2265845] ERROR -- : Error prddb21 running pgbadger

That correlates with the activity on the box, with memory spiking to exhaustion around 04:40.

[Screenshot: memory usage graph, 2023-10-04 9:10 am]

I checked the log size/volume again, and it remains consistent across the days.

20230925: 1.4G  total
20230926: 1.2G  total
20230927: 1.3G  total
20230928: 1.3G  total
20230929: 1.2G  total

I'll give it some more memory!

ajtwatching commented 9 months ago

Having bumped the VM to 32GB, it seems to be running without issue. It's still a little strange to me, given that the log size is consistent, that just once a week it gets really memory hungry... Anyway, I'll close this out. Thanks for the input.

[Screenshot: memory usage graph, 2023-10-11 2:04 pm]