Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.01k stars 576 forks source link

Icinga2 2.13.6 memory leak #9621

Closed deric closed 1 year ago

deric commented 1 year ago

Describe the bug

Main process stops responding, UI claims that the monitoring backend is down.

icinga2 has status:

Status: "Shutting down old instance..."
systemd[1]: Started Icinga host/service/network monitoring system.
systemd[1]: Stopping Icinga host/service/network monitoring system...
systemd[1]: icinga2.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: icinga2.service: Killing process 325710 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325752 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325761 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325753 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325754 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325755 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325756 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325758 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325759 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325760 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325762 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325773 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325781 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325782 (n/a) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325783 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325784 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325785 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325786 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325787 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325788 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325789 (n/a) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325790 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325792 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325794 (n/a) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325795 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325796 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325797 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325798 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Main process exited, code=killed, status=9/KILL
systemd[1]: icinga2.service: Killing process 325761 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325752 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325781 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325783 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325785 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325786 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325787 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325788 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Killing process 325798 (icinga2) with signal SIGKILL.
systemd[1]: icinga2.service: Failed with result 'timeout'.
systemd[1]: Stopped Icinga host/service/network monitoring system.
systemd[1]: icinga2.service: Consumed 8min 26.812s CPU time.

AFAIK default icinga2.service config:

[Service]
Type=notify
NotifyAccess=all
Environment="ICINGA2_ERROR_LOG=/var/log/icinga2/error.log"
EnvironmentFile=/etc/default/icinga2
ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/default/icinga2
ExecStart=/usr/sbin/icinga2 daemon --close-stdio -e ${ICINGA2_ERROR_LOG}
PIDFile=/run/icinga2/icinga2.pid
ExecReload=/usr/lib/icinga2/safe-reload /etc/default/icinga2
TimeoutStartSec=30m
KillMode=mixed

To Reproduce

Not sure. The crash is probably triggered by configuration changes made by Puppet. Though the config reload should be safe (configuration is valid).

/etc/systemd/system/icinga2.service.d/limits.conf:

[Service]
TasksMax=infinity

Currently the master process requires about 3.4GB of RAM, secondary master is on 2.5GB.

Your Environment

Include as many relevant details about the environment you experienced the problem in

julianbrost commented 1 year ago

Looks more like something is trying to stop icinga2 but it isn't stopping in time so it is killed. Can you please also share timestamps for the systemd log messages as well as icinga2 logs (/var/log/icinga2/icinga2.log)?

deric commented 1 year ago

@julianbrost Yes, the reload is triggered by puppet, as I wrote above. But it timeouts and the main process is unresponsive during the grace period.


Jan 03 10:24:34 safe-reload[317362]: Reloading Icinga 2: Done
Jan 03 10:25:18 systemd[1]: Reloaded Icinga host/service/network monitoring system.
Jan 03 13:24:10 puppet-agent[323515]: (/Stage[main]/Icinga::Master/Icinga2::Object::Endpoint...
Jan 03 13:24:11 systemd[1]: Reloading Icinga host/service/network monitoring system.
Jan 03 13:24:38 safe-reload[323975]: Validating config files: Done
Jan 03 13:24:38 safe-reload[323975]: Reloading Icinga 2: Done
Jan 03 13:31:30 systemd[1]: icinga2.service: State 'stop-sigterm' timed out. Killing.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 312434 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317424 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317433 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324036 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324045 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317427 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317428 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317429 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317430 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317431 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317463 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317464 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317465 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317466 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317467 (n/a) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317469 (n/a) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317470 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317471 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317475 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317478 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324037 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324038 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324039 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324041 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324042 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324046 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324059 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324061 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324062 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324064 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324066 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Main process exited, code=killed, status=9/KILL
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317433 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324036 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317424 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317465 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 317466 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324039 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324042 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Killing process 324046 (icinga2) with signal SIGKILL.
Jan 03 13:31:30 systemd[1]: icinga2.service: Failed with result 'timeout'.
Jan 03 13:31:30 systemd[1]: Stopped Icinga host/service/network monitoring system.
Jan 03 13:31:30 systemd[1]: icinga2.service: Consumed 7h 56min 11.767s CPU time.
Jan 03 13:31:30 systemd[1]: Starting Icinga host/service/network monitoring system...
Jan 03 13:31:30 icinga2[324112]: [2023-01-03 13:31:30 +0000] information/cli: Icinga application loader (version: r2.13.6-1)
Jan 03 13:31:49 puppet-agent[323515]: (/Stage[main]/Icinga2::Service/Service[icinga2]) Triggered 'refresh' from 1 event
Jan 03 13:31:50 puppet-agent[323515]: Applied catalog in 518.05 seconds
julianbrost commented 1 year ago

Is there anything logged to /var/log/icinga2/icinga2.log in that time?

It would also be helpful to have backtraces from the processes at the time it should reload (so in the above logs, this would have been somewhere around 13:30). Please have a look at https://icinga.com/docs/icinga-2/latest/doc/21-development/#gdb-backtrace-from-running-process for how to obtain them.

deric commented 1 year ago

No, not really. The reload/restart is irrelevant, it looks like a memory leak to me:

Screenshot_20230104_090603

The node has 8 GB of RAM, it used to work fine with ~2.4GB RAM.

The instance has been upgraded 2.13.5-1.bullseye -> 2.13.6-2+debian11on 2023-01-02. Although started leaking memory 24h after the upgrade... so not necessarily related to the upgrade.

Start-Date: 2023-01-02  10:21:25
Commandline: apt upgrade -y
Upgrade: icinga2-ido-pgsql:amd64 (2.13.5-1.bullseye, 2.13.6-2+debian11), icingaweb2-common:amd64 (2.11.1-1.bullseye, 2.11.3-2.bullseye), icinga-php-thirdparty:amd64 (0.11.0-1.bullseye, 0.11.0-2.bullseye), php-icinga:amd64 (2.11.1-1.bullseye, 2.11.3-2.bullseye), icinga2-bin:amd64 (2.13.5-1.bullseye, 2.13.6-2+debian11), postgresql-client-13:amd64 (13.9-0+deb11u1, 13.9-1.pgdg110+1), icinga2:amd64 (2.13.5-1.bullseye, 2.13.6-2+debian11), icingaweb2:amd64 (2.11.1-1.bullseye, 2.11.3-2.bullseye), icinga-php-library:amd64 (0.10.0-1+debian11, 0.10.1-1+debian11), icingaweb2-module-monitoring:amd64 (2.11.1-1.bullseye, 2.11.3-2.bullseye), postgresql-client-common:amd64 (244.pgdg110+1, 246.pgdg110+1), dbconfig-common:amd64 (2.0.19, 2.0.22~pgdg110+1), telegraf:amd64 (1.24.2-1, 1.25.0-1), icinga2-common:amd64 (2.13.5-1.bullseye, 2.13.6-2+debian11), libpq5:amd64 (15.0-1.pgdg110+1, 15.1-1.pgdg110+1), puppet-agent:amd64 (7.20.0-1bullseye, 7.21.0-1bullseye)
End-Date: 2023-01-02  10:22:04
deric commented 1 year ago

@julianbrost Here are the backtraces, let me know if you need anything else:

Currently the icinga2 service is at 4.1GB

deric commented 1 year ago

from heaptrack profiler: Screenshot_20230104_172254

Screenshot_20230104_172441

deric commented 1 year ago

A flame graph of leaked memory:

Screenshot_20230105_084953

julianbrost commented 1 year ago

This looks like your database server isn't able to keep up with the amount of queries generated by Icinga. These queries will then be buffered. If this happens for a short time, they will be processed afterwards and the queue shrinks again. If the database server can't keep up with the sustained rate of queries, this memory usage will grow. There should also be hints to this in the logs, please do check them.

deric commented 1 year ago

@julianbrost That's weird. The allocations are linearly growing and the db backend is pretty much idle. The DB might need some tuning, but that's not the main issue.

Screenshot_20230105_094743

A workaround is to downgrade the master instances to the previous version 2.13.5:

apt install icinga2=2.13.5-1.bullseye icinga2-bin=2.13.5-1.bullseye icinga2-common=2.13.5-1.bullseye icinga2-ido-pgsql=2.13.5-1.bullseye

With 2.13.5 the memory usage is constant and I haven't done any change to the db backend:

Screenshot_20230105_095258

deric commented 1 year ago

Here's pending queries log. With 2.13.6 the queries backlog is only growing, after downgrade it gets better (the throughput is basically constant all the time)

[2023-01-05 06:50:11 +0000] information/IdoPgsqlConnection: Pending queries: 588928 (Input: 1714/s; Output: 1342/s)
[2023-01-05 06:50:21 +0000] information/IdoPgsqlConnection: Pending queries: 592830 (Input: 1775/s; Output: 1386/s)
[2023-01-05 06:50:31 +0000] information/IdoPgsqlConnection: Pending queries: 595250 (Input: 1767/s; Output: 1539/s)
[2023-01-05 06:50:41 +0000] information/IdoPgsqlConnection: Pending queries: 600774 (Input: 1738/s; Output: 1195/s)
[2023-01-05 06:50:51 +0000] information/IdoPgsqlConnection: Pending queries: 604802 (Input: 1683/s; Output: 1312/s)
[2023-01-05 06:51:01 +0000] information/IdoPgsqlConnection: Pending queries: 610195 (Input: 1771/s; Output: 1254/s)
[2023-01-05 06:51:11 +0000] information/IdoPgsqlConnection: Pending queries: 616477 (Input: 1787/s; Output: 1181/s)
[2023-01-05 06:51:21 +0000] information/IdoPgsqlConnection: Pending queries: 620253 (Input: 1718/s; Output: 1348/s)
[2023-01-05 06:51:31 +0000] information/IdoPgsqlConnection: Pending queries: 625293 (Input: 1758/s; Output: 1266/s)
[2023-01-05 06:51:41 +0000] information/IdoPgsqlConnection: Pending queries: 632946 (Input: 1784/s; Output: 1036/s)
[2023-01-05 06:51:51 +0000] information/IdoPgsqlConnection: Pending queries: 638382 (Input: 1739/s; Output: 1206/s)
[2023-01-05 06:52:01 +0000] information/IdoPgsqlConnection: Pending queries: 640290 (Input: 1739/s; Output: 1556/s)
[2023-01-05 06:52:11 +0000] information/IdoPgsqlConnection: Pending queries: 640998 (Input: 1640/s; Output: 1577/s)
---- icinga2 downgrade to 2.13.5
[2023-01-05 06:52:21 +0000] information/IdoPgsqlConnection: Pending queries: 641006 (Input: 1/s; Output: 1577/s)
[2023-01-05 06:52:31 +0000] information/IdoPgsqlConnection: Pending queries: 641054 (Input: 5/s; Output: 1577/s)
[2023-01-05 06:52:41 +0000] information/IdoPgsqlConnection: Pending queries: 641074 (Input: 2/s; Output: 1577/s)
[2023-01-05 06:52:51 +0000] information/IdoPgsqlConnection: Pending queries: 641094 (Input: 2/s; Output: 1577/s)
[2023-01-05 06:53:01 +0000] information/IdoPgsqlConnection: Pending queries: 641389 (Input: 607/s; Output: 577/s)
[2023-01-05 06:53:11 +0000] information/IdoPgsqlConnection: Pending queries: 687177 (Input: 6316/s; Output: 1854/s)
[2023-01-05 06:53:21 +0000] information/IdoPgsqlConnection: Pending queries: 724263 (Input: 5403/s; Output: 1832/s)
[2023-01-05 06:53:31 +0000] information/IdoPgsqlConnection: Pending queries: 723213 (Input: 1754/s; Output: 1865/s)
[2023-01-05 06:53:41 +0000] information/IdoPgsqlConnection: Pending queries: 722565 (Input: 1782/s; Output: 1846/s)
[2023-01-05 06:53:51 +0000] information/IdoPgsqlConnection: Pending queries: 721805 (Input: 1746/s; Output: 1816/s)
[2023-01-05 06:54:01 +0000] information/IdoPgsqlConnection: Pending queries: 720947 (Input: 1736/s; Output: 1820/s)
[2023-01-05 06:54:11 +0000] information/IdoPgsqlConnection: Pending queries: 719505 (Input: 1702/s; Output: 1848/s)
[2023-01-05 06:54:21 +0000] information/IdoPgsqlConnection: Pending queries: 718963 (Input: 1765/s; Output: 1817/s)
[2023-01-05 06:54:31 +0000] information/IdoPgsqlConnection: Pending queries: 718263 (Input: 1769/s; Output: 1846/s)
[2023-01-05 06:54:41 +0000] information/IdoPgsqlConnection: Pending queries: 717381 (Input: 1734/s; Output: 1814/s)
[2023-01-05 06:54:51 +0000] information/IdoPgsqlConnection: Pending queries: 715831 (Input: 1681/s; Output: 1837/s)
[2023-01-05 06:55:01 +0000] information/IdoPgsqlConnection: Pending queries: 714835 (Input: 1781/s; Output: 1880/s)
[2023-01-05 06:55:11 +0000] information/IdoPgsqlConnection: Pending queries: 714408 (Input: 1790/s; Output: 1830/s)
[2023-01-05 06:55:21 +0000] information/IdoPgsqlConnection: Pending queries: 713572 (Input: 1716/s; Output: 1793/s)
[2023-01-05 06:55:31 +0000] information/IdoPgsqlConnection: Pending queries: 712734 (Input: 1752/s; Output: 1838/s)
[2023-01-05 06:55:41 +0000] information/IdoPgsqlConnection: Pending queries: 712144 (Input: 1780/s; Output: 1834/s)
[2023-01-05 06:55:51 +0000] information/IdoPgsqlConnection: Pending queries: 711250 (Input: 1753/s; Output: 1833/s)
[2023-01-05 06:56:01 +0000] information/IdoPgsqlConnection: Pending queries: 710282 (Input: 1730/s; Output: 1823/s)
[2023-01-05 06:56:11 +0000] information/IdoPgsqlConnection: Pending queries: 709484 (Input: 1709/s; Output: 1791/s)
[2023-01-05 06:56:21 +0000] information/IdoPgsqlConnection: Pending queries: 709502 (Input: 1769/s; Output: 1766/s)
[2023-01-05 06:56:31 +0000] information/IdoPgsqlConnection: Pending queries: 710262 (Input: 1763/s; Output: 1701/s)
[2023-01-05 06:56:41 +0000] information/IdoPgsqlConnection: Pending queries: 709901 (Input: 1731/s; Output: 1769/s)
[2023-01-05 06:56:51 +0000] information/IdoPgsqlConnection: Pending queries: 708856 (Input: 1682/s; Output: 1793/s)
[2023-01-05 06:57:01 +0000] information/IdoPgsqlConnection: Pending queries: 708387 (Input: 1775/s; Output: 1819/s)
[2023-01-05 06:57:11 +0000] information/IdoPgsqlConnection: Pending queries: 708012 (Input: 1785/s; Output: 1822/s)
[2023-01-05 06:57:21 +0000] information/IdoPgsqlConnection: Pending queries: 707121 (Input: 1717/s; Output: 1801/s)
[2023-01-05 06:57:31 +0000] information/IdoPgsqlConnection: Pending queries: 706326 (Input: 1754/s; Output: 1835/s)
[2023-01-05 06:57:41 +0000] information/IdoPgsqlConnection: Pending queries: 705924 (Input: 1782/s; Output: 1820/s)
[2023-01-05 06:57:51 +0000] information/IdoPgsqlConnection: Pending queries: 705482 (Input: 1737/s; Output: 1777/s)
[2023-01-05 06:58:01 +0000] information/IdoPgsqlConnection: Pending queries: 704864 (Input: 1734/s; Output: 1796/s)
[2023-01-05 06:58:11 +0000] information/IdoPgsqlConnection: Pending queries: 704010 (Input: 1707/s; Output: 1795/s)
[2023-01-05 06:58:21 +0000] information/IdoPgsqlConnection: Pending queries: 703222 (Input: 1762/s; Output: 1835/s)
[2023-01-05 06:58:31 +0000] information/IdoPgsqlConnection: Pending queries: 702655 (Input: 1766/s; Output: 1835/s)
[2023-01-05 06:58:41 +0000] information/IdoPgsqlConnection: Pending queries: 701523 (Input: 1746/s; Output: 1853/s)
[2023-01-05 06:58:51 +0000] information/IdoPgsqlConnection: Pending queries: 699702 (Input: 1691/s; Output: 1877/s)
...
[2023-01-05 07:55:01 +0000] information/IdoPgsqlConnection: Pending queries: 256663 (Input: 1769/s; Output: 1786/s)
[2023-01-05 07:55:11 +0000] information/IdoPgsqlConnection: Pending queries: 256487 (Input: 1782/s; Output: 1800/s)
[2023-01-05 07:55:21 +0000] information/IdoPgsqlConnection: Pending queries: 255353 (Input: 1714/s; Output: 1829/s)
[2023-01-05 07:55:31 +0000] information/IdoPgsqlConnection: Pending queries: 253883 (Input: 1754/s; Output: 1895/s)
[2023-01-05 07:55:41 +0000] information/IdoPgsqlConnection: Pending queries: 252695 (Input: 1781/s; Output: 1892/s)
[2023-01-05 07:55:51 +0000] information/IdoPgsqlConnection: Pending queries: 251337 (Input: 1749/s; Output: 1876/s)
[2023-01-05 07:56:01 +0000] information/IdoPgsqlConnection: Pending queries: 249877 (Input: 1732/s; Output: 1873/s)
deric commented 1 year ago

I did some optimization on the db backend side in order to improve queries throughput. It did helped.

However the possibilities are very limited since IdoPgsql is using only single connection. It appears that the latest version 2.13.6 is allocating more memory

Screenshot_20230111_084706

Sometimes the IdoPgsql gets stuck and it's unable to recover without service restart.

deric commented 1 year ago

I've lowered frequency of some checks and managed to keep the memory usage in reasonable limits. But the ido design isn't optimal, I'll try switching to IcingaDB.