OPMDG / check_pgactivity

Nagios remote agent
http://opm.readthedocs.io/probes/check_pgactivity.html
PostgreSQL License

Detect if postgres was started without huge pages when huge_pages = try #296

Open blogh opened 3 years ago

blogh commented 3 years ago

Sometimes we cannot afford a failure to start just because huge pages are unavailable. It would be neat to have a service that detects when postgres could not get huge pages at startup with huge_pages = try.

rjuju commented 3 years ago

I'm a bit dubious about this one. Is there a way to know if postgres could map its shmem with huge pages at the SQL level? If not, it means that it's a check that would only be possible if check_pgactivity is run locally, which greatly limits the usefulness.

Krysztophe commented 3 years ago

I suppose this is not something visible from pg_shmem_allocations?

Anyway: if we have to dig into /proc, that would not be the 1st service that needs to run on the DB server.

blogh commented 3 years ago

I found this: https://access.redhat.com/solutions/320303?sc_cid=cp. It's not usable everywhere, though: for instance, RH 5 has no KernelPageSize.

It would be neat to be able to fetch this info directly from postgres.
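
For illustration, here is a minimal Python sketch of the /proc approach, assuming that KernelPageSize refers to the per-mapping field in /proc/<pid>/smaps. The function name `huge_page_mappings` and the 4 kB base page size are assumptions made for this example, not something taken from the linked article or from check_pgactivity.

```python
import re

# A mapping header in smaps looks like:
# "7f1c00000000-7f1c40000000 rw-s 00000000 00:0f 1234 /anon_hugepage (deleted)"
_HEADER = re.compile(r"^([0-9a-f]+-[0-9a-f]+)\s")


def huge_page_mappings(smaps_text, base_page_kb=4):
    """Return (address_range, size_kb, kernel_page_kb) for every mapping
    whose KernelPageSize is larger than the base page size.

    base_page_kb=4 is an assumption (typical x86-64 value).
    """
    found = []
    current_range = None
    size_kb = None
    for line in smaps_text.splitlines():
        header = _HEADER.match(line)
        if header:
            # Start of a new mapping: remember its range, reset its size.
            current_range, size_kb = header.group(1), None
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "Size:":
            size_kb = int(parts[1])
        elif parts[0] == "KernelPageSize:":
            page_kb = int(parts[1])
            if current_range is not None and page_kb > base_page_kb:
                found.append((current_range, size_kb, page_kb))
    return found
```

The function would be fed the content of /proc/<postmaster pid>/smaps, so it only works on the database server itself and with enough privileges to read that file. An empty result would suggest that no mapping of that process uses huge pages.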

rjuju commented 3 years ago

I'm afraid that it's not that reliable. It's probably ok for a postgres dedicated server, but as soon as you have other services (including other postgres instances), you can't use those numbers anymore.

I agree that an SQL API for that in postgres would be nice. There was a thread a few days ago about raising the log level of the messages about failed attempts to use huge pages, and I already tried to add this point to the discussion.

blogh commented 3 years ago

great ! thanks

Krysztophe commented 1 year ago

In PG15, we now have shared_memory_size and shared_memory_size_in_huge_pages.

Assuming a single instance on a dedicated server, the service could check that most of the shared buffers are in huge pages (i.e. checking that shared_memory_size_in_huge_pages * huge_page_size = shared_buffers, with a 10% tolerance just in case?).

With huge pages, I always fear that PG does not find the HP for whatever reason and uses only 4k pages, wasting memory. Having such an alert would be less radical than huge_pages = on and a failure to start.

Problem: PG does not return the HP size value, huge_page_size is 0 by default. It could be a parameter of the service, it could be autodetected in /proc/meminfo, or the service could simply fail if huge_page_size is not defined.
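
The arithmetic above can be sketched in Python as follows. This is only an illustration under assumptions: the function names are made up for this example, the huge page size is read from the Hugepagesize line of /proc/meminfo, and the tolerance is taken relative to shared_buffers. It only compares the sizes PostgreSQL reports; it does not observe which pages the kernel actually handed out.

```python
import re


def huge_page_size_kb(meminfo_text):
    """Return the Hugepagesize value (in kB) found in /proc/meminfo content,
    or None if the line is missing."""
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sizes_match(shared_buffers_bytes, pages_needed, page_size_kb, tolerance=0.10):
    """Check that pages_needed * page size is within `tolerance` of
    shared_buffers.

    pages_needed stands for shared_memory_size_in_huge_pages (PG15+);
    shared_buffers_bytes is shared_buffers converted to bytes.
    """
    requested_bytes = pages_needed * page_size_kb * 1024
    return abs(requested_bytes - shared_buffers_bytes) <= tolerance * shared_buffers_bytes
```

For example, with 2048 kB huge pages and shared_buffers equal to 1000 such pages, `sizes_match` accepts a reported need of 1050 pages and rejects 1200 pages.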

frost242 commented 1 year ago

I'm not confident that shared_memory_size_in_huge_pages could be used to determine if PostgreSQL was started with all shared memory in huge pages. This GUC only tells how many huge pages a PostgreSQL instance will need to allocate; it's just a helper to tune the system.

Krysztophe commented 1 year ago

You're right, shared_memory_size_in_huge_pages is the desired size, not what is in HP.

rjuju commented 1 year ago

FTR upstream finally decided to properly report whether huge pages are used or not, see https://github.com/postgres/postgres/commit/a14354cac0e32d5e169c1ea4225845f93922d483.

We will be able to write a proper check for that starting with pg 17.
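
As a rough sketch of what such a check could look like, assuming the linked commit is the one that adds the huge_pages_status setting in pg 17 (reported as on, off or unknown): the Python below is not the check_pgactivity implementation, and the function name and the choice of WARNING for a failed huge_pages = try are assumptions made for this example.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Query a real service would run against pg 17 or later (not executed here).
QUERY = "SELECT current_setting('huge_pages'), current_setting('huge_pages_status')"


def check_huge_pages(huge_pages, huge_pages_status):
    """Map the two settings to a (nagios_code, message) pair.

    Severity choices are assumptions for this sketch.
    """
    if huge_pages_status == "unknown":
        return UNKNOWN, "huge pages status could not be determined"
    if huge_pages == "off":
        return OK, "huge pages not requested"
    if huge_pages_status == "on":
        return OK, "shared memory is using huge pages"
    # Huge pages were requested (try or on) but are not in use.
    return WARNING, "huge_pages = %s but huge pages are not in use" % huge_pages
```

Because both values come from plain SQL, this kind of check would not need to run on the database server itself.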

Krysztophe commented 2 weeks ago

That service should be possible with PG17: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a14354cac

rjuju commented 2 weeks ago

Isn't that the exact same commit I mentioned a year ago?