NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

NCPA 2.4.0 reporting only 32 cores on a 48 core system. #864

Open kfnagios opened 2 years ago

kfnagios commented 2 years ago

A customer observed that NCPA for Windows was only reporting 32 cores on a 48-core system. The problem appears to be specific to the Windows version; the customer was not able to replicate it with the Linux client. Additional details are available internally via BR-15740.

MrPippin66 commented 2 years ago

Do you have the ability to run a native python program on the source system having issues?

Namely, I need to see what "os.cpu_count()" returns.

HunnyPuns commented 2 years ago

I've got a Windows Server 2019 VM spun up with 44 CPUs. NCPA is seeing 32 cores. If I install Python 3, os.cpu_count() gives me 44. If I install Python 2, os doesn't seem to have cpu_count()?? But I can use multiprocessing.cpu_count() and that returns 44 as well.
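For anyone wanting to repeat this comparison on an affected host, here is a minimal sketch that collects the logical CPU count from each source mentioned so far. The helper name collect_cpu_counts is hypothetical and is not part of NCPA; the sketch assumes psutil may or may not be installed, and it accounts for os.cpu_count() being absent on Python 2.

```python
import multiprocessing
import os


def collect_cpu_counts():
    """Return logical CPU counts from every source available in this interpreter.

    Hypothetical helper for illustration only; not NCPA code.
    """
    counts = {"multiprocessing.cpu_count": multiprocessing.cpu_count()}

    # os.cpu_count() only exists on Python 3.4+, so guard for Python 2.
    if hasattr(os, "cpu_count"):
        counts["os.cpu_count"] = os.cpu_count()

    # psutil is optional here; skip it if it is not installed.
    try:
        import psutil
    except ImportError:
        psutil = None
    if psutil is not None:
        counts["psutil.cpu_count"] = psutil.cpu_count(logical=True)

    return counts


if __name__ == "__main__":
    for name, value in sorted(collect_cpu_counts().items()):
        print("%s: %s" % (name, value))
```

If all sources agree with each other but disagree with what NCPA reports, that points away from the Python runtime itself.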

jomann09 commented 2 years ago

The CPU stats/count come from psutil. I am not 100% certain how psutil gets them, but it could be reading a Windows counter or something similar rather than using something like the multiprocessing module.

MrPippin66 commented 2 years ago

It should be coming from the "GetLogicalProcessorInformationEx()" Windows call, but I wanted to make sure it's not a fundamental issue in Python itself.

I don't have any system with that many cores.

I don't have access to BR-15740, but it would be useful to know the hardware details of the system it is running on (which I assume is a physical system, not virtual).

HunnyPuns commented 2 years ago

In my case it's a virtual system. 2 sockets, 22 cores per socket. I don't have the option to specify threads per socket. At least I haven't found that option in VMware anyway. I can do it in Proxmox, but I can't overprovision CPUs, and I don't have a beefy Proxmox system lying around. :( But I wouldn't think it would be an issue of it not looking at logical CPUs, since 32 doesn't fit nicely into 44 or 48.

I went back to an older version of psutil, 5.6.7, and did psutil.cpu_count() and it returned 44.

MrPippin66 commented 2 years ago

@HunnyPuns Which python version are you using?

HunnyPuns commented 2 years ago

Oh sorry, I forgot. Python 2.7.18.

MrPippin66 commented 2 years ago

Hmmm, what psutil version were you using that was broken? I assume the same Python version.

HunnyPuns commented 2 years ago

None. Running psutil (tested with latest, and v5.6.7) always gets me the correct number of CPUs, which is 44. NCPA returns 32 CPUs on the 44 CPU system. All of this is tested with Python 2.7.18 on Windows Server 2019.

MrPippin66 commented 2 years ago

Ah! Okay, so you don't see a problem with the values returned via psutil. Just what ncpa returns? And to clarify, which node are you querying, just to make sure?

HunnyPuns commented 2 years ago

Right, I am only seeing incorrect values in NCPA. I am querying api/cpu/count. Also, if I look at api/cpu/idle, I get 32 records returned, which is what I would expect given the count, but sometimes it's nice to get metrics from multiple angles.

MrPippin66 commented 2 years ago

@HunnyPuns Can you get the following from the current psutil under Python 2.7?

ps.cpu_percent(percpu=True)

NCPA's cpu/count gets its value from the length of the returned array.
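A rough sketch of that check, assuming psutil is installed: it derives a count from the length of the per-CPU list, as described above, and compares it with psutil.cpu_count(). The helper names count_from_percpu and compare are hypothetical and are not taken from the NCPA source.

```python
def count_from_percpu(samples):
    """Derive a CPU count from the length of a per-CPU sample list."""
    return len(samples)


def compare(samples, reported):
    """Compare the derived count with a separately reported count.

    Hypothetical helper for illustration only; not NCPA code.
    """
    derived = count_from_percpu(samples)
    return {"derived": derived, "reported": reported, "match": derived == reported}


if __name__ == "__main__":
    try:
        import psutil
    except ImportError:
        print("psutil is not installed; nothing to compare")
    else:
        # One utilisation value per logical CPU.
        samples = psutil.cpu_percent(percpu=True)
        print(compare(samples, psutil.cpu_count()))
```

A mismatch between the two numbers inside the same interpreter would implicate psutil; a match, as reported in this thread, suggests looking at the environment NCPA itself runs in.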

HunnyPuns commented 2 years ago

In both psutil 5.6.7 and latest, psutil.cpu_percent(percpu=True) gives me an array with 44 (the number of CPUs on the VM) values in it.

MrPippin66 commented 2 years ago

Thanks. One of the code developers will have to look into this, since this implies the problem is not coming from psutil.