SilverAzide / Gadgets

Gadgets for Rainmeter

CPU Meter No Longer Able to See Utilization On 1950X in NUMA Mode #23

Closed RacerXNFS closed 3 years ago

RacerXNFS commented 3 years ago

After updating from 5.5.1 to 6.0.2, CPU Meter might have lost the ability to measure multiple NUMA nodes? Under Distributed Memory/UMA mode the charts work just fine, cores 1-32 all show load. Under Local Memory/NUMA mode I lose the ability to see any utilization of the second CPU die, and I can only see activity for cores 1-16.

image

SilverAzide commented 3 years ago

Interesting... Yes, there might be a problem. Sorry about that! I am not familiar with NUMA nodes/modes and how that works with respect to how the Performance Counters will be enumerated. I made a change to which counters are used in the v6.x Gadgets to resolve issues with both CPU utilization as well as a known limitation with processors with more than 64 threads. Apparently, there are other cases like yours which I didn't account for. I don't have access to such CPUs, so I can't test these scenarios.

If you could assist, I'd appreciate it! Can you tell me what the values of the following measures are in both the modes you described? MeasureCPULogicalCores MeasureCPUPhysicalCores MeasureCPUPhysicalCPUs MeasureCPUThreadsPerCore

Next, in the Rainmeter About > Skins tab > All CPU Meter, look at the measures MeasureCPU1 through MeasureCPU31; the string value of each measure appears like "0,0", "0,1", etc. These values are supposed to match the Performance counter instances. If you open the Performance Monitor application and go to the dialog where you add counters, under the "Processor Information" category and "% Processor Utility" counter, you will see all the instances for your 32 logical CPUs (in addition to total values). Does each of your logical cores in PerfMon have a value that matches the one in the CPU Meter? Do you have "0,Total" and "1,Total" counters, or just a single "0,Total" counter, and does this change depending on the modes? What I would expect is that your counter instances would be "0,0" to "0,15", then "1,0" to "1,15" (or maybe "1,16" to "1,31"?).

Does toggling the "Use Legacy Mode" option make the 16-32 cores active, or does it have no effect?

There is a fallback we can do to force the CPU Meter to act exactly like it did in the v5.5.1 version, but we'll try that as a last resort...

P.S.: And thank you for the feedback!

RacerXNFS commented 3 years ago

Legacy mode doesn't fix or break anything in either mode here.

UMA image

NUMA image

SilverAzide commented 3 years ago

Thank you! Interesting that both modes show the same physical configuration, but the measures show zeroes for some in the second. Hm!

Would you be able to open the Performance Monitor app in the Admin Tools control panel, then in the Add Counters dialog, select the Processor Information category and the % Processor Utility counter? In the list of counter instances, you should see instances "0,0" through... something. Do you see "0,0" through "0,31" here? Or do you see something else? If you see "0,15" through "0,31", could you select them and see if they report non-zero values? This is where the CPU Meter is getting the data.

If Performance Monitor is reporting no activity, this would be why CPU Meter is showing nothing as well. If you are getting zeros, then perhaps your PerfMon database is corrupted. You can try these instructions to reset your database.

Let me know how this goes. If your counter instance identifiers are correct, and the database is good, and it is reporting zeros for cores 16-32, then perhaps there is some glitch between AMD CPUs and Windows (this would not be the first time). I can give you a simple fix (I hope!!) that will fix your CPU Meter to make it work like the v5.x version.

RacerXNFS commented 3 years ago

I did fix the performance counters before bringing this up, and I think I see the culprit now. Here it's changing depending on what mode the PC's in. UMA, it's 0,0-0,31, NUMA, it's 0,0-0,15, then 1,0-1,15. It's reporting accurate, non-zero values as well.

SilverAzide commented 3 years ago

Excellent info! Thank you for getting back to me. This is what I was suspecting. It sounds like UMA mode is acting like a single socket CPU with 32 logical cores (1x16C/32T), while NUMA mode is acting like a dual socket CPU with 16 logical cores each (2x8C/16T).

I'm guessing that Windows (WMI) is reporting a single CPU regardless of the mode (i.e., the MeasureCPUPhysicalCPUs measure is returning 1 all the time). This is causing the CPU Meter to incorrectly enumerate the counters.
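To make the mismatch concrete, here is a minimal Python sketch of the counter naming described in this thread. It is not the skin's actual code; `counter_instances` is a hypothetical helper, and the "node,index" naming is taken from the PerfMon instance names reported above.

```python
def counter_instances(nodes: int, logical_per_node: int) -> list[str]:
    """Build "node,index" instance names, node by node (hypothetical helper)."""
    return [
        f"{node},{cpu}"
        for node in range(nodes)
        for cpu in range(logical_per_node)
    ]


# What PerfMon exposes on the 1950X in each mode, per the report above.
uma = counter_instances(1, 32)   # "0,0" .. "0,31"
numa = counter_instances(2, 16)  # "0,0" .. "0,15", then "1,0" .. "1,15"

# If the skin believes there is one CPU, it asks for the UMA names even in
# NUMA mode. Names that do not exist there would read as zero.
requested = counter_instances(1, 32)
missing = [name for name in requested if name not in numa]

print(len(missing), missing[0], missing[-1])
```

Under these assumptions, half of the requested instances do not exist in NUMA mode, which lines up with the second 16 logical cores showing no activity.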

Could you do me a favor and run a few commands for me in each mode? I might need to make a coding change to properly detect these modes. At a command prompt, enter the following commands:

wmic computersystem get NumberOfProcessors
wmic cpu get DeviceID, NumberOfCores, NumberOfLogicalProcessors, ThreadCount

If things were consistent, in UMA mode the first command should report NumberOfProcessors=1 and the second command should report a single DeviceID with NumberOfCores=16, NumberOfLogicalProcessors=32. In NUMA mode the first command should show NumberOfProcessors=2 while the second command should list two DeviceID values (CPU0, CPU1) with 8C/16T each. I'm guessing this will not be the case...?

Depending on your feedback, I can hopefully alter the code to better detect your system configuration.

SilverAzide commented 3 years ago

(Hopefully) fixed in v6.1.0.

RacerXNFS commented 3 years ago

Problem wasn't fixed in 6.1.0, sadly. Sorry for the delays in responding.

I just ran the commands you suggested and no, the PC still recognizes it here as a single CPU. Picture taken running in NUMA.

image

SilverAzide commented 3 years ago

Ugh... well, that was a wasted release. 😢 There is no way I know of to detect the number of vCPUs in NUMA mode. Somehow Windows PerfMon knows, but how it knows is a mystery to me. I've read about some SysInternals tool that can describe all sorts of info about the various modes, but I don't know of a way with Win32 APIs, WMI, or other things easily available from Rainmeter or plugin environments.

Here are two different ways you can create work-arounds to fix your CPU Meter to make NUMA mode work.

The first way is to hardwire the number of physical CPUs to 2. You could use the stock CPU Meter as a UMA-mode skin and create a NUMA-mode skin if you switch modes often. To do this, find the measure MeasureCPUPhysicalCPUs and replace it with this:

[MeasureCPUPhysicalCPUs]
Measure=Calc
Formula=2
UpdateDivider=-1

The code should properly enumerate the performance counters this way. This will allow you to use the improved performance counter category ("Processor Information") used in Windows 10. Edit: This will not work if using HWiNFO. If not monitoring temps (or using CoreTemp/SpeedFan), then it will be OK.

The second way is to hardwire the Windows version so the skin thinks you are running Windows 7. This will cause the skin to run exactly like it did in v5.5.1, and both UMA/NUMA modes will be shown identically. To do this, find the measure MeasureOSVersion and replace it with this:

[MeasureOSVersion]
Measure=Calc
Formula=6.1
UpdateDivider=-1

Sorry for the problems. And thanks for getting back to me!

SilverAzide commented 3 years ago

Because I have no way to test NUMA modes, resolving this issue would require me to keep asking you question after question while trying different things. You've been very helpful, and I don't want to pester you with testing various commands. Unless you want to help, I might need to close this issue and just acknowledge it as a bug I can't fix, or maybe add some sort of option to force the "old style" of calculating core usage. Even if I DO find a way to count vCPUs, I can't confirm it works without help from someone like yourself.

I'm open to suggestions on how to proceed.

SilverAzide commented 3 years ago

OK, more info... I found a Win32 API called GetLogicalProcessorInformation that can tell you all sorts of info about processors down to which logical core is associated with which physical cores, and which are associated with NUMA nodes and so forth. It is pretty gnarly code, but I think I can make it work. MS SysInternals has a utility called CoreInfo which dumps this API data out in a format which is understandable. My 4C/8T CPU looks like this:

Logical to Physical Processor Map:
**------  Physical Processor 0 (Hyperthreaded)
--**----  Physical Processor 1 (Hyperthreaded)
----**--  Physical Processor 2 (Hyperthreaded)
------**  Physical Processor 3 (Hyperthreaded)

Logical Processor to Socket Map:
********  Socket 0

Logical Processor to NUMA Node Map:
********  NUMA Node 0

No NUMA nodes.

Logical Processor to Cache Map:
**------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**------  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
**------  Unified Cache       0, Level 2,  256 KB, Assoc   4, LineSize  64
********  Unified Cache       1, Level 3,    6 MB, Assoc  12, LineSize  64
--**----  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----  Unified Cache       2, Level 2,  256 KB, Assoc   4, LineSize  64
----**--  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--  Unified Cache       3, Level 2,  256 KB, Assoc   4, LineSize  64
------**  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
------**  Unified Cache       4, Level 2,  256 KB, Assoc   4, LineSize  64

Logical Processor to Group Map:
********  Group 0

So it looks like if I can twiddle some bits properly, I can figure out how many NUMA nodes there are, how many sockets, etc.
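As a rough illustration of that bit twiddling, the Python sketch below decodes processor masks the way the CoreInfo maps above present them. It does not call the Win32 API. The `records` list is hand-written sample data shaped like the (relationship, mask) pairs that GetLogicalProcessorInformation returns, and the helper names are hypothetical.

```python
from collections import Counter

# Relationship codes from the Win32 LOGICAL_PROCESSOR_RELATIONSHIP enum.
RELATION_PROCESSOR_CORE = 0
RELATION_NUMA_NODE = 1
RELATION_PROCESSOR_PACKAGE = 3


def mask_to_processors(mask: int) -> list[int]:
    """Return the logical processor indexes whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def render_map(mask: int, width: int) -> str:
    """Draw a mask CoreInfo-style, with logical processor 0 in the leftmost column."""
    return "".join("*" if (mask >> bit) & 1 else "-" for bit in range(width))


# Hand-written sample data for the 4C/8T CPU shown above (an assumption,
# not captured API output).
records = [
    (RELATION_PROCESSOR_CORE, 0b00000011),
    (RELATION_PROCESSOR_CORE, 0b00001100),
    (RELATION_PROCESSOR_CORE, 0b00110000),
    (RELATION_PROCESSOR_CORE, 0b11000000),
    (RELATION_PROCESSOR_PACKAGE, 0b11111111),
    (RELATION_NUMA_NODE, 0b11111111),
]

# Counting records per relationship gives cores, sockets, and NUMA nodes.
counts = Counter(relationship for relationship, _ in records)

for relationship, mask in records:
    print(render_map(mask, 8), relationship, mask_to_processors(mask))
```

With this sample data, counting the RELATION_NUMA_NODE records gives one node, and a 1950X in NUMA mode would presumably yield two such records with 16 bits set in each.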

SilverAzide commented 3 years ago

This issue SHOULD be fixed in the next release. I figured out how to enumerate NUMA nodes (which is what the PerfMon app needs). If anyone wants to beta test the fix for me, let me know. You'll need a CPU/machine that is running in NUMA mode.

RacerXNFS commented 3 years ago

Hey, I wanted to apologize for not being around as often as I should regarding this issue. I should be around a bit more over the next coming days and would be more than happy to test the fix for you.

SilverAzide commented 3 years ago

GadgetsPatch610.zip

Thank you for your time and assistance on this! I have no way to test this myself, so I am grateful for any help you can provide.

The attached .zip has 3 files in it that you will need to patch your Gadgets. Please do the following steps to apply the patch:

  1. Exit Rainmeter (required to do the next step).
  2. Copy the ActiveNet.dll file from the zip to your Rainmeter plugins folder, which is typically C:\Users\<username>\AppData\Roaming\Rainmeter\Plugins.
  3. Copy the CpuMeter.lua file to your Rainmeter\Skins\Gadgets\@Resources folder.
  4. Copy the All CPU Meter.ini file to your Rainmeter\Skins\Gadgets\All CPU Meter folder.
  5. Restart Rainmeter.
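If you would rather script steps 2 to 4, a small Python sketch along these lines could do the copying. This is only an illustration: `apply_patch` is a hypothetical helper, it assumes the zip has already been extracted to `patch_dir`, and you still need to exit Rainmeter first and restart it afterwards.

```python
import shutil
from pathlib import Path


def apply_patch(patch_dir: Path, plugins_dir: Path, skins_dir: Path) -> list[Path]:
    """Copy the three patch files into their Rainmeter folders (hypothetical helper)."""
    targets = {
        "ActiveNet.dll": plugins_dir,
        "CpuMeter.lua": skins_dir / "Gadgets" / "@Resources",
        "All CPU Meter.ini": skins_dir / "Gadgets" / "All CPU Meter",
    }

    # Check everything up front so a wrong path cannot leave a half-applied patch.
    for name, folder in targets.items():
        if not (patch_dir / name).is_file():
            raise FileNotFoundError(f"missing patch file: {patch_dir / name}")
        if not folder.is_dir():
            raise FileNotFoundError(f"missing target folder: {folder}")

    # copy2 overwrites an existing file and returns the destination path.
    return [
        Path(shutil.copy2(patch_dir / name, folder / name))
        for name, folder in targets.items()
    ]
```

You would call it with your own folders, for example the Plugins folder from step 2 and the Skins folder that contains Gadgets.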

If your machine is in UMA mode, then you should not see any differences from before (hopefully!). On the About screen for the All CPU Meter, you should see a new measure MeasureCPUNumaNodes. The value should be 1. The MeasureCPUxx measures should be numbered "0,0" to "0,31".

Screenshot 2021-05-19 091802

If your machine is in NUMA mode, then you should (hopefully!) see that MeasureCPUNumaNodes is 2 (per your 1950X). The MeasureCPUxx measures should be numbered "0,0" to "0,15" then "1,0" to "1,15". You should also see the CPU Meter correctly showing activity on all 32 cores.

I hope this works! 😄 The code needed to count NUMA nodes is pretty nuts; I'm not sure why Microsoft has not made this much simpler, like by adding it to WMI. But the upside to that code is that you can actually see precisely how your logical processors are mapped to your physical CPU(s). Also, I made the assumption that you can't (or wouldn't) have something asymmetrical, like two physical CPUs with different core counts, or NUMA nodes with different core counts.

RacerXNFS commented 3 years ago

Got some good news for you!

UMA image

NUMA image

The whole thing's reminded me just how wacky the design of the first two generations of Threadripper was. For a set of Rainmeter gauges this very much feels like a weird edge case, so I'm really grateful for the additional help and support.😅 Glad I was at least able to help out a bit.

SilverAzide commented 3 years ago

WOOHOO! 🥳 🍾

Great news! Thanks for the assist on this! 👍 Yes, you're right, this is an edge case for sure, LOL. This fix will be in the next release.

SilverAzide commented 3 years ago

Sorry to ping you via this old GitHub issue.

I'm looking for beta testers for a major skin enhancement. I've made a major change to the CPU Meter to allow it to recognize an unlimited number of sockets, CPUs, and logical cores, and also work with NUMA mode (all automatically). It should properly display your RAM values in NUMA mode as well (RAM per node). In your case, you'd either have a single CPU Meter showing 32 cores (UMA mode) or 2 CPU Meters showing 16 cores per NUMA node. Works with 64-core Threadripper PRO and EPYC CPUs too.

Let me know if you are interested!

RacerXNFS commented 3 years ago

I'll be more than happy to take a look, I'm definitely curious. The per-node RAM sounds like an especially useful addition!

SilverAzide commented 3 years ago

Thank you so much for your assistance!

The attached .rmskin is a patch, so it will replace your existing All CPU Meter but leave everything else intact (assuming I did not mess up the package). It can also install as a stand-alone skin.

If all works as planned, once the install completes, it should automatically invoke the "Node #0" skin (assuming the CPU Meter was running). You can select up to 4 nodes or CPUs (if the nodes/CPUs don't exist, you'll just see zeros for all the values). More nodes/CPUs can be supported by simply creating additional folders and copying the skin files -- there's no limit! In these files, the NumaIndex value sets the node/CPU index for that skin to monitor.

These skins depend on the fact (as best as I understand the MS docs) that the maximum logical processors Windows can handle per node/socket is 64. More than that, and they are ignored unless you are in NUMA mode. Each NUMA node supports up to 64 cores max, but can be split up however your CPU supports (I assume how you do this is based on the RAM in your machine and how you want to optimize). Multi-socket machines act like NUMA nodes, with the same rules. What I cannot figure out is if a multi-socket machine can run in UMA mode. If so, some of my logic may be broken, but as far as I can tell, this is not possible. For example, putting a 64C/128T Threadripper PRO in UMA mode will disable half the chip, since Windows can't see that many logical cores at once.

Additional things to check... I am under the impression that Windows swap files are per machine. So the swap space values will show the same values for all skins. (I read something that said they are internally split up per node/CPU, but I can't find anything else on the matter, nor any way to query this.) The RAM and clock speed and overall % usage values are per-node/per-CPU, not for the total machine. (Hover the mouse over the CPU icon and it will tell you total installed RAM.)

It was a bit of a nightmare to try to get HWiNFO mapped up right (it is seeing physical cores/CPUs, while Windows deals with logical cores/nodes). Hopefully the mapping logic is correct.

I know you have a single-socket machine, so you can't test everything. I have no clue if SpeedFan or CoreTemp can see multiple sockets, or more than 64 physical cores. My assumption is they just sequentially number every core, including across CPUs. I have not worked on this code yet, so these probably won't work. I have also not added any code to handle more than one CPU fan (yet).

I've faked up a dual 64C/128T machine (4 nodes) and I know that above about core 100 the text and % usage display values start to overlap, as the gadget is too narrow to hold all this info. I'm not sure of the fix for this yet; I may have to change the length of the bars, which I'd rather not do. My main goal is for the 99.9% of users to not realize anything has changed when they upgrade. 😃

Please correct me if any of the above info is incorrect! Any additional comments/suggestions are welcome! Thanks!

RacerXNFS commented 3 years ago

I'm not really much of a programmer or professional so I can't exactly answer or fact-check a lot of what's being said here. I also can't validate SpeedFan as it doesn't work on Ryzen-based systems.

At the very least I can say that with HWiNFO values plugged in and while running a quick test (a few laps of Wreckfest running in the background), everything at least appears to be working as intended. If there's anything specific you need me to test, let me know and I'll do my best as time permits.

image

SilverAzide commented 3 years ago

This is great! Love the screenshot too! 😆 You can really see you are giving Node 1 a workout while Node 0 is coasting along. Looks like the RAM/node thing is working as well.

Thanks a bunch for the feedback! My plan is to release a new version of the Gadgets (with this enhancement included) once Rainmeter 4.4.0 "final" is released, which hopefully will be soon (according to the devs).

RacerXNFS commented 3 years ago

Happy to have helped, been using the beta ever since without issue and just updated to the latest release. Even had a moment where my BIOS reset (which puts the CPU back into UMA/single-node mode) and CPU meter loaded as a single node just fine. It's all working perfectly. Great work. 👍

SilverAzide commented 3 years ago

Thank you for your help, it was invaluable! If you can send a screenshot of your CPU Meter in regular UMA mode, that would be great to have too. I cropped out the one above and posted it here on the Rainmeter forum; hope you don't mind. It was too cool not to share, LOL.

And let me know if you ever decide to upgrade to dual-socket Threadrippers! 😆

SilverAzide commented 2 years ago

Hello @RacerXNFS, I'm sorry to bother you again, I hope I'm not disturbing you! Would you be able to test another iteration of the CPU Meter logic with your Threadripper? I've been made aware that with a Threadripper Pro (64-Core), the fixes made in this issue here still are not completely working properly. It is somewhat of a nightmare scenario (for me anyway), especially since I can't test anything I do.

Would you be able to run this small executable to dump out your CPU info? I just want to make sure the logic is still valid for your machine. If so, the issue and executable are here: https://github.com/SilverAzide/Gadgets/issues/30#issuecomment-991822731

With your machine in NUMA mode, I'm hoping to see your 2 nodes with 16 cores each, and in UMA mode it should be 1 node with 32 cores. What I'm also not sure about are the core numbers in NUMA mode (i.e., 0-15 and 0-15, or 0-15 and 16-31).

Thank you!

RacerXNFS commented 2 years ago

Hey, no worries. Sorry for the delayed response, hopefully this is still useful.

UMA mode

ConsoleActiveNet v1.0.0.0 Copyright © 2021 SilverAzide All rights reserved.

ConsoleActiveNet run at 12/14/2021 11:39:25 PM.

Node = 0 Group = 0 Processors = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31

Node = 1 Group = 1 Processors =

NUMA Mode

ConsoleActiveNet v1.0.0.0 Copyright © 2021 SilverAzide All rights reserved.

ConsoleActiveNet run at 12/14/2021 11:45:35 PM.

Node = 0 Group = 0 Processors = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Node = 1 Group = 0 Processors = 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
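For anyone comparing these dumps with the PerfMon instance names seen earlier in this thread, here is a small Python sketch that parses the "Node = ... Processors = ..." lines and derives a "node,index" name for each logical processor. The parsing is inferred only from the two dumps above, `parse_nodes` and `perfmon_names` are hypothetical helpers, and the naming rule is an assumption based on the "0,0"-"0,15" / "1,0"-"1,15" instances reported for NUMA mode.

```python
import re

# Matches lines such as "Node = 1 Group = 0 Processors = 16,17"; the
# processor list may be empty, as for Node 1 in the UMA dump.
NODE_LINE = re.compile(r"Node = (\d+) Group = (\d+) Processors =[ \t]*([\d,]*)")


def parse_nodes(text: str) -> dict[int, list[int]]:
    """Map each node number to its list of logical processor numbers."""
    nodes = {}
    for match in NODE_LINE.finditer(text):
        node, _group, cpus = match.groups()
        nodes[int(node)] = [int(cpu) for cpu in cpus.split(",") if cpu]
    return nodes


def perfmon_names(nodes: dict[int, list[int]]) -> dict[int, str]:
    """Assumed rule: the index restarts at 0 inside every node."""
    names = {}
    for node, cpus in sorted(nodes.items()):
        for index, cpu in enumerate(cpus):
            names[cpu] = f"{node},{index}"
    return names


numa_dump = """\
Node = 0 Group = 0 Processors = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Node = 1 Group = 0 Processors = 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
"""

names = perfmon_names(parse_nodes(numa_dump))
print(names[15], names[16], names[31])
```

Nodes with an empty processor list, like Node 1 in the UMA dump, simply contribute no names.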

SilverAzide commented 2 years ago

Thank you very much!! Yes, this is super helpful. This is exactly what I was hoping I would see! Unfortunately (for me) with a 64-core Threadripper, the results are completely different and somewhat nonsensical. It is showing each node having processors 0-31 no matter what (whether 2 nodes or 4 nodes)... and it never shows the proper core numbering like yours does. Windows is wonky with his RAM as well; half his NUMA nodes show as having 0GB of RAM in Performance Monitor.

But I think (hope/pray) I have it all figured out. My goal is to somehow get the CPU Meter to correctly show his setup without breaking the way it shows yours, LOL.

RacerXNFS commented 2 years ago

Well, I'm seeing his Threadripper is a more recent Pro model? Not sure what use this information is, but these are weird chips so I've a little bit of background information for you.

The first two generations of Threadripper, 1000 and 2000 series, were basically just made by bolting up to 4 AMD Ryzen CPUs onto a single piece. It's like a dual-socket CPU system pretending to be a single-socket system, basically? That's why the motherboards let you toggle between UMA and NUMA modes, here it's the choice of single socket and a large pool of RAM or dual socket and two smaller, lower-latency pools at the expense of some thread scheduling weirdness as you can guess. Oh, and 2000 Series brought in 24- and 32-core monsters, which can't do UMA at all from what I understand?

3000 Series was a complete internal overhaul, and is basically much more like a giant Ryzen 3000/5000. A bunch of CCD chiplets connected to a single memory/IO controller that handles them all. So I don't think it even uses a NUMA-based memory system anymore since it's got a single memory controller instead of 2 that are basically duct-taped together? So the other nodes showing 0GB would be accurate because there's only one. Maybe it's reading batches of CCDs or something, not sure as I'm not a programmer.

Regardless I hope this was useful to you in some way, shape or form. Still surprised another Threadripper edge case managed to appear, but if that's not a sign of the versatility and usefulness of these gadgets then I don't know what is. Best of luck to you on this one, alright? 😁

SilverAzide commented 2 years ago

Excellent info! Thank you very much!

SilverAzide commented 2 years ago

P.S.: Here's a CPU Meter patch you can try if you are interested. If all is done properly, you should see no change in behavior on your system.

All CPU Meter Patch_7.1.3.zip

This version adds a mode to handle "processor groups" for systems with >64 threads. Not sure if this is the proper way to do it, but one of my end goals was to try to make sure it didn't break on your system.
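For context, a Windows processor group holds at most 64 logical processors. The Python sketch below shows only that arithmetic, using naive in-order chunking. It is not the patch's logic, and Windows' real group assignment follows the NUMA topology and may split processors differently; `split_into_groups` is a hypothetical helper.

```python
MAX_PER_GROUP = 64  # upper limit of logical processors in one processor group


def split_into_groups(logical_processors: int) -> list[range]:
    """Naively chunk processor numbers into groups of at most 64 (illustration only)."""
    return [
        range(start, min(start + MAX_PER_GROUP, logical_processors))
        for start in range(0, logical_processors, MAX_PER_GROUP)
    ]


for total in (32, 128):
    groups = split_into_groups(total)
    print(total, [(group.start, group.stop - 1) for group in groups])
```

Under this simplification a 32-thread 1950X stays in a single group, which fits the goal of seeing no change on that system, while a 128-thread CPU needs two.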