FoldingAtHome / fah-issues

Cannot increase CPU threads until you get a new WU assignment. #1410

Open bb30994 opened 4 years ago

bb30994 commented 4 years ago

This computer has 8 CPU threads and 2 GPUs, so the default "full" configuration is CPU:6. For undisclosed reasons, I set it to CPU:5 overnight, which was promptly reduced to CPU:4. This morning I reset it to CPU:6, but it was promptly reduced to 5 and then to 4.

The key to this strange behavior is that during the night, my CPU finished a WU and was assigned a new one. Any WU which is downloaded while running at CPU:5 cannot be increased to CPU:6, even though that would actually be the default setting if it had been set to CPU:(-1). Setting it to 6 now, even though this WU will continue to run at 4, will ensure that the NEXT assignment can run at 6.

shorttack commented 4 years ago

For CPU Folding you can choose these thread counts to use: 2,3,4,6,8,9,10,12,15,16,18,20,21,24 (not all thread count numbers work). From @Hou5e at #fah-client at github.com/foldingathome

The core usually works best with powers of 2: 2, 4, 8, 16, ... Basically, you need to avoid "big" odd prime numbers 5, 7, 11, ... and their multiples 10, 14, 15, ... 12 will work too (2×2×3); surprisingly, 21 is usually doing pretty well ... From @toTOW
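A rough sketch of how the list quoted above could be used to pick a thread count. The list is the one from @Hou5e's comment; the rule "snap down to the nearest listed value" and the name `nearest_allowed_threads` are assumptions for illustration only, not the client's actual code, although the rule does agree with the 5 to 4 reduction reported in this ticket.

```python
from bisect import bisect_right

# Thread counts quoted from @Hou5e's comment (not an official list).
ALLOWED_THREADS = [2, 3, 4, 6, 8, 9, 10, 12, 15, 16, 18, 20, 21, 24]


def nearest_allowed_threads(requested: int) -> int:
    """Return the largest listed thread count <= requested (hypothetical rule)."""
    idx = bisect_right(ALLOWED_THREADS, requested)
    if idx == 0:
        # Below the smallest listed value: fall back to a single thread.
        return 1
    return ALLOWED_THREADS[idx - 1]


for n in (5, 6, 7, 11):
    print(n, "->", nearest_allowed_threads(n))
```

Requests beyond 24 simply return 24 here, because the quoted list stops there.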

@bb30994 I'll set this as a defect, but it's a known restriction that the developers can explain better to you.

bb30994 commented 4 years ago

Yes, I know that, but that has nothing to do with the problem I'm reporting.

Ignore any mention I made of 5 because it's always corrected to be 4.

I reduced it from 6 to 4 (effectively, since 5 results in 4 anyway). Then the next morning, increasing it back to 6 was ineffective for that particular WU because it had been delivered during the night when the setting was still 5 (or 4). FAHCLIENT insisted on it being permanently stuck at 4 with two idle threads. When the next WU was assigned, the new one was accepted as a WU that was allowed to use as many as 6.

bb30994 commented 4 years ago

In the referenced ticket #1261 the issue is more serious. The first WU delivered to a freshly installed client is apparently delivered while 1 CPU thread is in effect. That prevents anyone from folding that first WU with more than 1 thread. Specifically, 2,3,4,6,8,9,... up to the total number the CPU supports are all prohibited for that WU and they would still be prohibited for future WUs if you were to re-set the number of CPUs to 1 after the self-configuration code increases it based on the (-1) setting.

anand-bhat commented 4 years ago

I've observed that when a WU is downloaded, an entry for the CPU count is made in the WU table of client.db. When the first WU is downloaded, the CPU count is reported as 1 (issue 1261) and the WU entry is saved with 1 CPU.

The client validates the CPU count when starting on the WU and only allows number of CPUs = min(cores in slot config, WU CPU count). So even if the number of CPUs is increased in the slot config, it does not use CPU > 1. The line "WARNING:WU00:FS00:AS lowered CPUs from 4 to 1" is printed (assuming CPUs was upped to 4). The "AS lowered" part is a bit of a red herring.
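A minimal sketch of the rule described above, assuming it really is a plain `min()` of the two values. The function names are hypothetical and this is not the client's source; only the "AS lowered CPUs" wording is taken from the log line quoted in this ticket.

```python
def effective_cpus(slot_cpus: int, wu_cpus: int) -> int:
    """CPU count actually used: the smaller of the slot config and the value stored with the WU."""
    return min(slot_cpus, wu_cpus)


def start_message(slot_cpus: int, wu_cpus: int) -> str:
    """Mimic the warning text quoted above when the count is lowered."""
    used = effective_cpus(slot_cpus, wu_cpus)
    if used < slot_cpus:
        return f"AS lowered CPUs from {slot_cpus} to {used}"
    # Hypothetical message for the non-lowered case; not a real client log line.
    return f"Running with {used} CPUs"


# WU downloaded while the slot reported 1 CPU, slot later raised to 4.
print(start_message(4, 1))
# WU downloaded at 6, slot temporarily reduced to 4: raising it back later works.
print(start_message(4, 6))
```

Under this reading, the value stored with the WU acts as a ceiling for that WU's lifetime, which is why lowering and then restoring the slot setting works while raising it above the stored value does not.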

The only way to fix this at the moment and use the correct number of CPUs for that WU is to edit client.db and update the WU entry with the correct number of CPUs.

gchernis commented 4 years ago

To me, this sounds like an elaborate feature under water, with no tip of the iceberg to warn the user. At the very least, GUI enhancements are needed.

bb30994 commented 4 years ago

Example: A WU was downloaded with 2 threads active. Later, I manually increased it to 3. FAHCore_a7 was interrupted to distribute the config change to 3 threads, as evidenced by the "can cause some work units to fail" message, but the restart still didn't resume with -np 3 until the next WU.

This would have worked if the WU had been downloaded with 3 threads and then it had been temporarily reduced to 2.

19:59:42:WU01:FS01:0xa7:Exiting, please wait. . .
19:59:44:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
19:59:45:WU01:FS01:Starting
19:59:45:WARNING:WU01:FS01:Changed SMP threads from 2 to 3 this can cause some work units to fail
19:59:45:WARNING:WU01:FS01:AS lowered CPUs from 3 to 2
19:59:45:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\bruce\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit-sse2/a7-0.0.19/Core_a7.fah/FahCore_a7.exe -dir 01 -suffix 01 -version 706 -lifeline 5292 -checkpoint 15 -np 2
19:59:45:WU01:FS01:Started FahCore on PID 7920
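For anyone checking whether their own client is affected, the warning in the log above can be picked out mechanically. This is only a sketch: the pattern is inferred from the one log line quoted in this ticket, and the function name `find_lowered_cpus` is made up for the example.

```python
import re

# Pattern inferred from the single "AS lowered CPUs" line quoted in this ticket.
LOWERED = re.compile(r"WARNING:(WU\d+):(FS\d+):AS lowered CPUs from (\d+) to (\d+)")


def find_lowered_cpus(log_text: str) -> list[tuple[str, str, int, int]]:
    """Return (work unit, slot, requested CPUs, used CPUs) for each matching warning."""
    return [
        (wu, slot, int(requested), int(used))
        for wu, slot, requested, used in LOWERED.findall(log_text)
    ]


sample = (
    "19:59:45:WU01:FS01:Starting\n"
    "19:59:45:WARNING:WU01:FS01:AS lowered CPUs from 3 to 2\n"
)
print(find_lowered_cpus(sample))
```

A non-empty result means the slot setting is higher than what the running WU is being allowed to use.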