FoldingAtHome / fah-issues

FAH could have an option to only get small work units to reduce GPU load. #1527

Open informatorius opened 4 years ago

informatorius commented 4 years ago

Is your feature request related to a problem?

Some users complain that FAH keeps the GPU at 100% load, which can slow down other apps slightly.

Describe the Feature

FAH could have an option to fetch only small work units with a very low atom count, to reduce GPU load. This could even be exposed in the UI, so that the medium setting of the performance slider would fetch only small work units for the GPU. It would be especially useful for new users, who might otherwise quit FAH because other PC apps, such as the web browser, feel slower.

Context

With small work units that have a very low atom count, the GPU load on medium to fast GPUs drops to, e.g., 80% instead of 100%. The FahClient would need an option to request low atom count work units, and the UI performance slider would set this option too. The assignment server would see this client option and then deliver work units to the GPU that were intended for slower GPUs than the one actually installed.

For example, an NVIDIA RTX 2080 would then not get a work unit with a high atom count, but only one with a low atom count of the kind typically assigned to an RTX 2060. And an RTX 2060 would get a work unit typically assigned to a GTX 1060.

There could also be 4 levels, such as high, medium, low, and very low, which would currently stand for: high = 150k atoms, medium = 65k atoms, low = 25k atoms, very low = 5k atoms.

Give users options so they can run FAH how they like. Benefit: More FAH users stay folding.

The assignment servers' logic should be easy to adapt to this new FAH property. It is just a decision table over GPU model, FAH user options, and project work unit atom count.
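The decision table described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the level names, the `MAX_ATOMS_BY_LEVEL` and `GPU_TIERS` tables, and both function names are hypothetical and are not part of the FahClient or the assignment server. The atom counts and GPU models are the ones mentioned in this request.

```python
# Atom-count ceilings per user preference level, using the numbers
# suggested in this request. Names and structure are hypothetical.
MAX_ATOMS_BY_LEVEL = {
    "high": 150_000,
    "medium": 65_000,
    "low": 25_000,
    "very_low": 5_000,
}

# Hypothetical ordering of GPU classes, slowest first.
GPU_TIERS = ["GTX 1060", "RTX 2060", "RTX 2080"]


def effective_tier(gpu: str, reduce_load: bool) -> str:
    """Return the GPU class whose work units should be assigned.

    With reduce_load set, the GPU is treated as one class slower,
    unless it is already in the slowest class.
    """
    index = GPU_TIERS.index(gpu)
    if reduce_load and index > 0:
        index -= 1
    return GPU_TIERS[index]


def eligible(atom_count: int, level: str) -> bool:
    """Check whether a work unit fits under the ceiling for a level."""
    return atom_count <= MAX_ATOMS_BY_LEVEL[level]
```

With these assumed tables, `effective_tier("RTX 2080", True)` yields the RTX 2060 class, and `eligible(65_000, "low")` rejects a 65k-atom work unit for a user who picked the low level.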

(Another user flag could be "part-time folding" for users who shut down their PC at night. They would want work units with a longer timeout or deadline. But that is another feature request.)

uyaem commented 4 years ago

With the FAHCore_xx processes running at minimum priority, there should not be any noticeable effect on using the PC (with exceptions). Personally I can even run games, and other than a loss of PPD I don't see any impact.

From what I understand, the atom count is not the only factor in the hardware requirements, the experiments that are run also have an impact. I fear that this would mean serious micro-managing on a per-project-and-gpu level.

I don't want to say that this wouldn't be a cool feature, but with the limited development resources I'd consider it "nice-to-have"; there are easy workarounds for times when you need to use the PC (right-click, pause).

ghost commented 4 years ago

How about adding a small benchmark function to the client that would run automatically each time a slot is created or modified? A few seconds would probably be enough to decide, on solid ground, which kinds of WUs should or should not be assigned to the slot.
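A minimal sketch of that idea, assuming a generic callable as the workload: a real client would time a short GPU kernel instead, and the function names, thresholds, and class labels here are all hypothetical rather than anything FAH actually uses.

```python
import time


def run_benchmark(workload, duration_s: float = 0.2) -> float:
    """Call workload repeatedly for about duration_s seconds.

    Returns the measured throughput in iterations per second.
    """
    iterations = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        workload()
        iterations += 1
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def classify_slot(score: float,
                  thresholds=((1000.0, "large"), (100.0, "medium"))) -> str:
    """Map a benchmark score to a WU size class.

    Thresholds are checked from highest to lowest; the values are
    placeholders that would need calibration against real hardware.
    """
    for minimum, wu_class in thresholds:
        if score >= minimum:
            return wu_class
    return "small"
```

The client could call `classify_slot(run_benchmark(some_kernel))` whenever a slot is created or modified and report the resulting class along with its work request.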

gchernis commented 4 years ago

This particular user complains that Excel becomes so laggy it's nearly unusable while FAH is running.

ghost commented 4 years ago

There is a known problem when running GPU WUs on the graphics card that also drives the user's monitor, which causes the lag. It is a hardware problem: it is not possible to modulate the output of a GPU, it is all or nothing. So if the graphics card is not powerful enough, the lag can indeed become a problem. In such cases, it is preferable to pause the WU during periods of intensive work on the computer.

gchernis commented 4 years ago

@ajmch Consider this: if a work unit is matched to a PC so that it cannot be spread across all of the GPU's compute engines, this would alleviate the problem. User-facing PCs would benefit from such an option.

ghost commented 4 years ago

Yes, agreed! I think that no one doubts the necessity or usefulness of such matching. It's only a question of priorities, as I understand it.

gchernis commented 4 years ago

@ajmch As I understand it, donor counts are dropping daily. Would it be useful to consistently retain donors who actively use their PCs, not just donors with headless PCs? For context, this project went from 160K active donors to 80K in a matter of months.

ghost commented 4 years ago

It certainly would, but the question is rather: is such an effort feasible and/or sustainable with the human resources presently allocated to the project? And if not, how do we get the needed resources? Since the COVID crisis, it has been a simple rush to make ends meet. On the other hand, people who routinely juggle headless computers and VMs are savvy enough to iron out the problems by themselves, whereas standard users require much more care. I don't know. For my part, I just fold and hope that it will pan out and that users will get the upper hand, because they certainly are the largest permanent resource pool for such a project nowadays.

bb30994 commented 4 years ago

> With the FAHCore_xx processes running at minimum priority, there should not be any noticeable effect on using the PC (with exceptions). Personally I can even run games, and other than a loss of PPD I don't see any impact.

This is a true statement when talking about FAH being processed by the CPU (FAHCore_A7 / _A8). It is not true when talking about WUs that are being processed by a GPU. For the most part, a process assigned to the GPU (a "kernel") can delay other processes that schedule work on the same GPU. If screen updates do not occur promptly, CPU priority doesn't change anything. Screen lag depends on the generation of the GPU. For NVIDIA, Pascal was the first GPU generation to introduce interruptible kernels. I'm not sure about AMD GPUs.

PantherX commented 4 years ago

FYI, there are plans to move towards an automated system where GPUs are benchmarked and allocated the WUs that are most efficient for them. That means if you have a low-end GPU (which is where the screen lag issue is most prominent), you will not be assigned large WUs, only small ones. That should make the lag less likely to occur, but let's see what happens 😃