BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

client: if GPU is missing, discard app versions and results that refer to it. #5577

Status: Closed (davidpanderson closed this 2 months ago)

davidpanderson commented 2 months ago

This addresses an issue introduced when we changed GPU names from (e.g.) 'Apple M3 Pro' to 'apple_gpu'.

Note: this means that if a host has in-progress jobs using a discrete GPU, and you remove that GPU, those jobs will be discarded. This is a change in behavior, which I think is good, because otherwise you'd get error messages forever.
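
For readers following the thread, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: `AppVersion`, `Result`, and `discard_missing_gpu_work` are hypothetical, simplified stand-ins, and the real client's data structures are not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the BOINC client's real types.
struct AppVersion {
    std::string name;
    std::string gpu_type;     // empty means CPU-only
};

struct Result {
    std::string name;
    std::string app_version;  // name of the AppVersion this job runs with
};

// Drop app versions whose GPU type is not in `present_gpus`, then drop the
// results that refer to a dropped app version.
// Returns the number of results discarded.
inline std::size_t discard_missing_gpu_work(
    const std::set<std::string>& present_gpus,
    std::vector<AppVersion>& app_versions,
    std::vector<Result>& results
) {
    // Pass 1: collect the names of app versions that need a missing GPU.
    std::set<std::string> dropped;
    for (const AppVersion& av : app_versions) {
        if (!av.gpu_type.empty() && present_gpus.count(av.gpu_type) == 0) {
            dropped.insert(av.name);
        }
    }

    // Pass 2: erase those app versions.
    app_versions.erase(
        std::remove_if(app_versions.begin(), app_versions.end(),
            [&](const AppVersion& av) { return dropped.count(av.name) != 0; }),
        app_versions.end());

    // Pass 3: erase the results that depended on them.
    const std::size_t before = results.size();
    results.erase(
        std::remove_if(results.begin(), results.end(),
            [&](const Result& r) { return dropped.count(r.app_version) != 0; }),
        results.end());
    return before - results.size();
}
```

In this sketch, CPU-only app versions (empty `gpu_type`) are never discarded, and a result is dropped only when the app version it refers to was dropped.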

AenBleidd commented 2 months ago

@davidpanderson, I think this is quite a dangerous change. Please see my email for more details.

RichardHaselgrove commented 2 months ago

I agree with Vitalii - this change needs, at a minimum, more thought. One significant problem: with modern versions of Windows, an operating system update and reboot can take place "outside working hours" without the explicit approval of the user. The sequence of component restarts can be ill-defined: it is possible for BOINC to restart before the video drivers have loaded and are ready for use. In that case, the GPUs are logged as missing, but the situation is only transient. Shutting down the client and restarting it resolves the problem.
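
To make the transient case concrete, here is a small sketch of one way a client could avoid treating a not-yet-ready GPU as permanently missing: retry detection a bounded number of times before giving up. Nobody in this thread proposed this; `gpu_present_with_retries`, `detect`, and `wait` are hypothetical names for illustration, and the callbacks stand in for whatever detection and delay mechanism a real client would use.

```cpp
#include <functional>

// Hypothetical helper, not part of BOINC.
// `detect` reports whether the GPU is currently visible.
// `wait` is called between failed attempts (e.g. to sleep).
// Returns true as soon as detection succeeds, or false if all
// `max_attempts` attempts fail.
inline bool gpu_present_with_retries(
    const std::function<bool()>& detect,
    int max_attempts,
    const std::function<void()>& wait
) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (detect()) {
            return true;
        }
        // Only wait if another attempt will follow.
        if (attempt + 1 < max_attempts) {
            wait();
        }
    }
    return false;
}
```

Under this sketch, work would be discarded only after every attempt fails, so a driver that finishes loading shortly after the client starts would not cost the user any in-progress jobs.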

Discarding the task is wasteful of project resources and the user's internet bandwidth.

AenBleidd commented 2 months ago

I'm making this PR a draft to discuss this change.

CharlieFenton commented 2 months ago

Instead of this, why not just allow the old Apple GPU name (or whatever it was) as an alias for the new name? That would provide backward compatibility.
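
A rough sketch of the alias idea, assuming a simple lookup table. `canonical_gpu_name` is a hypothetical function, and the only mapping taken from this PR's description is 'Apple M3 Pro' to 'apple_gpu'; any other entries would have to be supplied.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of BOINC: map an old GPU name to the new
// one, leaving names without an alias unchanged.
inline std::string canonical_gpu_name(const std::string& name) {
    // Illustrative alias table; only this pair appears in the PR description.
    static const std::unordered_map<std::string, std::string> aliases = {
        {"Apple M3 Pro", "apple_gpu"},
    };
    const auto it = aliases.find(name);
    return it == aliases.end() ? name : it->second;
}
```

A table like this needs one entry for every old model string that may still appear in saved client state.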

CharlieFenton commented 2 months ago

… or accept any GPU name containing Apple
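
A sketch of the substring variant, with one added assumption: the match is case-insensitive, so that both an old-style name such as 'Apple M3 Pro' and the new 'apple_gpu' are accepted. `is_apple_gpu_name` is a hypothetical name used only for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of BOINC: true if `name` contains "apple",
// ignoring case (the case-insensitivity is an assumption of this sketch).
inline bool is_apple_gpu_name(const std::string& name) {
    std::string lower(name.size(), '\0');
    std::transform(name.begin(), name.end(), lower.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return lower.find("apple") != std::string::npos;
}
```

Compared with an explicit alias table, this needs no per-model entries, at the cost of matching any name that happens to contain the word.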

davidpanderson commented 2 months ago

It doesn't matter if a few jobs get discarded.

davidpanderson commented 2 months ago

Oops! I fixed the messages in each case