gigabit-clowns / xmipp4-core

Core library of xmipp4
https://gigabit-clowns.github.io/xmipp4-core/
GNU General Public License v3.0

Created backend_priority #98

Closed: oierlauzi closed this 5 days ago

oierlauzi commented 5 days ago

Backends provide a priority value to allow breaking ties when multiple backends are available. Also fixes https://github.com/gigabit-clowns/xmipp4-core/issues/97

MartinSalinas98 commented 5 days ago

I'm not sure how I feel about a priority criterion driven by faith in the implementation. I don't think it is very objective: it will usually rely on the developer's personal feeling about their own implementation, sometimes overestimating it and sometimes underestimating it.

In a perfect scenario where all backends have perfect implementations (and therefore all of them get +1 priority), which one is chosen? Maybe each machine with xmipp4 installed could have its own backend preferences, which would need to be defined somewhere (probably in a configuration file?).

Additionally, this opens a new rabbit hole. Let's say I have both an Nvidia and an AMD graphics card in my system (unusual, but we support it, so it's a possible case). I set the Nvidia card as my preferred device in my config file and start some processes on it that keep it busy for several hours, let's say at 85% utilization with 80% of its memory occupied. The Nvidia card will be busy for a while, but the AMD card is still free and I should be able to run jobs on it, so I launch a new job.

As it is implemented, or at least as it is currently designed, this would always try to run on the preferred backend (cuda) and never attempt to execute on vulkan, so my second graphics card would be useless even though I can list it.

oierlauzi commented 5 days ago

This is mainly to prevent a fallback implementation from being selected over "real" implementations. In general, implementations should use backend_priority::normal unless there is a reason to boost or lower their priority. In a tie, the selection is arbitrary. This mechanism is meant to be coarse-grained (e.g. it applies to any backend, not specifically to device_backend).
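To make the intended tie-breaking concrete, here is a minimal sketch of priority-based selection. It is not the actual xmipp4-core API: only `backend_priority::normal` appears in this discussion, so the `fallback` and `optimal` levels, the `backend` interface, `simple_backend`, and `select_backend` are all hypothetical names chosen for illustration. The sketch assumes each backend reports whether it is available and what priority it has, and that the selector simply picks the highest-priority available backend.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical priority levels. Only `normal` is named in the PR discussion;
// `fallback` and `optimal` stand in for "lowered" and "boosted" priorities.
enum class backend_priority
{
    fallback = 0,
    normal = 1,
    optimal = 2
};

// Hypothetical minimal backend interface (assumption, not the real API).
class backend
{
public:
    virtual ~backend() = default;
    virtual std::string get_name() const = 0;
    virtual bool is_available() const = 0;
    virtual backend_priority get_priority() const = 0;
};

// Hypothetical concrete backend, used only for illustration.
class simple_backend : public backend
{
public:
    simple_backend(std::string name, bool available, backend_priority priority)
        : m_name(std::move(name)), m_available(available), m_priority(priority)
    {
    }

    std::string get_name() const override { return m_name; }
    bool is_available() const override { return m_available; }
    backend_priority get_priority() const override { return m_priority; }

private:
    std::string m_name;
    bool m_available;
    backend_priority m_priority;
};

// Returns the available backend with the highest priority, or nullptr if
// none is available. On a tie, the first candidate encountered is kept,
// so callers should treat the choice among equal priorities as arbitrary.
const backend* select_backend(const std::vector<std::unique_ptr<backend>>& backends)
{
    const backend* best = nullptr;
    for (const auto& candidate : backends)
    {
        if (!candidate->is_available())
        {
            continue; // Unavailable backends are never selected.
        }
        if (best == nullptr || candidate->get_priority() > best->get_priority())
        {
            best = candidate.get();
        }
    }
    return best;
}
```

In this sketch a fallback implementation registered at `fallback` loses to any available `normal` backend, which is the coarse-grained behavior described above. Priority says nothing about device load or which physical device a job should use; that remains the user's (or the scheduler's) decision.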

oierlauzi commented 5 days ago

The case you describe above should be dealt with by the user. The user specifies which device(s) a particular job runs on, so it is their responsibility not to oversubscribe the device(s). SLURM could also help here. In any case, backend_priority is not meant to resolve such scenarios.

sonarcloud[bot] commented 5 days ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
