BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

Work Fetch - Leaves part of a GPU unused, when it should instead fetch work #1211

Status: Closed (romw closed this issue 2 years ago)

romw commented 9 years ago

Reported by JacobKlein on 25 Mar 43272124 22:13 UTC. If a project's GPU apps are set up to use only part of the GPU (i.e., via app_config.xml), then when the last remaining task(s) for that project are running and not using the full GPU, work fetch should fetch more work, but it doesn't.
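To make the failure mode concrete, here is a minimal Python sketch of two ways a client could decide whether a GPU has room for more work. The `RunningTask` model, the `gpu_usage` values, and both function names are hypothetical illustrations, not BOINC's actual work fetch code. A check that only counts fully idle instances (what the email below says work fetch appears to do) sees nothing to do, while summing the fractional usage per instance reveals the half-idle GPU.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    # Hypothetical model: the GPU instance a task runs on and the fraction
    # of that instance it claims (e.g. 0.5 when set via app_config.xml).
    instance: int
    gpu_usage: float


def fully_idle_instances(n_instances: int, tasks: list[RunningTask]) -> int:
    """Count instances with no running task at all."""
    busy = {t.instance for t in tasks}
    return n_instances - len(busy)


def idle_fraction_per_instance(n_instances: int, tasks: list[RunningTask]) -> list[float]:
    """Return the unused fraction of each instance, clamped at zero."""
    used = [0.0] * n_instances
    for t in tasks:
        used[t.instance] += t.gpu_usage
    return [max(0.0, 1.0 - u) for u in used]


# One GPU, and only the project's last 0.5-GPU task is still running.
tasks = [RunningTask(instance=0, gpu_usage=0.5)]
print(fully_idle_instances(1, tasks))        # 0: "no idle GPU", so no fetch
print(idle_fraction_per_instance(1, tasks))  # [0.5]: half the GPU is idle
```

Only the second view exposes the 0.5 GPU that the report says is left unused.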

This issue was confirmed both with 7.0.60 and, on 4/8/2013, in the simulator (which includes several unreleased work fetch changes).

It would seem that the prerequisites for reproducing the bug are:

I'm not certain whether GPU Exclusions are necessary to trigger the issue, but I believe using them makes the problem worse.

As a workaround, I had to raise my buffer settings far above what I would normally expect. It seems that, in addition to not realizing that part of the GPU is idle, work fetch might also not realize that the tasks run two at a time.
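The suspicion that work fetch may not account for tasks running two at a time can be illustrated with a simplified, hypothetical model of how long a queue of GPU work lasts. If each task claims `gpu_usage` of an instance, the queue drains in proportion to the GPU-hours it represents, so treating 0.5-GPU tasks as if they used a whole GPU doubles the estimate and makes the buffer look fuller than it is. The function `queue_duration_hours` and the numbers below are assumptions for illustration, not BOINC's scheduling formula.

```python
def queue_duration_hours(runtimes_h: list[float], gpu_usage: float, n_instances: int) -> float:
    """Wall-clock hours to drain queued GPU tasks in a simplified model.

    Assumes every task has the same gpu_usage, tasks pack perfectly onto
    instances, and no other project competes for the GPU.
    """
    if not 0.0 < gpu_usage <= 1.0:
        raise ValueError("gpu_usage must be in (0, 1]")
    # Each task occupies gpu_usage of an instance for its whole runtime.
    gpu_hours = sum(r * gpu_usage for r in runtimes_h)
    return gpu_hours / n_instances


queue = [8.0, 8.0, 8.0, 8.0]  # four 8-hour tasks on one GPU
print(queue_duration_hours(queue, 1.0, 1))  # 32.0 if tasks ran one at a time
print(queue_duration_hours(queue, 0.5, 1))  # 16.0 when they run two at a time
```

Under this model, a client using the first figure while the GPU actually follows the second schedule would need roughly twice the buffer setting to keep the same amount of work on hand, which is consistent with the workaround described in the report.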

Details, including examples in a simulation, are in the email below:


From: jacob_w_klein@msn.com
To: davea@ssl.berkeley.edu
Subject: RE: job scheduling
Date: Mon, 8 Apr 2013 09:51:16 -0400

Thank you. I really appreciate you looking at these issues, and I'll try to verify that they work.

Your WCG project sounds interesting; maybe they're going to support Android? I wish we had a Windows Phone platform; I'd love to test on it.

Do you remember Ed (Beyond) reporting a GPU Exclusion Work Fetch issue? I might have found examples of what he was trying to explain...

I'm noticing an issue both on my computer (7.0.60's work fetch algorithm) and in the simulator (new work fetch algorithm). If a GPU is only partially loaded (i.e., 0.5 GPU) by the last remaining task(s) for a project that has GPU Exclusions, we can get into a scenario where GPUs are left partially idle and work fetch won't fetch more.

The task scheduler (correctly) schedules the workload in a way that leaves a GPU partially idle, but work fetch thinks we have plenty of work and sees no fully idle instances, so it doesn't ask for any.

Here are some examples where that occurred, even with our work fetch changes:

2 days 17:03:00
[…] days 14:33:00
6 days 06:13:00
[…] days 16:43:00
9 days 16:07:00

The fix might involve evaluating the project's GPU apps to see whether it has any that use a partial GPU... or maybe checking that all of its GPU apps use <= the amount of currently idle GPU (to ensure we don't keep asking for and getting work we cannot immediately use).

It sounds to me like the fix for this one might be tricky rather than straightforward, though I'm not sure. Do you plan on tackling this soon (fixed in the short term), or should I create a ticket (fixed eventually, maybe in months or years)?

Regards,
Jacob
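The email's second suggestion, checking that the project's GPU apps would fit into the currently idle portion of the GPU, could look roughly like the following sketch. It is a hypothetical illustration, not BOINC code: `app_gpu_usages` stands for the `gpu_usage` values of a project's GPU app versions, `excluded` stands for instances the project may not use (as with GPU Exclusions), and the function `should_fetch_for_partial_gpu` requests work only when every app would fit on some allowed instance, so the client doesn't keep fetching work it cannot start immediately.

```python
EPSILON = 1e-9  # tolerance for comparing floating-point GPU fractions


def should_fetch_for_partial_gpu(
    idle_by_instance: list[float],
    excluded: set[int],
    app_gpu_usages: list[float],
) -> bool:
    """Decide whether to request work for a partly idle GPU (hypothetical rule).

    Ask only if every one of the project's GPU apps would fit into the idle
    fraction of some instance the project is allowed to use.
    """
    allowed_idle = [
        idle for i, idle in enumerate(idle_by_instance) if i not in excluded
    ]
    if not allowed_idle or not app_gpu_usages:
        return False
    best_idle = max(allowed_idle)
    if best_idle <= EPSILON:
        return False  # nothing is idle on the instances this project may use
    return all(usage <= best_idle + EPSILON for usage in app_gpu_usages)


# GPU 0 is half idle, GPU 1 is fully busy; the project may use both.
print(should_fetch_for_partial_gpu([0.5, 0.0], set(), [0.5]))       # True
# Same load, but the project is excluded from GPU 0.
print(should_fetch_for_partial_gpu([0.5, 0.0], {0}, [0.5]))         # False
# The project also has a full-GPU app, which would not fit in the idle half.
print(should_fetch_for_partial_gpu([0.5, 0.0], set(), [0.5, 1.0]))  # False
```

Requiring *all* apps to fit is the conservative reading of the email; checking for *any* fitting app would fetch more eagerly but risks receiving tasks that must wait for a whole GPU.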

Migrated-From: http://boinc.berkeley.edu/trac/ticket/1239

romw commented 9 years ago

Commented by JacobKlein on 18 Jan 43272214 15:09 UTC. Another, much simpler example of this happening in the simulator can be found here: http://boinc.berkeley.edu/dev/sim_web.php?action=show_scenario&name=94

At the beginning, we should fetch more work from GPUGrid.net, but we don't, and half the GPU is erroneously left idle.

Ageless93 commented 7 years ago

Is this still a problem with BOINC 7.8.3?

AenBleidd commented 6 years ago

Up

AenBleidd commented 2 years ago

I'm closing this for now as 'wontfix', since there has been no activity or additional requests in the last 5 years. If this is still an issue, please reopen this ticket.