BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0
2.03k stars 449 forks

Task displays as 'Running' without any progress #5895

Open AenBleidd opened 4 days ago

AenBleidd commented 4 days ago

Discussed in https://github.com/BOINC/boinc/discussions/5894

Originally posted by **homersimpsons** November 11, 2024

**Describe the bug**

Sometimes a task displays as "Running" but makes no progress (no elapsed-time counter, no percentage, no process in the task manager).

**Steps To Reproduce**

1. Participate in GPUGRID
2. Wait for an ATMML task
3. After the task has been running for some time, pause it
4. Wait a bit and resume it
5. At this point it should be stuck (but this does not always happen)

Suspending and resuming it again does not change anything. I have to stop the BOINC Manager (and the daemon) and restart it; the task then restarts directly.

**Expected behavior**

The task should start correctly.

**Screenshots**

A screenshot of this task row; it was stuck like this for 30 minutes, while other tasks were processing fine.

![Image](https://github.com/user-attachments/assets/95263772-3358-4d94-94dc-530748d575e3)

**System Information**

- OS: Windows 11
- BOINC Version: 8.0.2

**Additional context**

I reported this on GPUGRID a month ago (https://www.gpugrid.net/forum_thread.php?id=5487). Maybe priority applications on GPU have something to do with the issue.

I tried to run a simulation with https://boinc.berkeley.edu/sim_web.php?action=show_scenario&name=212 but when I submit it I get an error page saying:

> command failed (139): ./sim --duration 600 --delta 60 --rec_half_life 864000 --infile_prefix scenarios/212/ --outfile_prefix scenarios/212/simulations/3/

Maybe I should also open an issue for this. The simulation seems to be created, but it is empty.

homersimpsons commented 3 days ago

I had this again today, the order of events I remember was:

  1. Get a new task
  2. Suspend it
  3. Run a GPU priority application
  4. Close the GPU priority application (after ~2h)
  5. Resume the task
  6. BUG: the task shows "Running", but the elapsed time remains blank and no process starts for 2 minutes
  7. WORKAROUND: Close and restart the BOINC Manager
  8. The task starts directly

NOTE: To me this is a P: Minor, the workaround is rather easy. The "biggest" issue could be someone not noticing and just losing time up to the deadline.
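
A minimal sketch of how such a stall could be noticed automatically, rather than by someone happening to look at the Manager. It compares two snapshots of the task list taken some time apart and flags tasks that are reported as running in both but whose elapsed time and progress did not advance. The `TaskSnapshot` type, the `find_stuck_tasks` function, and the task names are hypothetical and not part of BOINC; how the snapshots are obtained (for example by parsing the output of `boinccmd --get_tasks`) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSnapshot:
    """Hypothetical view of one task at one point in time (not a BOINC type)."""
    name: str
    state: str            # status text, e.g. "Running" or "Suspended"
    elapsed_s: float      # elapsed time in seconds; 0.0 when the column is blank
    fraction_done: float  # progress between 0.0 and 1.0


def find_stuck_tasks(earlier, later, running_state="Running"):
    """Return names of tasks that are running in both snapshots
    but show no increase in elapsed time or progress."""
    before = {task.name: task for task in earlier}
    stuck = []
    for task in later:
        prev = before.get(task.name)
        if prev is None:
            continue  # task is new, nothing to compare against
        both_running = (prev.state == running_state
                        and task.state == running_state)
        no_progress = (task.elapsed_s <= prev.elapsed_s
                       and task.fraction_done <= prev.fraction_done)
        if both_running and no_progress:
            stuck.append(task.name)
    return stuck


# Two snapshots taken a couple of minutes apart (made-up values).
first = [
    TaskSnapshot("gpu_task_a", "Running", 0.0, 0.0),
    TaskSnapshot("cpu_task_b", "Running", 600.0, 0.25),
]
second = [
    TaskSnapshot("gpu_task_a", "Running", 0.0, 0.0),    # no change: stuck
    TaskSnapshot("cpu_task_b", "Running", 720.0, 0.30),  # advancing normally
]
print(find_stuck_tasks(first, second))
```

The interval between snapshots should be longer than the time a task normally needs to start up, otherwise a task that has just been resumed would be flagged as stuck.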

davidpanderson commented 3 days ago

I don't understand the above. What does 'GPU priority application' mean in 3) and 4), and what do 'Run' and 'Close' mean?

Does the task in 1) need to be a GPU app?

In 6), are you talking about the same task as in 1)?

homersimpsons commented 2 days ago

> What does 'GPU priority application' mean in 3) and 4),

I have the UI in French (I do not know how to switch it to English); it is this entry: Image

> and what do 'Run' and 'Close' mean?

'Run' means starting the application defined in the above setting (in my case a game), which of course stops any GPU computation. 'Close' means exiting that application; at that point the GPU computations should start again, but they do not.

> Does the task in 1) need to be a GPU app?

I think so, but maybe it is possible that this works with a CPU application with a "CPU priority application" defined. I am not 100% sure about the reproduction because I know those are the steps I take but maybe there are other steps leading to the same result.

> In 6), are you talking about the same task as in 1)?

Yes, the same BOINC task. In my case it is most often an "ATMML" (GPUGRID) one, but I just reproduced the issue tonight with an "ACEMD 3" (GPUGRID) task.

I run Einstein@home too for GPU, but I set it to 0 resource share so any other available GPU task will run instead.

CharlieFenton commented 2 days ago

I think he is saying that he has set a game in the Exclusive Applications dialog to suspend the GPU when the game is running, but BOINC does not resume a GPU task when the game is exited.

AenBleidd commented 2 days ago

I still think this is an issue with the project's application, which doesn't resume after being suspended and starts working again only after a complete restart of the application, which is basically what happens when the BOINC client is restarted.

homersimpsons commented 2 days ago

> I think he is saying that he has set a game in the Exclusive Applications dialog to suspend the GPU when the game is running, but BOINC does not resume a GPU task when the game is exited.

Yes, I usually suspend and resume the task manually. But maybe it is just the fact that it does not correctly restart after an exclusive application.

> I still think this is an issue with the project's application, which doesn't resume after being suspended and starts working again only after a complete restart of the application, which is basically what happens when the BOINC client is restarted.

Maybe. I do not have any technical details; is there any log I could provide that would help here? For the record, the GPUGRID applications do not checkpoint, so they just restart their computation if they have been suspended.