BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

BOINC gets work and then can't finish on time. #4257

Open: Sandman192 opened this issue 3 years ago

Sandman192 commented 3 years ago

Describe the bug: BOINC gets work that it can't finish on time, and it never gets any work at all unless I suspend other projects. I never had this issue until 2 or 3 BOINC updates ago. The computer runs 24/7.

Steps To Reproduce

Expected behavior: BOINC should not get work that my computer can't finish on time. Also, it gets no work units from any other project and says they won't finish on time.


System Information

Additional context: 3.8 GHz, 6 cores, 12 threads.

3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181047_Hs_T154140-OXCT1_wu-277_1614546821865_1 is 3.11 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181053_Hs_T002498-OPTN_wu-183_1614555068070_1 is 3.03 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181055_Hs_T002502-OPTN_wu-232_1614558714182_0 is 2.99 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181074_Hs_T194898-OGN_wu-176_1614588563451_1 is 2.66 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181084_Hs_T116698-OBFC2A_wu-83_1614600576731_1 is 2.50 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181162_Hs_T191705-NELF_wu-18_1614716635947_0 is 1.18 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181166_Hs_T191709-NELF_wu-65_1614721341325_1 is 1.11 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181235_Hs_T175748-MYO1G_wu-194_1614812880971_1 is 0.25 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181242_Hs_T045070-MYO1E_wu-217_1614821392673_1 is 0.18 days overdue; you may not get credit for it. Consider aborting it.
3/9/2021 3:57:45 AM | SiDock@home | Tasks won't finish in time: BOINC runs 99.6% of the time; computation is enabled 100.0% of that
3/9/2021 3:57:45 AM | SiDock@home | Project requested delay of 7 seconds
3/9/2021 3:57:52 AM | Rosetta@home | Sending scheduler request: To fetch work.
3/9/2021 3:57:52 AM | Rosetta@home | Requesting new tasks for CPU
3/9/2021 3:57:53 AM | Rosetta@home | Scheduler request completed: got 0 new tasks
3/9/2021 3:57:53 AM | Rosetta@home | No tasks sent
3/9/2021 3:57:53 AM | Rosetta@home | Tasks won't finish in time: BOINC runs 99.6% of the time; computation is enabled 100.0% of that
3/9/2021 3:57:53 AM | Rosetta@home | Project requested delay of 31 seconds
3/9/2021 3:57:59 AM | GPUGRID | Sending scheduler request: To fetch work.
3/9/2021 4:44:11 AM | TN-Grid Platform | Aborting task 181258_Hs_T100042-MYH7B_wu-211_1614840590806_1; not started and deadline has passed
3/9/2021 4:44:11 AM | TN-Grid Platform | Aborting task 181259_Hs_T100043-MYH7B_wu-90_1614841440934_1; not started and deadline has passed
3/9/2021 9:20:21 AM | SiDock@home | Tasks won't finish in time: BOINC runs 99.6% of the time; computation is enabled 100.0% of that
3/9/2021 9:20:21 AM | SiDock@home | Project requested delay of 7 seconds
3/9/2021 10:43:19 AM | Rosetta@home | Tasks won't finish in time: BOINC runs 99.6% of the time; computation is enabled 100.0% of that
3/9/2021 10:43:19 AM | Rosetta@home | Project requested delay of 31 seconds
3/9/2021 2:23:04 PM | SiDock@home | Tasks won't finish in time: BOINC runs 99.6% of the time; computation is enabled 100.0% of that
3/9/2021 2:23:04 PM | SiDock@home | Project requested delay of 7 seconds
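For anyone triaging a log like the one above, a small script can pull out the overdue tasks and how late each one is. This is only a sketch: the regular expression is written against the message wording in this excerpt, and the names `OVERDUE` and `overdue_tasks` are made up for illustration, not part of BOINC.

```python
import re

# Matches the "is N days overdue" message as worded in the excerpt above.
# Assumption: the field layout is "timestamp | project | message".
OVERDUE = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*Task (?P<task>\S+) is "
    r"(?P<days>\d+(?:\.\d+)?) days overdue"
)

def overdue_tasks(log_text):
    """Return (project, task_name, days_overdue) for each overdue message."""
    return [
        (m.group("project"), m.group("task"), float(m.group("days")))
        for m in OVERDUE.finditer(log_text)
    ]

# Two entries copied from the log above, run together as in the original paste.
sample = (
    "3/9/2021 3:57:05 AM | TN-Grid Platform | Task "
    "181047_Hs_T154140-OXCT1_wu-277_1614546821865_1 is 3.11 days overdue; "
    "you may not get credit for it. Consider aborting it. "
    "3/9/2021 3:57:05 AM | TN-Grid Platform | Task "
    "181242_Hs_T045070-MYO1E_wu-217_1614821392673_1 is 0.18 days overdue; "
    "you may not get credit for it. Consider aborting it."
)

for project, task, days in overdue_tasks(sample):
    print(f"{project}: {task} is {days} days overdue")
```

Because the pattern does not depend on line breaks, it works on the log whether entries are one per line or pasted as a single run of text.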

Sandman192 commented 3 years ago

Note that I'm only getting TN-Grid work. As you can see, SiDock@home and Rosetta@home are set to get work, but I have no work ready to run from those two, at least until I tell BOINC not to get work from TN-Grid. I checked that SiDock and Rosetta have work to send.

(WCG is off right now.) If I allow WCG to get tasks, that's all I'll get: no more work from TN-Grid, no work for SiDock, no work for Rosetta. Just WCG, at least until I set it to "No New Tasks".

This is happening on both of my computers. Again, I never had this problem until 2 or 3 BOINC updates ago.

RichardHaselgrove commented 3 years ago

@Sandman192 I'm interested in exploring this issue, and discovering whether it might be related to my issue #4117. But I need to ask you some more detailed questions about how your projects are set up, and I'd prefer not to distract from the development work here.

Please could you join us in https://boinc.berkeley.edu/forum_thread.php?id=14146, or send me a private message via the link you'll find there?

AenBleidd commented 3 years ago

@Sandman192, could you please enable these flags and send extended logs (ideally after a day of work)? [image]

Sandman192 commented 3 years ago

Got it.

Sandman192 commented 3 years ago

This log is from BOINC's Tools > Event Log. Attachment: Boinc Event Log.txt

RichardHaselgrove commented 3 years ago

Sandman has accepted my invitation to join us on the BOINC message board. We've also exchanged some private messages, and I've done some freelance investigating on my own account.

Firstly, Sandman has told me that he's not using an app_config.xml file, so my concern about a potential link with #4117 can be discounted.

Secondly, I found his account on the TN-Grid website, and looked specifically at his i7-5930K CPU Windows host. I was surprised to see

[image: Sandman TN-Grid]

That's an extraordinary ratio between elapsed and CPU time, so I drew his attention to it. Sandman told me that he had been running Folding@Home at the same time as BOINC, but had stopped doing so. After a few days, the timings dropped to

[image: Sandman TN-Grid no FaH]

That is far more reasonable: I am tentatively ascribing the remaining discrepancy to the use of hyperthreading, but that hasn't been confirmed. The CPU is described as having 12 processors on the TN-Grid website, but Intel describes it as having '6 cores, 12 threads'.
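The elapsed-versus-CPU-time comparison above boils down to a single ratio. A minimal sketch follows; the function name `cpu_efficiency` and the numbers are hypothetical, since the actual timings were only in the screenshots.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Fraction of wall-clock time a task actually spent on the CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds

# Hypothetical numbers, NOT taken from Sandman's host:
starved = cpu_efficiency(cpu_seconds=3_600, elapsed_seconds=14_400)
healthy = cpu_efficiency(cpu_seconds=3_600, elapsed_seconds=4_000)

print(f"starved task: {starved:.0%} of elapsed time on CPU")
print(f"healthy task: {healthy:.0%} of elapsed time on CPU")
```

A value far below 1 means the task sat runnable but waiting for most of its elapsed time, which is what a competing higher-priority process would produce.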

I looked further into the association with Folding@Home, and found by experiment that, while BOINC runs CPU science apps at a thread base priority of 1, Folding runs its computational core at a thread base priority of 4. (screenshots at https://boinc.berkeley.edu/forum_thread.php?id=13563&postid=103522#103522)
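To put those two numbers in context, the table below reproduces the base priorities that Microsoft's "Scheduling Priorities" documentation lists for threads in a process of the idle priority class. It is a sketch for orientation only: other process classes can produce the same base values, so it does not establish which API settings either client actually uses, and `levels_with_base` is a made-up helper name.

```python
# Base priority of a thread in an IDLE_PRIORITY_CLASS process on Windows,
# keyed by the thread priority level (per Microsoft's documented table).
IDLE_CLASS_BASE_PRIORITY = {
    "THREAD_PRIORITY_IDLE": 1,
    "THREAD_PRIORITY_LOWEST": 2,
    "THREAD_PRIORITY_BELOW_NORMAL": 3,
    "THREAD_PRIORITY_NORMAL": 4,
    "THREAD_PRIORITY_ABOVE_NORMAL": 5,
    "THREAD_PRIORITY_HIGHEST": 6,
    "THREAD_PRIORITY_TIME_CRITICAL": 15,
}

def levels_with_base(base):
    """Thread priority levels that yield the given base priority (idle class only)."""
    return [name for name, value in IDLE_CLASS_BASE_PRIORITY.items() if value == base]

print(levels_with_base(1))  # the value observed for BOINC science apps
print(levels_with_base(4))  # the value observed for the Folding core
```

Within this class, a thread at base priority 4 is always scheduled ahead of a runnable thread at base priority 1, which is consistent with BOINC tasks accumulating elapsed time without CPU time while Folding was running.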

I think it would be helpful if the voluntary 'distributed science' community could agree on a common industry standard value for "Lowest possible priority", to reduce the chance of similar events happening in the future.

Sandman192 commented 3 years ago

I'm giving you another event log, taken after I started WCG up along with the others. As you can see, WCG is all I have in the log, and it's the only CPU work running. Attachment: Boinc Event Log 2.txt