FoldingAtHome / fah-issues

In some cases the client does not quit retrying a failing WU. #1172

Open cxhernandez opened 7 years ago

cxhernandez commented 7 years ago

User ChristianVirtual writes:

I recently had a WU that didn't start folding (reported in FF). It happened while I was at work and unable to look after the slot. The WU started, immediately stopped at 0%, and restarted, 1353 times in all, until I found it when coming home. It would be good for science if FAHControl could detect those retries and dump the WU itself after 5 or 10 attempts.

jcoffland commented 7 years ago

This needs more information. The client does try to detect repeated retries and abort, and it usually succeeds. However, there are some scenarios where part of the run succeeds, so the client clears its error count. Then the run fails after the error count was cleared, and the client retries indefinitely. These cases are rare. I need to do a significant overhaul of the client error handling code to fix this in general. I need more information to fix this specific case.
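To make the failure mode concrete, here is a minimal sketch of one way to guard against it. This is not the client's actual code: `RetryTracker`, `record_progress`, `record_failure`, `simulate`, and both limits are hypothetical names and values chosen for illustration. The idea is to keep a second, per-WU failure total that partial success never clears, alongside the consecutive-error count that it does clear.

```python
from dataclasses import dataclass


@dataclass
class RetryTracker:
    """Hypothetical per-WU failure bookkeeping (not the FAH client's real code)."""
    max_consecutive: int = 5   # assumed limit on back-to-back failures
    max_total: int = 10        # assumed limit on failures over the WU's lifetime
    consecutive: int = 0
    total: int = 0

    def record_progress(self) -> None:
        # Partial success clears only the consecutive counter;
        # the lifetime total is deliberately left alone.
        self.consecutive = 0

    def record_failure(self) -> bool:
        """Count one failed run and return True if the WU should be dumped."""
        self.consecutive += 1
        self.total += 1
        return (self.consecutive >= self.max_consecutive
                or self.total >= self.max_total)


def simulate(tracker: RetryTracker, cycles: int):
    """Run cycles of 'partial progress, then failure'.

    Returns the attempt number on which the WU is dumped, or None if it
    is still being retried after all cycles.
    """
    for attempt in range(1, cycles + 1):
        tracker.record_progress()
        if tracker.record_failure():
            return attempt
    return None
```

With these assumed defaults, a WU that makes partial progress and then fails on every run is dumped on the 10th attempt. If the total limit is effectively disabled (for example `RetryTracker(max_total=10**9)`), the same pattern is never dumped, which mirrors the indefinite retry described above.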