c4-project / c4t

Runs concurrent C compiler tests
MIT License

Use exponential backoff for erroring machines #83

Open MattWindsor91 opened 4 years ago

MattWindsor91 commented 4 years ago

Having just had an experiment trashed by a transient error (no disk space!) that triggered the current error-handling routine (if N errors happen in a row, kill the tester), I'm proposing a change of error policy to something like this:

If we ever decouple machines and runners (per #71), or need to start/stop machines to perform updates (per #75), then the backoff could happen at a higher level, effectively acting as a timer for when the machine will next be considered for dispatch to a runner.