Open ericpassmore opened 9 months ago
To support AWS spot instances, we need to recover jobs that have been orphaned. This is accomplished via a timeout check. Each job's status and times are updated every few minutes with the most recent block, so every WORKING job should have a recent update time.
Add a cron job that issues an HTTP PUT to /jobtimeoutcheck, which checks for timeout conditions and performs corrective action.
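A rough sketch of the timeout condition, to make the idea concrete. The `Job` fields, the `find_orphaned_jobs` helper, and the 15 minute threshold are all assumptions for illustration; the issue only says updates arrive "every few minutes", and the corrective action is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed threshold, not from the issue: a few missed update intervals.
ASSUMED_TIMEOUT = timedelta(minutes=15)


@dataclass
class Job:
    # Hypothetical shape of a job record; real field names may differ.
    job_id: int
    status: str
    last_updated: datetime


def find_orphaned_jobs(jobs, now, timeout=ASSUMED_TIMEOUT):
    """Return WORKING jobs whose last update is older than `timeout`."""
    return [
        job
        for job in jobs
        if job.status == "WORKING" and now - job.last_updated > timeout
    ]


# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
jobs = [
    Job(1, "WORKING", now - timedelta(minutes=2)),   # recently updated
    Job(2, "WORKING", now - timedelta(minutes=40)),  # stale, likely orphaned
    Job(3, "COMPLETE", now - timedelta(hours=3)),    # old, but not WORKING
]
orphaned = find_orphaned_jobs(jobs, now)
print([job.job_id for job in orphaned])  # prints [2]
```

Only job 2 is flagged: it is still WORKING but has not reported in longer than the assumed threshold. A handler behind /jobtimeoutcheck could run a check like this and then apply whatever corrective action is decided on.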
Tagged with faster-replay because spot instances should allow 2x more nodes.
Timeout check needs to