Currently, if an alignment coordinator job fails due to a spot interruption, the next attempt reruns all of the alignment chunks. It is likely that some chunks will already be complete, especially since the chunk jobs are not cancelled and continue working between retries. Skipping already-completed chunks on retry would prevent a lot of redundant work and keep our alignment job queue smaller, which should also slightly mitigate other issues.
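A rough sketch of the proposed retry behavior follows. It assumes the coordinator can look up a per-chunk status; `ChunkStatus`, `Chunk`, and `plan_retry` are hypothetical names for illustration, not existing code. Because chunk jobs are not cancelled when the coordinator is interrupted, chunks that are still running are waited on rather than resubmitted.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical per-chunk status; the real coordinator's tracking may differ.
class ChunkStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Chunk:
    chunk_id: str
    status: ChunkStatus


def plan_retry(chunks: list[Chunk]) -> tuple[list[str], list[str]]:
    """Split chunks into (ids to submit, ids to wait on) for a retried coordinator.

    Succeeded chunks are skipped entirely. Running chunks were not cancelled
    by the interruption, so we wait on them instead of submitting duplicates.
    Only pending or failed chunks are submitted again.
    """
    to_submit: list[str] = []
    to_wait_on: list[str] = []
    for chunk in chunks:
        if chunk.status in (ChunkStatus.PENDING, ChunkStatus.FAILED):
            to_submit.append(chunk.chunk_id)
        elif chunk.status is ChunkStatus.RUNNING:
            to_wait_on.append(chunk.chunk_id)
        # SUCCEEDED: nothing to do, the result can be reused as-is.
    return to_submit, to_wait_on
```

With this split, the retried coordinator only adds pending or failed chunks to the alignment queue, which is where the reduction in queue size would come from.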