Closed oldmantaiter closed 10 years ago
A checkmark can be used to keep track of what has been done. See pull #18.
Instead of using checkmarks, we added an exponential back-off mechanism to inferno that retries these operations. If every retry fails, inferno gives up. Moreover, if a segmentation fault, an OOM kill, or anything else prevents inferno from retrying, the result processor might still be executed more than once.
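For readers unfamiliar with the pattern, here is a minimal sketch of retrying with exponential back-off. It is not inferno's actual code: the function name `retry_with_backoff` and the parameters `max_attempts`, `base_delay`, and `factor` are assumptions chosen for illustration.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       factor=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Hypothetical helper, not part of inferno. The wait between attempts
    starts at base_delay seconds and is multiplied by factor each time.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                # Out of retries: give up and surface the last error.
                raise
            sleep(delay)
            delay *= factor
```

Note that this only helps while the process is alive. A crash in the middle of `operation()` leaves no record of what already ran, which is why a restart can still repeat work.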
Currently, if the archive operation fails, the job is re-run regardless of which result processor actions have already run successfully and committed their data.
We need to keep local state in the inferno master that records which actions have succeeded or failed, so that we don't re-run them if a failure occurs in the result processor or in the job cleanup/final stages.
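One way such state tracking could look is sketched below. This is an illustration only, not the design that was adopted: the class `ActionLog`, its `run_once` method, and the JSON state file are all hypothetical names and choices.

```python
import json
import os


class ActionLog:
    """Records completed actions on local disk (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path) as f:
                self.done = set(json.load(f))

    def run_once(self, name, action):
        """Run action() unless name is already recorded as done.

        Returns True if the action ran, False if it was skipped.
        If action() raises, nothing is recorded and the error propagates.
        """
        if name in self.done:
            return False
        action()
        self.done.add(name)
        self._save()
        return True

    def _save(self):
        # Write to a temp file, then rename, so a crash during the write
        # does not leave a truncated state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.done), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

With this in place, a re-run of the job would call `run_once` for each result processor action and skip the ones already recorded. A crash between `action()` returning and `_save()` finishing can still cause one repeat, so the actions themselves would ideally be idempotent as well.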