Some datasets have run into problems, e.g. bad raw files that crash the convert job. The user gets annoyed and removes the dataset to start over.
The user presses "retire dataset", which would remove the files from active storage.
This retire job is then blocked by the crashed convert job.
Trying to reactivate the dataset doesn't work either.
The proper way out is to remove the corrupt files and re-convert,
but an admin first has to remove the crashed job that is blocking.
Maybe run 20 convert workers instead, each taking one file per job, so they report granularly which files error. The user would then somehow have to be able to remove those files and resolve the crashed jobs. OR do not crash the job, but automatically remove it or mark it OK? But how do we report the error if the job is green? Automatic corrupt file removal? Or a message to the user, like the admin messaging?
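A rough sketch of the one-file-per-task idea, just to make the granular reporting concrete. Everything here is an assumption: `convert_all`, `convert_file` and `fake_convert` are made-up names, not existing code, and a thread pool stands in for whatever the real job queue would be.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_all(paths, convert_file, max_workers=20):
    """Convert each file as its own task and collect per-file outcomes.

    convert_file is a hypothetical callable that raises on a corrupt file.
    Returns (converted, failed), where failed maps path -> error message.
    """
    def run_one(path):
        try:
            convert_file(path)
            return path, None
        except Exception as exc:  # one bad file must not take down the rest
            return path, f"{type(exc).__name__}: {exc}"

    converted, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths
        for path, error in pool.map(run_one, paths):
            if error is None:
                converted.append(path)
            else:
                failed[path] = error
    return converted, failed


def fake_convert(path):
    # Stand-in converter for illustration: treats "bad" names as corrupt.
    if "bad" in path:
        raise ValueError("cannot read raw file")


converted, failed = convert_all(["a.raw", "bad_b.raw", "c.raw"], fake_convert)
print(converted)
print(failed)
```

With something like this the batch as a whole never ends up as one crashed, blocking job. The `failed` mapping is what could be shown to the user so they know which files to remove, which still leaves open how that message gets delivered.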