Open lsat12357 opened 5 years ago
Add other services as needed.
Because of intermittent system errors this time around, a number of assets were ingested in an incomplete state. Fixing them required some combination of:
We may want to draw the line somewhere and just delete/reingest? We just had some changes to infra, so we likely will not see this level of error when we start migrating for real.
I've been finding works that, according to the aasm_status, failed during the persist_work stage but are actually in Hyrax. Presumably something goes wrong after the object is saved but before the stack completes. The causes I've seen so far (aside from problems we simply need to fix) seem to involve the attach files job. I think we could have a job/service that retries attaching the files and checks visibility, plus a callback that updates the migrator work status if that operation succeeds. We could add more services to cover other failures as they become apparent.
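A rough sketch of what such a repair service might look like, in plain Ruby so it runs without Rails, Hyrax, or AASM. Only `aasm_status` and `persist_work` come from the description above; every class, method, and status value here (`MigratorWork`, `RepoWork`, `AttachFilesRepairService`, `"failed"`, `"completed"`) is a hypothetical stand-in, and the real attach files job and visibility check would replace the injected `attacher` and the simple comparison.

```ruby
# Hypothetical stand-in for the migrator's tracking record for one work.
MigratorWork = Struct.new(:work_id, :aasm_status, :failed_stage, keyword_init: true)

# Hypothetical stand-in for a work as it exists in the repository.
RepoWork = Struct.new(:id, :expected_files, :attached_files,
                      :visibility, :expected_visibility, keyword_init: true)

class AttachFilesRepairService
  MAX_ATTEMPTS = 3 # assumed retry limit, not taken from the issue

  # repository: Hash of work id => RepoWork (stands in for a repository lookup)
  # attacher:   callable that attaches files; may raise on intermittent errors
  # on_success: callback that receives the tracking record after a verified repair
  def initialize(repository:, attacher:, on_success:)
    @repository = repository
    @attacher = attacher
    @on_success = on_success
  end

  # Returns :repaired, :failed, or :missing.
  def call(record)
    work = @repository[record.work_id]
    return :missing unless work # never saved, so it needs a reingest instead

    attempts = 0
    begin
      attempts += 1
      @attacher.call(work) unless files_complete?(work)
    rescue StandardError
      retry if attempts < MAX_ATTEMPTS
      return :failed
    end

    # Only report success once both checks pass.
    return :failed unless files_complete?(work) && visibility_ok?(work)

    @on_success.call(record)
    :repaired
  end

  private

  def files_complete?(work)
    work.attached_files.sort == work.expected_files.sort
  end

  def visibility_ok?(work)
    work.visibility == work.expected_visibility
  end
end

# Usage: an attacher that fails once, then succeeds.
calls = 0
flaky_attacher = lambda do |work|
  calls += 1
  raise IOError, "intermittent error" if calls < 2
  work.attached_files = work.expected_files.dup
end

repo = { "w1" => RepoWork.new(id: "w1", expected_files: ["a.tif"], attached_files: [],
                              visibility: "open", expected_visibility: "open") }
record = MigratorWork.new(work_id: "w1", aasm_status: "failed", failed_stage: "persist_work")
mark_completed = ->(r) { r.aasm_status = "completed" }

service = AttachFilesRepairService.new(repository: repo, attacher: flaky_attacher,
                                       on_success: mark_completed)
result = service.call(record)
puts "#{record.work_id}: #{result}, status now #{record.aasm_status}"
```

Injecting the attacher and the callback keeps the retry logic separate from whatever actually attaches files, so additional repair services for other failure modes could follow the same shape.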