Closed jimk-bdrc closed 1 month ago
grep -Re 'W1NLM\(45\|46\|52\|56\|57\)00' .
This shows that the DAG picked up every one of these SQS messages.
Each of these was the second or subsequent message picked up in get_restored_object_messages. But the downstream tasks only processed the first message, or in the case of W1NLM4500, croaked completely.
The fix for this involves logging the messages received into a pending queue, and then removing them once the queue has been synced. This could use the DipLog class.
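A minimal sketch of that pending-queue idea, under stated assumptions: `PendingLog`, `process_messages`, and the `(message_id, work_rid)` message shape are hypothetical names invented for illustration, and SQLite stands in for whatever store DipLog actually provides (the real DipLog API is not used here). Every received message is recorded before any work starts, and an entry is removed only after its own sync succeeds, so a failure on one work leaves it pending instead of losing it.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

Message = Tuple[str, str]  # (message_id, work_rid): assumed shape, not the real SQS payload


class PendingLog:
    """Hypothetical stand-in for a DipLog-backed pending queue."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "message_id TEXT PRIMARY KEY, work_rid TEXT NOT NULL)"
        )
        self._conn.commit()

    def record(self, message_id: str, work_rid: str) -> None:
        # INSERT OR IGNORE so a redelivered message does not create a duplicate row.
        self._conn.execute(
            "INSERT OR IGNORE INTO pending VALUES (?, ?)", (message_id, work_rid)
        )
        self._conn.commit()

    def mark_synced(self, message_id: str) -> None:
        self._conn.execute("DELETE FROM pending WHERE message_id = ?", (message_id,))
        self._conn.commit()

    def pending(self) -> List[Message]:
        rows = self._conn.execute(
            "SELECT message_id, work_rid FROM pending ORDER BY message_id"
        )
        return list(rows)


def process_messages(
    messages: Iterable[Message],
    log: PendingLog,
    sync_work: Callable[[str], None],
) -> List[str]:
    """Record every message first, then sync each pending work in turn."""
    for message_id, work_rid in messages:
        log.record(message_id, work_rid)

    synced = []
    for message_id, work_rid in log.pending():
        try:
            sync_work(work_rid)
        except Exception:
            # Leave the entry in the pending log so a later run can retry it.
            continue
        log.mark_synced(message_id)
        synced.append(work_rid)
    return synced
```

Because the loop walks the whole pending log rather than only the first message, the second and subsequent messages in a batch are handled too, and anything left in `pending()` after a run is a work that still needs recovery.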
I initiated Glacier restores on a number of glacier.staging.nlm.bdrc.org works (W1NLM4700-5000, 5100-5900). All the ones that existed were restored successfully. Most of them sent SQS messages that the sqs_scheduled_dag picked up and synced. A random subset (4500, 4600, 5200, 5600, 5700) was restored successfully, but no message was sent, so the DAG didn't pick them up.
Find out why, and how to recover; develop another input path. (There's a dataset facility that could be a data bridge between DAGs.)