buda-base / ao-workflows

Use DAG platform to define and orchestrate workflows

Missing SQS notifications #19

Closed by jimk-bdrc 1 month ago

jimk-bdrc commented 2 months ago

I initiated a Glacier restore on a number of glacier.staging.nlm.bdrc.org works (W1NLM4700-5000, 5100-5900). All the ones that existed were restored successfully. Most of them sent SQS messages that the sqs_scheduled_dag picked up and synced. A random subset (4500, 4600, 5200, 5600, 5700) was restored successfully, but no message was sent, so the DAG didn't pick them up.

Find out why and how to recover, and develop another input path. (There's a dataset facility that could serve as a data bridge between DAGs.)

jimk-bdrc commented 2 months ago

Running grep -Re 'W1NLM\(45\|46\|52\|56\|57\)00' . shows that the DAG did pick up every one of these SQS messages.

Each of these was the second or a subsequent message picked up in get_restored_object_messages. But the downstream tasks only processed the first message or, in the case of W1NLM4500, failed completely.
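To illustrate the failure mode described above, here is a minimal sketch (not the repository's actual code) of a downstream step that walks every received message instead of only the first. It assumes each message is a dict with a "Body" holding the standard S3 event notification JSON, as boto3's receive_message returns. The function name restored_keys is hypothetical, and the exact eventName value should be checked against real messages.

```python
import json
from urllib.parse import unquote_plus


def restored_keys(messages):
    """Return (bucket, key) for every restore-completed record in every message.

    Hypothetical helper: the point is the outer loop over *all* messages,
    rather than reading only messages[0].
    """
    restored = []
    for message in messages:
        body = json.loads(message["Body"])
        # An S3 notification may carry several records per message.
        for record in body.get("Records", []):
            # Assumption: restore completion arrives as "ObjectRestore:Completed".
            if record.get("eventName") != "ObjectRestore:Completed":
                continue
            s3 = record["s3"]
            # Object keys in S3 event notifications are URL-encoded.
            key = unquote_plus(s3["object"]["key"])
            restored.append((s3["bucket"]["name"], key))
    return restored
```

With a batch of three messages, this returns three entries, where indexing only the first message would silently drop the other two.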

jimk-bdrc commented 2 months ago

The fix involves logging the received messages into a pending queue, then removing each one once it has been synced. This could use the DipLog class.
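The pending-queue idea could be sketched roughly as follows. This is an assumption-laden illustration: it uses a small SQLite table as a stand-in, since the DipLog class's interface is not shown here, and the names PendingQueue, record, mark_synced, and pending are all hypothetical.

```python
import sqlite3


class PendingQueue:
    """Hypothetical stand-in for a DipLog-backed pending queue."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "message_id TEXT PRIMARY KEY, object_key TEXT NOT NULL)"
        )
        self._db.commit()

    def record(self, message_id, object_key):
        # Log a received message; a redelivered message is ignored.
        self._db.execute(
            "INSERT OR IGNORE INTO pending VALUES (?, ?)",
            (message_id, object_key),
        )
        self._db.commit()

    def mark_synced(self, message_id):
        # Remove the entry only after the sync has completed.
        self._db.execute(
            "DELETE FROM pending WHERE message_id = ?", (message_id,)
        )
        self._db.commit()

    def pending(self):
        # Entries still awaiting sync, in the order they were recorded.
        rows = self._db.execute(
            "SELECT message_id, object_key FROM pending ORDER BY rowid"
        )
        return list(rows)
```

Anything left in pending() after a run is a message that was received but never synced, which is exactly the case that went unnoticed here, and it gives a later run something to retry from.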