Open taojing2002 opened 2 years ago
@taojing2002 I discussed these failure scenarios for MetaDIG in detail with @jeanetteclark, and Jeanette came up with a robust strategy for it. The core problem is that RabbitMQ is reliable only while task processing delays stay below the configured RabbitMQ timeout; once a delay exceeds that timeout, tasks get dropped. Increasing the timeout a lot just causes inefficiencies in the queue processing. Let's discuss with Jeanette to be sure Metacat is robust when using RabbitMQ.
If an index task message from RabbitMQ results in a failed index process, the dataone indexer should act based on the exception. If the issue is temporary, RabbitMQ should deliver the index message again. If the issue is not recoverable, we should send the message to a dead-letter queue.
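
To make the proposal concrete, here is a minimal sketch of the decision logic in Python. It is not the dataone indexer's actual code. The exception classes, `handle_index_task`, and the `MAX_ATTEMPTS` retry cap are all hypothetical names, and the cap itself is an added assumption so that a "temporary" failure that never clears does not requeue forever.

```python
from enum import Enum


class Action(Enum):
    ACK = "ack"                  # task indexed, remove the message
    REQUEUE = "requeue"          # temporary failure, deliver the message again
    DEAD_LETTER = "dead_letter"  # unrecoverable, route to the dead-letter queue


class TransientIndexError(Exception):
    """Hypothetical: a temporary failure, e.g. the index service is briefly down."""


class PermanentIndexError(Exception):
    """Hypothetical: an unrecoverable failure, e.g. the document cannot be parsed."""


MAX_ATTEMPTS = 3  # assumed cap on delivery attempts; not from the discussion above


def handle_index_task(task, index_fn, previous_failures):
    """Run index_fn(task) and decide what to do with the message.

    previous_failures is the number of earlier failed attempts for this task.
    """
    try:
        index_fn(task)
    except TransientIndexError:
        # Retry temporary failures until the assumed cap is reached.
        if previous_failures + 1 < MAX_ATTEMPTS:
            return Action.REQUEUE
        return Action.DEAD_LETTER
    except Exception:
        # Anything else is treated as unrecoverable in this sketch.
        return Action.DEAD_LETTER
    return Action.ACK
```

In a real consumer, `ACK` would map to acknowledging the delivery, `REQUEUE` to a negative acknowledgement with requeue enabled, and `DEAD_LETTER` to a negative acknowledgement with requeue disabled on a queue that has a dead-letter exchange configured. How the attempt count is tracked (for example in a message header) is left open here.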