DataONEorg / dataone-indexer

DataONE Indexer subsystem
Apache License 2.0

Some resourcemap objects take a long time to index #75

Open taojing2002 opened 4 months ago

taojing2002 commented 4 months ago

We found that some resource map objects take a long time to reindex, which causes the RabbitMQ delivery acknowledgement to time out (the 1800000 ms in the log below is 30 minutes):

IndexWorker.indexOjbect with the thread id 130 - Though the index worker Completed the index task from the index queue with the identifier: resourceMap_urn:uuid:dccb21ff-4566-48f3-aeb6-7024248e8d3d , the index type: create, sending acknowledgement back to rabbitmq failed since channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more, class-id=0, method-id=0). So rabbitmq may resend the message again [org.dataone.cn.indexer.IndexWorker:indexOjbect:418]
mbjones commented 4 months ago

@jeanetteclark can you comment on these timeouts and potential solutions?

jeanetteclark commented 4 months ago

@taojing2002 and I discussed this on Monday, and again at the backend meeting today. We don't think there is any downside to acknowledging the RabbitMQ message pre-emptively, as metadig does, and jobs that fail will automatically get retried without any intervention. The only piece still to be figured out is how to prevent a job from being attempted too many times; a dead letter queue was mentioned on today's call.
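
For illustration only, here is a minimal sketch of the flow described above: acknowledge first, retry on failure, and divert a task to a dead letter list once it has been attempted too many times. It does not use the RabbitMQ client and is not the indexer's code. The class `EarlyAckSketch`, the `Task` type, the in-memory queue, and the limit of 3 attempts are all assumptions made up for this example. With the real Java client, the early acknowledgement would presumably be a call to `Channel.basicAck` made before indexing starts, and the broker's dead-lettering would replace the plain list used here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

class EarlyAckSketch {
    /** Hypothetical index task: an identifier plus the number of failed attempts so far. */
    static final class Task {
        final String id;
        final int attempts;

        Task(String id, int attempts) {
            this.id = id;
            this.attempts = attempts;
        }
    }

    /** Assumed retry limit; the thread does not specify a value. */
    static final int MAX_ATTEMPTS = 3;

    final Deque<Task> queue = new ArrayDeque<>();      // stand-in for the index queue
    final List<Task> deadLetters = new ArrayList<>();  // stand-in for a dead letter queue
    final List<String> indexed = new ArrayList<>();    // identifiers indexed successfully

    /** Processes tasks until the queue is empty. */
    void drain(Consumer<Task> indexer) {
        Task task;
        // poll() removes the task before any work starts, which plays the role
        // of the pre-emptive ack: a slow job can no longer hit an ack timeout.
        while ((task = queue.poll()) != null) {
            try {
                indexer.accept(task);
                indexed.add(task.id);
            } catch (RuntimeException e) {
                // The message is already acked, so the worker itself must
                // re-publish it, carrying the attempt count along.
                Task retry = new Task(task.id, task.attempts + 1);
                if (retry.attempts >= MAX_ATTEMPTS) {
                    deadLetters.add(retry);   // give up and park the task
                } else {
                    queue.add(retry);         // try again later
                }
            }
        }
    }

    public static void main(String[] args) {
        EarlyAckSketch sketch = new EarlyAckSketch();
        sketch.queue.add(new Task("resourceMap_ok", 0));
        sketch.queue.add(new Task("resourceMap_bad", 0));
        // Simulated indexer that always fails for one identifier.
        sketch.drain(t -> {
            if (t.id.equals("resourceMap_bad")) {
                throw new RuntimeException("simulated indexing failure");
            }
        });
        System.out.println("indexed: " + sketch.indexed);
        System.out.println("dead-lettered: " + sketch.deadLetters.size());
    }
}
```

The point of carrying the attempt count with the task is that, once the message has been acknowledged, the broker no longer tracks it, so the retry limit has to be enforced by whoever re-publishes the task.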