gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Retries in clustering, when there is an HBase table outage #977

Closed by timrobertson100 8 months ago

timrobertson100 commented 8 months ago

The GBIF clustering generates an HBase table that is used to set the isInCluster flag in Elasticsearch. So that a workflow can automate replacing that HBase table, the pipelines should implement a retry mechanism that tolerates an HBase table outage of up to 1 minute.
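For illustration, a retry of this kind could look roughly like the sketch below. It is not the code that was deployed: `RetrySketch`, `IoAction`, and `withRetry` are hypothetical names, and the fixed delay between attempts is an assumption. It retries only on `IOException`, on the basis that HBase client failures generally surface as `IOException` subclasses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration only; not taken from gbif/pipelines. */
final class RetrySketch {

  /** A lookup that may fail with an IOException, e.g. while the table is being replaced. */
  @FunctionalInterface
  interface IoAction<T> {
    T run() throws IOException;
  }

  private RetrySketch() {}

  /**
   * Runs the action up to maxAttempts times in total, sleeping delayMillis
   * between failed attempts. Throws UncheckedIOException wrapping the last
   * failure once all attempts are used up.
   */
  static <T> T withRetry(IoAction<T> action, int maxAttempts, long delayMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.run();
      } catch (IOException e) {
        last = e;
        // Only wait if another attempt will follow.
        if (attempt < maxAttempts) {
          sleep(delayMillis);
        }
      }
    }
    throw new UncheckedIOException("Gave up after " + maxAttempts + " attempts", last);
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe the interruption.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to retry", e);
    }
  }
}
```

With assumed settings of 13 attempts and a 5000 ms delay, the 12 waits between attempts add up to 60 seconds, which would cover the 1-minute outage described above. A call would then look like `RetrySketch.withRetry(() -> lookup(key), 13, 5000L)`, where `lookup` stands in for whatever reads from the HBase table.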

We discussed a more complex solution, creating new tables and watching a ZK node (as the maps solution does), and chose not to use it. This keeps the implementation simple, which seems reasonable given that this is a weekly or monthly process. We might revisit that decision later.

muttcg commented 8 months ago

Deployed to PROD