NationalSecurityAgency / timely

Accumulo backed time series database
https://code.nsa.gov/timely/
Apache License 2.0

Improve ageoff iterator remove/add logic in DataStoreImpl #217

Closed billoley closed 2 years ago

billoley commented 3 years ago

Removing and re-adding the metrics and meta ageoff iterators when the Timely server starts is a way to ensure that the configuration in timely.yml is applied to Accumulo. However, when multiple Timely servers are started at the same time, their Accumulo table operations can interfere with one another, causing a distributed race condition.

The previous way of handling this was to ignore conflicts, since each server was removing and applying the same iterators (in the same Accumulo). A problem happens when the add operations conflict and are ignored, so that the last operation to take effect is a remove. This leads to major compactions being queued up and the data not being removed.

We should retry the remove and add operations a sufficient number of times to ensure that the final state has the ageoff iterators and settings applied.
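A minimal sketch of the retry idea, not the actual DataStoreImpl code. The `IteratorOps` interface is a hypothetical stand-in for the Accumulo table operations that Timely calls at startup, and the attempt count and backoff are assumed parameters. Each attempt removes the iterator, adds it again, and then reads the setting back, so that success is only reported when the final state has the iterator applied.

```java
final class AgeoffRetry {

    /** Hypothetical stand-in for the Accumulo table operations used at startup. */
    interface IteratorOps {
        void remove(String table, String name) throws Exception;

        void add(String table, String name, long ageoffMs) throws Exception;

        /** Returns the configured ageoff in ms, or null if the iterator is absent. */
        Long currentAgeoff(String table, String name) throws Exception;
    }

    /**
     * Removes and re-adds the iterator until a read-back confirms the setting,
     * or until maxAttempts is exhausted. Returns true only on confirmed success.
     */
    static boolean applyWithRetry(IteratorOps ops, String table, String name,
                                  long ageoffMs, int maxAttempts, long backoffMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                ops.remove(table, name);
                ops.add(table, name, ageoffMs);
                // Verify the end state instead of trusting that add was the last operation.
                Long actual = ops.currentAgeoff(table, name);
                if (actual != null && actual == ageoffMs) {
                    return true;
                }
            } catch (InterruptedException e) {
                throw e;
            } catch (Exception e) {
                // Assumed to be a conflict with another server; retry below.
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoffMs * attempt); // simple linear backoff
            }
        }
        return false;
    }
}
```

A caller would treat a `false` result as a startup error rather than ignoring it, since that is the case where the tables may be left without an ageoff iterator.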

billoley commented 2 years ago

Instead of having every server remove and add the iterators and settings, we will use the same LeaderLatch pattern that the Balancer uses to elect a lead server. Only that lead server will remove and add the iterator settings.
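A rough sketch of the leader-gated approach, assuming leadership is exposed as a boolean check. The class name `LeaderGatedAgeoff` and the `applyAgeoff` callback are hypothetical and are not taken from the Timely code. In a real server the supplier would presumably be backed by Apache Curator's `LeaderLatch.hasLeadership()`; it is abstracted here so the example runs without ZooKeeper.

```java
import java.util.function.BooleanSupplier;

final class LeaderGatedAgeoff {

    private final BooleanSupplier hasLeadership; // e.g. latch::hasLeadership (assumed wiring)
    private final Runnable applyAgeoff;          // removes and re-adds the ageoff iterators

    LeaderGatedAgeoff(BooleanSupplier hasLeadership, Runnable applyAgeoff) {
        this.hasLeadership = hasLeadership;
        this.applyAgeoff = applyAgeoff;
    }

    /** Applies the ageoff settings only on the elected leader; returns true if applied. */
    boolean applyIfLeader() {
        if (!hasLeadership.getAsBoolean()) {
            return false; // followers leave the table configuration untouched
        }
        applyAgeoff.run();
        return true;
    }
}
```

This sketch checks leadership once. Because a server may win the election after startup, an implementation would more likely trigger the same action from the `isLeader()` callback of a `LeaderLatchListener`.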