Ericsson / ecchronos

Ericsson distributed repair scheduler for Apache Cassandra
Apache License 2.0

Implement RepairGroup Class for Managing and Executing Repair Tasks #738

Closed VictorCavichioli closed 1 month ago

VictorCavichioli commented 1 month ago

Story Description:

The RepairGroup class is responsible for managing repair tasks in Cassandra clusters, in particular scheduling and executing repairs for specific tables. The goal is to define and implement this class so that it integrates correctly with components such as RepairConfiguration, ReplicaRepairGroup, DistributedJmxProxyFactory, and others.

The current implementation requires managing repair policies, metrics, and tasks within a distributed environment. The new class should support configuring and executing repair tasks, ensuring that they follow the repair policies and run to completion unless a specific stop condition applies.

Acceptance Criteria:

The execute() method should attempt to run all the repair tasks in the group, considering repair policies, and return a boolean indicating success or failure.
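
A minimal sketch of how this criterion could look, under stated assumptions. It is not the ecchronos implementation: the `RepairPolicy` and `RepairTask` interfaces, the constructor signature, and the choice to keep running the remaining tasks after one fails are all hypothetical and chosen only to illustrate the execute() / shouldContinue() / getRepairTasks() contract described in this issue.

```java
import java.util.List;
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: every type in this file is a hypothetical stand-in.
final class RepairGroupSketch {

    /** Hypothetical policy: returns false when repairs must not run right now. */
    interface RepairPolicy {
        boolean shouldRun(String table);
    }

    /** Hypothetical unit of work; a real task would reach Cassandra via JMX. */
    interface RepairTask {
        void execute() throws Exception;
    }

    static final class RepairGroup {
        private static final Logger LOG = Logger.getLogger(RepairGroup.class.getName());

        private final UUID nodeID;   // node the tasks belong to
        private final String table;  // table being repaired, e.g. "keyspace.table"
        private final List<RepairPolicy> policies;
        private final List<RepairTask> tasks;

        RepairGroup(UUID nodeID, String table,
                    List<RepairPolicy> policies, List<RepairTask> tasks) {
            this.nodeID = nodeID;
            this.table = table;
            this.policies = List.copyOf(policies);
            this.tasks = List.copyOf(tasks);
        }

        UUID getNodeID() {
            return nodeID;
        }

        List<RepairTask> getRepairTasks() {
            return tasks; // already an unmodifiable copy
        }

        /** True only if every policy currently allows repairs on the table. */
        boolean shouldContinue() {
            for (RepairPolicy policy : policies) {
                if (!policy.shouldRun(table)) {
                    LOG.warning("Repair of " + table + " on node " + nodeID
                            + " stopped by policy " + policy);
                    return false;
                }
            }
            return true;
        }

        /** Attempts every task; returns true only if all of them succeeded. */
        boolean execute() {
            boolean success = true;
            for (RepairTask task : getRepairTasks()) {
                if (!shouldContinue()) {
                    return false; // a policy halted the group
                }
                try {
                    task.execute();
                } catch (Exception e) {
                    // Assumption: a failed task does not abort the remaining ones.
                    LOG.log(Level.WARNING, "Repair task failed for " + table, e);
                    success = false;
                }
            }
            return success;
        }
    }

    public static void main(String[] args) {
        RepairTask noOp = () -> { };
        RepairGroup group = new RepairGroup(
                UUID.randomUUID(), "ks.tbl", List.of(t -> true), List.of(noOp));
        System.out.println("Repair succeeded: " + group.execute());
    }
}
```

In this sketch the policies are re-checked before each task, so a policy that starts rejecting mid-run stops the group early and execute() reports failure. Whether the real class should abort on the first failed task instead of continuing is a design decision this issue does not specify.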

Definition of Done:

  1. All required methods are implemented in the RepairGroup class.
  2. Unit tests are written to validate the functionality of execute(), shouldContinue(), and getRepairTasks().
  3. Integration tests ensure the RepairGroup interacts correctly with distributed systems, using DistributedJmxProxyFactory and ReplicaRepairGroup.
  4. Logging and error handling are in place, with logs indicating when repairs fail or are stopped by policies.

Notes:

The UUID nodeID is passed to ensure tasks are associated with the correct node in a distributed environment.

Related to #652. Depends on #737.

VictorCavichioli commented 1 month ago

PR merged