datainfrahq / druid-operator

Apache Druid On Kubernetes

Adding checks to fix rollingDeploy for historical tiers #167

Closed aruraghuwanshi closed 5 months ago

aruraghuwanshi commented 5 months ago

Adding checks to make sure the previous historical tier has been successfully deployed before moving on to the next one.

Fixes #166.

Description

In the current Druid Operator, when rollingDeploy is enabled, nodes are expected to restart one at a time in a predefined order. When historicals are split into multiple tiers, each tier is a separate StatefulSet of NodeType historical. The Operator does not check whether each historical tier's StatefulSet has finished deploying, so it deploys all historical tiers one after the other without waiting for the previous StatefulSet to be fully deployed.

This PR addresses the issue by adding a check, when rollingDeploy is enabled, on all historical tiers present in the cluster before going ahead with the next tier's deployment. This matters when replicas of a datasource are distributed across multiple tiers: if all tiers go down simultaneously, none of the segments may be available, which results in downtime. With this check in place, a tier is deployed only after the previous tier's deployment has completed.
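The gating check described above could look roughly like the sketch below. It is not the code from this PR: `tierStatus`, `fullyDeployed`, and `firstPendingTier` are hypothetical names, and the struct is a trimmed-down stand-in whose fields are modeled on the Kubernetes `StatefulSetStatus` fields, so the example compiles without the Kubernetes client libraries.

```go
package main

// tierStatus is a hypothetical stand-in for the status of one historical
// tier's StatefulSet. The fields are modeled on StatefulSetStatus; the real
// operator would read them from the API server instead.
type tierStatus struct {
	Name            string
	DesiredReplicas int32
	ReadyReplicas   int32
	UpdatedReplicas int32
	CurrentRevision string
	UpdateRevision  string
}

// fullyDeployed reports whether a tier has finished rolling out: the
// revisions match and every desired replica is both updated and ready.
func fullyDeployed(s tierStatus) bool {
	return s.CurrentRevision == s.UpdateRevision &&
		s.UpdatedReplicas == s.DesiredReplicas &&
		s.ReadyReplicas == s.DesiredReplicas
}

// firstPendingTier walks the tiers in deployment order and returns the index
// of the first tier that is not fully deployed, or -1 if all tiers are done.
// A rolling deploy would work on that tier only and leave later tiers alone.
func firstPendingTier(tiers []tierStatus) int {
	for i, t := range tiers {
		if !fullyDeployed(t) {
			return i
		}
	}
	return -1
}
```

Under these assumptions, a reconcile loop would call `firstPendingTier` on each pass and requeue until it returns an index past the tier it just updated, so at most one historical tier is mid-rollout at any time.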


AdheipSingh commented 5 months ago

How about adding a tier as a high-level object in the CR itself and making decisions based on tiers? We had previous discussions about adding tier as a high-level object.

aruraghuwanshi commented 5 months ago

> How about adding a tier as a high-level object in the CR itself and making decisions based on tiers? We had previous discussions about adding tier as a high-level object.

Sure, let me come up with a proposal along those lines, based on our discussion over DM, and we can proceed from there.

I'll also change historicalTierList from a global variable to a variable local to the function, to make the PR safer against race conditions.
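As a rough illustration of that change (not the actual code from this PR), the tier list can be built inside the function and returned, so concurrent reconciles never share or append to the same slice. The `node` type and the `historicalTiers` function are hypothetical names used only for this sketch.

```go
package main

// node is a hypothetical, minimal view of one node entry in the Druid CR.
type node struct {
	Name     string
	NodeType string
}

// historicalTiers collects the names of all historical node groups.
// The slice is created on every call, so nothing is shared between
// reconciles and entries from an earlier call cannot leak into a later one.
func historicalTiers(nodes []node) []string {
	tiers := make([]string, 0, len(nodes))
	for _, n := range nodes {
		if n.NodeType == "historical" {
			tiers = append(tiers, n.Name)
		}
	}
	return tiers
}
```

With a package-level slice, each reconcile would append to state left behind by earlier reconciles, and two reconciles running at once could write to it concurrently. Returning a fresh slice avoids both problems.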