reneradoi closed this pull request 2 months ago.
Hey @Mehdi-Bendriss, thank you for the feedback! This is very valuable, as the current design differs from the expectations you outlined below.
Currently, when no data node is there, the cluster-manager does get started, but without running `post_start_init`. That means it is there, but not initialized. The start hook gets deferred until the first data node is up and initialized (or endlessly, if that never happens).
When a data node joins, it currently starts independently, initializes the security index and, with that, joins the cluster (by contacting the already started cluster-manager). On the next re-emitted start hook, the cluster-manager then fully comes up.
I chose this design because the current code has a lot of situations on startup where, if the node is not fully up, the start event gets deferred and the update to the large deployment relation doesn't happen. That's why I wanted to let the nodes start independently and let the data node find the cluster once it is up.
I will try to rework that according to the specification you gave below.
Thanks!
Thanks René. I mainly have questions about the synchronisation of the start sequence of all apps / nodes.
Can you confirm the following flow when there is no data node in the cluster:
when no data node in the application:
- no opensearch starts
- the start hook gets deferred endlessly
when a data node joins - through large deployment relations:
- we become aware of it through the `fleet_apps` object
- we then start the leader unit of the main orchestrator
- which then notifies the large deployment relations
- the leader unit of the data cluster starts too
- start flow of the rest of the fleet resumes as usual
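To make that gating concrete, here is a minimal sketch of what the start-hook logic could look like in an ops-based charm. The class, handler and helper names are invented for illustration; only the behaviour (block and defer while no data node exists in the fleet, then proceed once one joins) follows the flow above. This is not the actual charm code.

```python
# Illustrative sketch only (hypothetical names) -- not the charm's real code.
import ops


class ClusterManagerOnlyCharm(ops.CharmBase):
    """Hypothetical cluster-manager-only charm skeleton."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.start, self._on_start)

    def _on_start(self, event: ops.StartEvent) -> None:
        # Hypothetical helper: inspect the large-deployment (peer-cluster)
        # relation data for any app that advertises the "data" role.
        if not self._data_role_in_fleet():
            # No data node anywhere in the fleet: do not start OpenSearch,
            # show a blocked status and wait for the deferred event to be
            # re-emitted once a data node has joined.
            self.unit.status = ops.BlockedStatus("no data node in the cluster")
            event.defer()
            return

        # A data node is available: start the leader of the main
        # orchestrator, which then notifies the large-deployment relations
        # so the start flow of the rest of the fleet resumes as usual.
        self._start_opensearch()
        self.unit.status = ops.ActiveStatus()

    def _data_role_in_fleet(self) -> bool:
        """Placeholder for the real topology check."""
        raise NotImplementedError

    def _start_opensearch(self) -> None:
        """Placeholder for the real start sequence."""
        raise NotImplementedError
```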
Hey @juditnovak! Thanks for the comment. I agree, these are not just "minor tweaks" or code adjustments. Sometimes this can get lost when one is deeply invested in a topic.
Could we provide at least a base description of the logic? I.e. documentation within the code as done before (CA rotation).
[...]
If we could at least hold on to the earlier objective that each PR adds 2 unit tests to the test suite... (I volunteer to add them if there is no other way.)
I've added a comment explaining the workflow at the point where it starts, and also added a few unit tests to document the changed behaviour. I hope this is fine for you.
Nice work @reneradoi. I tested the workflow and it follows exactly what @Mehdi-Bendriss described. I have a couple of notes:
- Once the main cluster manager node is up and running, we add a failover node. The failover node's status is set to "waiting" with the "requesting lock" message. This is misleading, as it is actually blocked waiting for data nodes to join and for the main cluster manager to be initialized.
- When you deploy the data nodes, they go into an active/idle state with no message. This is also misleading, as they are actually waiting to be integrated with the cluster manager. I think we should change the state and add a message clarifying what is happening.
Hey @skourta, thank you for your review! It's great that you deployed it and looked closely at the statuses and messages!
The currently expected status and messages are documented in the integration test here:
apps_full_statuses={
MAIN_APP: {"blocked": [PClusterNoDataNode]},
FAILOVER_APP: {"blocked": [PClusterNoRelation]},
DATA_APP: {"blocked": [PClusterNoRelation]},
},
units_full_statuses={
MAIN_APP: {"units": {"blocked": [PClusterNoDataNode]}},
FAILOVER_APP: {"units": {"active": []}},
DATA_APP: {"units": {"active": []}},
}
The blocking of the data and failover applications is shown on the app status (this was discussed with @Mehdi-Bendriss earlier). This status persists until the applications are related to one another with the peer-cluster relation. Having the same message on the unit status would be redundant, from my point of view.
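As an aside, a rough sketch of the split described here (blocking surfaced on the app status, units kept active) might look like the following. The message constant mirrors the `PClusterNoRelation` name from the integration test, but its wording, the helper function and its signature are assumptions for illustration only.

```python
# Illustrative sketch only: app-level vs. unit-level status for the
# failover/data applications while the peer-cluster relation is missing.
import ops

# Assumed wording -- the real PClusterNoRelation message may differ.
PCLUSTER_NO_RELATION = "missing relation with the main orchestrator"


def apply_statuses(charm: ops.CharmBase, has_peer_cluster_relation: bool) -> None:
    if not has_peer_cluster_relation:
        if charm.unit.is_leader():
            # The whole application is blocked until it is related via the
            # peer-cluster relation, so the message goes on the app status.
            charm.app.status = ops.BlockedStatus(PCLUSTER_NO_RELATION)
        # The units stay active so the same message is not repeated on
        # every unit (that would be redundant).
        charm.unit.status = ops.ActiveStatus()
```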
Issue
Currently we always add the `data` role to a node if it is a `cluster-manager`. This is required because otherwise the security index could not be initialized directly after the startup of the first node.

Solution
This PR provides a solution for enabling "cluster-manager-only" nodes in large deployments. The workaround of adding the `data` role by default is removed.

The solution is implemented according to this workflow:

when no data node in the application:
- no node starts, set the status to `blocked`
- the start hook gets deferred endlessly

when a data node joins - through large deployment relations:
- we become aware of it through the `fleet_apps` object
- we then start the leader unit of the main-orchestrator
- which then notifies the large deployment relations
- the leader unit of the data cluster starts too and initializes the security index
- the start flow of the rest of the fleet resumes as usual (this waits for the previously deferred `StartEvent` on the main-orchestrator to be re-emitted)

For this, the data model of `PeerClusterApp` is adjusted and the roles of the application are added, in order to be able to check whether there is any `data` role in the entire cluster fleet (this can be queried with `ClusterTopology.data_role_in_cluster_fleet_apps()`).
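A sketch of what the adjusted data model and the new helper might look like is shown below. The names `PeerClusterApp` and `ClusterTopology.data_role_in_cluster_fleet_apps()` come from the PR description; the field layout, the method signature and its internals are assumptions for illustration only.

```python
# Illustrative sketch only -- the real classes in the charm may differ.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeerClusterApp:
    """Peer-cluster app entry, now also carrying the app's roles."""
    app_name: str
    roles: List[str] = field(default_factory=list)  # e.g. ["cluster_manager"]


class ClusterTopology:
    @staticmethod
    def data_role_in_cluster_fleet_apps(fleet_apps: Dict[str, PeerClusterApp]) -> bool:
        """Return True if any app in the fleet advertises the 'data' role."""
        return any("data" in app.roles for app in fleet_apps.values())


# Example: a fleet with a cluster-manager-only app and a data app.
fleet = {
    "main": PeerClusterApp("main", roles=["cluster_manager"]),
    "data": PeerClusterApp("data", roles=["data"]),
}
assert ClusterTopology.data_role_in_cluster_fleet_apps(fleet) is True
```

With a check like this, the start hook of the main orchestrator can keep deferring (and stay blocked) for as long as the helper returns False.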