canonical / opensearch-operator

OpenSearch operator
Apache License 2.0

[Large Deployments] `_rel_err_data` will always return `should_sever_relation = True` for main orchestrator #256

Closed phvalguima closed 2 months ago

phvalguima commented 2 months ago

The check on the marked line below is:

         elif orchestrators.failover_app and orchestrators.failover_app != self.charm.app.name:  # <<<-----------
             should_sever_relation = True
             blocked_msg = (
                 "Cannot have 2 'failover'-orchestrators. Relate to the existing failover."
             )
         elif not self.charm.is_admin_user_configured():

The check `orchestrators.failover_app != self.charm.app.name` will always be true. I believe that line should instead be:

         ... and orchestrators.failover_app == self.charm.app.name:

Here we check whether the orchestrators' failover application is the application in the current cluster. That safeguards against a cluster that used to be main and was recently demoted.
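To make the difference between the two conditions concrete, here is a minimal sketch that evaluates both against the application names from this deployment. The `Orchestrators` dataclass and the two helper functions are hypothetical stand-ins for illustration only, not the charm's actual classes; only the `failover_app` attribute and the comparison itself are taken from the snippet above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Orchestrators:
    """Hypothetical stand-in for the charm's orchestrators object."""

    failover_app: Optional[str] = None


def current_check(orchestrators: Orchestrators, app_name: str) -> bool:
    """Condition as written today (uses !=)."""
    return bool(orchestrators.failover_app) and orchestrators.failover_app != app_name


def proposed_check(orchestrators: Orchestrators, app_name: str) -> bool:
    """Condition as proposed in this issue (uses ==)."""
    return bool(orchestrators.failover_app) and orchestrators.failover_app == app_name


# Assumption: "failover" is already registered as the failover orchestrator.
orchestrators = Orchestrators(failover_app="failover")

results = {
    app: (current_check(orchestrators, app), proposed_check(orchestrators, app))
    for app in ("main", "failover", "data-hot")
}

for app, (current, proposed) in results.items():
    print(f"{app}: current={current} proposed={proposed}")
```

Under these assumptions, the `!=` form evaluates to true for every application other than the registered failover itself, including `main`, which matches the behavior described in the title.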

Juju status

Model              Controller           Cloud/Region         Version  SLA          Timestamp
test-backups-5a8g  localhost-localhost  localhost/localhost  3.4.2    unsupported  17:34:04+02:00

App                       Version  Status   Scale  Charm                     Channel        Rev  Exposed  Message
data-hot                           blocked      2  opensearch                                 1  no       Cannot have 2 'failover'-orchestrators. Relate to the existing failover.
failover                           blocked      1  opensearch                                 2  no       Cannot have 2 'failover'-orchestrators. Relate to the existing failover.
main                               active       2  opensearch                                 0  no       
s3-integrator                      active       1  s3-integrator             latest/edge     17  no       
self-signed-certificates           active       1  self-signed-certificates  latest/stable   72  no       

Unit                         Workload  Agent      Machine  Public address  Ports  Message
data-hot/0*                  active    idle       3        10.41.46.161           
data-hot/1                   active    idle       6        10.41.46.204           
failover/0*                  active    idle       2        10.41.46.242           
main/0                       active    idle       4        10.41.46.197           
main/1*                      active    executing  5        10.41.46.145           
s3-integrator/0*             active    idle       1        10.41.46.58            
self-signed-certificates/0*  active    idle       0        10.41.46.3             

Machine  State    Address       Inst id        Base          AZ  Message
0        started  10.41.46.3    juju-6d82d4-0  ubuntu@22.04      Running
1        started  10.41.46.58   juju-6d82d4-1  ubuntu@22.04      Running
2        started  10.41.46.242  juju-6d82d4-2  ubuntu@22.04      Running
3        started  10.41.46.161  juju-6d82d4-3  ubuntu@22.04      Running
4        started  10.41.46.197  juju-6d82d4-4  ubuntu@22.04      Running
5        started  10.41.46.145  juju-6d82d4-5  ubuntu@22.04      Running
6        started  10.41.46.204  juju-6d82d4-6  ubuntu@22.04      Running

Integration provider                   Requirer                           Interface            Type     Message
data-hot:node-lock-fallback            data-hot:node-lock-fallback        node_lock_fallback   peer     
data-hot:opensearch-peers              data-hot:opensearch-peers          opensearch_peers     peer     
failover:node-lock-fallback            failover:node-lock-fallback        node_lock_fallback   peer     
failover:opensearch-peers              failover:opensearch-peers          opensearch_peers     peer     
failover:peer-cluster-orchestrator     data-hot:peer-cluster              peer_cluster         regular  
main:node-lock-fallback                main:node-lock-fallback            node_lock_fallback   peer     
main:opensearch-peers                  main:opensearch-peers              opensearch_peers     peer     
main:peer-cluster-orchestrator         data-hot:peer-cluster              peer_cluster         regular  
main:peer-cluster-orchestrator         failover:peer-cluster              peer_cluster         regular  
s3-integrator:s3-integrator-peers      s3-integrator:s3-integrator-peers  s3-integrator-peers  peer     
self-signed-certificates:certificates  data-hot:certificates              tls-certificates     regular  
self-signed-certificates:certificates  failover:certificates              tls-certificates     regular  
self-signed-certificates:certificates  main:certificates                  tls-certificates     regular  

Steps to reproduce

Deploy as follows:

juju deploy tls-certificates-operator --channel stable --show-log --verbose
juju config tls-certificates-operator generate-self-signed-certificates=true ca-common-name="CN_CA"

# deploy main-orchestrator cluster 
juju deploy -n 3 ./opensearch.charm \
    main \
    --config cluster_name="log-app" --config init_hold=false --config roles="cluster_manager"

# deploy failover-orchestrator cluster
juju deploy -n 2 ./opensearch.charm \
    failover \
    --config cluster_name="log-app" --config init_hold=true --config roles="cluster_manager"

# deploy data-hot cluster
juju deploy -n 2 ./opensearch.charm \
    data-hot \
    --config cluster_name="log-app" --config init_hold=true --config roles="data.hot"

# integrate TLS
juju integrate tls-certificates-operator main
juju integrate tls-certificates-operator failover
juju integrate tls-certificates-operator data-hot

# integrate the "main"-orchestrator with all clusters:
juju integrate main:peer-cluster-orchestrator failover:peer-cluster
juju integrate main:peer-cluster-orchestrator data-hot:peer-cluster
juju integrate failover:peer-cluster-orchestrator data-hot:peer-cluster

Expected behavior

Should render an all-green deployment.

Actual behavior

Non-main orchestrators are stuck in "blocked" at the app level.
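For anyone trying to confirm the symptom programmatically, the sketch below lists blocked applications from a status dictionary. It assumes the layout of `juju status --format=json` (an `applications` mapping whose entries carry an `application-status` with `current` and `message` keys); treat that layout as an assumption and check it against your Juju version. The sample data is hand-built from the status table above, and `blocked_apps` is a hypothetical helper name.

```python
from typing import Any, Dict


def blocked_apps(status: Dict[str, Any]) -> Dict[str, str]:
    """Return {app name: message} for every app whose status is 'blocked'."""
    blocked = {}
    for name, app in status.get("applications", {}).items():
        app_status = app.get("application-status", {})
        if app_status.get("current") == "blocked":
            blocked[name] = app_status.get("message", "")
    return blocked


MSG = "Cannot have 2 'failover'-orchestrators. Relate to the existing failover."

# Hand-built sample mirroring the app-level statuses reported in this issue.
sample_status = {
    "applications": {
        "data-hot": {"application-status": {"current": "blocked", "message": MSG}},
        "failover": {"application-status": {"current": "blocked", "message": MSG}},
        "main": {"application-status": {"current": "active", "message": ""}},
    }
}

result = blocked_apps(sample_status)
for name in sorted(result):
    print(f"{name}: {result[name]}")
```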

github-actions[bot] commented 2 months ago

https://warthogs.atlassian.net/browse/DPE-4206