Implements DPE-3661: extend backup feature to support large deployment scenarios.
Currently, there are four types of scenarios:
1) Small deployments: the cluster performs all the different types of node roles
2) Large deployments - orchestrator: the app is in charge not only of its own application units but also of coordinating across the different clusters
3) Large deployments - failover orchestrator: very similar to (2); this app must also publish its information in the peer relation, although all the clusters will only listen to the active manager
4) Large deployments - data only: does not perform any management tasks and receives any relevant information via the peer relation
For backups, clusters of type (3) and (4) have a special behavior: they will receive the backup data via the peer-cluster relation and should refuse: i. to execute backup-related actions; and ii. to process the s3-relation events themselves. The latter avoids confusion, e.g. a user inadvertently relating a cluster to a different s3-integrator.
The implementations of (1) and (2) are very similar.
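As a sketch of that refusal behavior, assuming an ops-based action handler; the is_main_orchestrator attribute and _do_backup helper are illustrative names, not taken from the actual charm:
from ops.charm import ActionEvent

def _on_create_backup_action(self, event: ActionEvent) -> None:
    # Clusters of type (3) and (4) refuse backup actions; only the main orchestrator runs them.
    if not self.charm.is_main_orchestrator:
        event.fail("Backups can only be run on the main orchestrator cluster.")
        return
    self._do_backup(event)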
It contains the same fix as: https://github.com/canonical/opensearch-operator/pull/253
Adds the following fixes related to testing in general:
- ContinuousWrites is updated to hold the right count of documents in writes_value internally
- Adds an is_burst option to ContinuousWrites: a test may choose to send documents in bursts of 100 instead of one by one; is_burst defaults to True
- ContinuousWrites terminates its process as part of stop, avoiding a stranded process that keeps writing docs to ContinuousWrites.INDEX_NAME after a given test
- start_and_check_continuous_writes is renamed to assert_start_and_check_continuous_writes
How To
Setup a large scale deployment
Deploy a large scale environment with:
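A minimal sketch of such a deployment, assuming three opensearch applications named main, failover and data; the init_hold and roles config options and the peer-cluster / peer-cluster-orchestrator endpoint names are assumptions and may differ from the charm's actual interface:
juju deploy opensearch main
juju deploy opensearch failover --config init_hold=true
juju deploy opensearch data --config init_hold=true --config roles=data
# Wire the clusters together via the peer-cluster relation
juju integrate main:peer-cluster-orchestrator failover:peer-cluster
juju integrate main:peer-cluster-orchestrator data:peer-cluster
juju integrate failover:peer-cluster-orchestrator data:peer-cluster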
Connect with S3
This step assumes an s3-integrator charm has been successfully deployed and configured. In large deployments, the backup is set up with an s3-integrator related solely to the main orchestrator charm. It must be set as follows:
juju integrate s3-integrator main
Wait until the cluster deployment settles. Now, backup / restore actions can be executed by running them against the main orchestrator's leader:
juju run main/leader create-backup
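The rest of the backup / restore flow would look as follows, assuming the charm also exposes list-backups and restore actions taking a backup-id parameter (action and parameter names may differ):
juju run main/leader list-backups
juju run main/leader restore backup-id=<backup-id>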
In case of failover
If a failover must be triggered then, besides the process described for large-deployment failover, the current s3-integrator must also be moved from one cluster to the other:
# If the cluster still exists
juju remove-relation main s3-integrator
# Then, connect with the new cluster manager
juju relate s3-integrator failover
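Once the failover cluster has taken over as the new main orchestrator, backups are again run against its leader; for example (assuming the application is still named failover):
juju run failover/leader create-backup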
Implementation Details
For developers, there is no meaningful difference between small and large deployments. They both use the same backup_factory() to return the correct object for their case.
The large-deployment support expands the original concept of OpenSearchBackup to include other juju applications that are not the cluster_manager. This means a cluster may be a data-only or even a failover cluster-manager and still interact with the s3-integrator at a certain level.
The baseline is that every unit in the cluster must import the S3 credentials. The main orchestrator will share these credentials via the peer-cluster relation. Failover and data clusters will import that information from the peer-cluster relation.
To implement the points above without causing too much disruption to the existing code, a factory pattern has been adopted, where the main charm receives an OpenSearchBackupBase object that corresponds to its own case (cluster-manager, failover, data, etc.).
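A minimal sketch of that factory; only backup_factory(), OpenSearchBackupBase and OpenSearchBackup are named in this PR, while the is_main_orchestrator flag and the non-orchestrator subclass name are illustrative assumptions:
# Sketch of the factory pattern described above (subclass and attribute names are assumptions)
class OpenSearchBackupBase:
    """Common behavior, e.g. refusing backup actions and s3-relation events when not allowed."""
    def __init__(self, charm):
        self.charm = charm

class OpenSearchBackup(OpenSearchBackupBase):
    """Main orchestrator: owns the s3-integrator relation and shares credentials with peers."""

class OpenSearchNonOrchestratorBackup(OpenSearchBackupBase):
    """Failover / data-only clusters: import S3 credentials from the peer-cluster relation."""

def backup_factory(charm) -> OpenSearchBackupBase:
    # Every cluster type gets an object with the same interface; only the main orchestrator
    # gets the full s3-integrator handling.
    if charm.is_main_orchestrator:
        return OpenSearchBackup(charm)
    return OpenSearchNonOrchestratorBackup(charm)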
"""
Implements DPE-3661: extend backup feature to support large deployment scenarios.
Currently, there are 4x type of scenarios: 1) Small deployments: the cluster performs all the different types of node roles 2) Large deployments - orchestrator: the app is in charge of not only its own application units but also to coordinate the across the different clusters 3) Large deployments - failover orchestrator: very similar to (2), this app must also publish its information in the peer relation, although all the clusters will only listen to the active manager 4) Large deployments - data only: do not perform any management tasks and should receive any relevant information via peer relation
For backups, clusters of type (3) and (4) have a special behavior: they will receive the backup data via peer-cluster relation and should refuse: i. to execute backup-related actions; and ii. to execute the s3-relation events themselves. The latter avoids confusions, e.g. an user inadvertently relates the cluster to different s3-integrators.
The implementation of (1) and (2) are very similar.
It contains the same fix as: https://github.com/canonical/opensearch-operator/pull/253
Adds following fixes related to testing in general:
ContinuousWrites
is updated to hold the right count of documents inwrites_value
internallyis_burst
option toContinuousWrites
: a test may choose to send 100-burst docs vs. doc-by-doc -is_burst
defaults toTrue
ContinuousWrites
terminates its process as part ofstop
, avoiding stranded process generating docs toContinuousWrites.INDEX_NAME
post a given teststart_and_check_continuous_writes
updated toassert_start_and_check_continuous_writes
How To
Setup a large scale deployment
Deploy a large scale environment with:
Connect with S3
This step assumes a s3-integrator charm has been successfully deployed and configured. The large deployments backup is set with a
s3-integrator
connected solely to the main charm. It must be set as follows:Wait until the cluster deployment set. Now, backup / restore actions can be executed by running them against the main orchestrator's leader:
In case of failover
In case a failover must be triggered, besides the process described for failover of large deployments, also move the current s3-integrator from one cluster to another:
Implementation Details
For developers, there is no meaningful difference between small and large deployments. They both use the same backup_factory() to return the correct object for their case.
The large deployments expands the original concept of OpenSearchBackup to include other juju applications that are not cluster_manager. This means a cluster may be a data-only or even a failover cluster-manager and still interacts with s3-integrator at a certain level.
The baseline is that every unit in the cluster must import the S3 credentials. The main orchestrator will share these credentials via the peer-cluster relation. Failover and data clusters will import that information from the peer-cluster relation.
To implement the points above without causing too much disruption to the existing code, a factory pattern has been adopted, where the main charm receives a OpenSearchBackupBase object that corresponds to its own case (cluster-manager, failover, data, etc). """