canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

DPE-2959 async replication #352

Closed by paulomach 4 months ago

paulomach commented 8 months ago

Issue

The charm needs to support async replication.

Solution

Use the async replication library (owned by the VM charm). The implementation follows the workflow described in the how-to draft on the Discourse page.

Also implements DPE-3761

Integration test included

TODO:

Unit tests will come in a follow-up PR, on the VM charm only (the lib owner).

taurus-forever commented 6 months ago

As discussed in private: it is not possible to re-join the cluster after recreate-cluster. Juju gets stuck on:

Unit    Workload  Agent  Address       Ports  Message
db2/0*  blocked   idle   10.1.204.228         User data found, aborting async replication setup
db2/1   waiting   idle   10.1.204.196         waiting replica cluster be configured
db2/2   waiting   idle   10.1.204.207         waiting replica cluster be configured

Steps to reproduce:

  1. Follow https://discourse.charmhub.io/t/charmed-mysql-k8s-async-replication/12904
  2. After juju run -m az2 db2/leader recreate-cluster, it is not possible to re-join db2 to az1.db1, because az2 still holds the previous data, which cannot be wiped.

Proposal:

  1. A cluster that is blocked after juju remove-relation -m az2 async-primary db2 can be re-joined using juju relate -m az2 async-primary db2 (the local az2 data will be wiped).
  2. If the cluster has been re-created, it cannot re-join; it must be destroyed and re-deployed. Safety first.
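The proposed policy can be sketched as a small decision function. This is only an illustration of the two rules, not code from the charm or the async replication library: `ClusterState`, `RejoinDecision`, `decide_rejoin`, and the field names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RejoinDecision(Enum):
    """Possible outcomes of the proposed policy (hypothetical names)."""

    REJOIN = "re-join"
    REJOIN_AND_WIPE = "re-join, wiping local data"
    REDEPLOY = "destroy and re-deploy"


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical summary of a replica cluster; not a charm API."""

    has_user_data: bool  # local data left over from an earlier setup
    was_recreated: bool  # True once recreate-cluster has been run on it


def decide_rejoin(state: ClusterState) -> RejoinDecision:
    """Apply the two proposed rules, checking the stricter one first."""
    if state.was_recreated:
        # Rule 2: a re-created cluster never re-joins.
        return RejoinDecision.REDEPLOY
    if state.has_user_data:
        # Rule 1: re-joining is allowed, but the local data is wiped.
        return RejoinDecision.REJOIN_AND_WIPE
    return RejoinDecision.REJOIN
```

Checking `was_recreated` first keeps the "safety first" rule from being bypassed by any other condition.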

paulomach commented 5 months ago

@taurus-forever

2. If the cluster has been re-created, it cannot re-join; it must be destroyed and re-deployed. Safety first.

Should we make this parametrized for the action?

taurus-forever commented 5 months ago

IMHO, not now. It is a dangerous action. If you want to re-attach the re-created cluster, run: juju remove-application mysql && juju remove-storage ... && juju deploy mysql. Safety first.

P.S. We can implement such an action later, after a proper spec review by the PMs.

taurus-forever commented 5 months ago

For the record, revision 128 from edge/arepl fails:

unit-db2-0: 14:13:57 DEBUG unit.db2/0.juju-log async-replica:3: Syncing credentials from primary cluster
unit-db2-0: 14:13:57 ERROR unit.db2/0.juju-log async-replica:3: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-db2-0/charm/venv/ops/model.py", line 3019, in _run
    result = subprocess.run(args, **kwargs)  # type: ignore
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-db2-0/secret-get', 'secret:cnpfe3nmp25c77tiud60', '--format=json')' returned non-zero exit status 1.

paulomach commented 5 months ago

Updated: revision r134 contains the fixes (r129 and r130 are superseded).
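The traceback in this thread shows a subprocess.CalledProcessError from the secret-get call surfacing as an uncaught exception. For illustration only, here is a generic sketch of catching that exception type around a command that prints JSON. It is not the fix shipped in r134, which this thread does not show, and `run_json_tool` is a hypothetical helper; in the charm the call goes through ops/model.py rather than a direct subprocess call.

```python
import json
import subprocess
import sys
from typing import Any, Optional, Sequence


def run_json_tool(args: Sequence[str]) -> Optional[Any]:
    """Run a command that prints JSON; return None if it exits non-zero."""
    try:
        result = subprocess.run(
            args, check=True, capture_output=True, text=True
        )
    except subprocess.CalledProcessError as err:
        # Same exception type as in the traceback. Report it and let the
        # caller decide whether to retry later instead of crashing.
        print(
            f"command failed with exit status {err.returncode}: "
            f"{err.stderr.strip()}",
            file=sys.stderr,
        )
        return None
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Stand-in command that exits with status 1, like the failing call.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    if run_json_tool(failing) is None:
        print("secret not readable yet; would defer and retry")
```

Returning None rather than raising leaves the retry-or-block decision to the caller, which is one possible design; re-raising a more specific exception would work as well.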