canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

Evicted secondary node fails to rejoin the cluster #407

Open gboutry opened 2 months ago

gboutry commented 2 months ago

Steps to reproduce

Expected behavior

When the pod is rescheduled to another node, it should be able to rejoin the quorum.

Actual behavior

The MySQL unit fails to rejoin the quorum and reports: An instance with label 'squirrel-mysql-0' is already part of this InnoDB cluster

Versions

Operating system: Ubuntu 22.04.4 LTS

Juju CLI: 3.4.2-genericlinux-amd64

Juju agent: 3.4.2

Charm revision: 127

microk8s: MicroK8s v1.28.7 revision 6532

Log output

2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Loading startup files...
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Loading plugins...
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Connecting to MySQL at: clusteradmin@squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Shell.connect: tid=257: CONNECTED: squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Connecting to MySQL at: mysql://clusteradmin@squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306?connect-timeout=5000
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Dba.get_cluster: tid=258: CONNECTED: squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Connecting to MySQL at: mysql://clusteradmin@squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306?connect-timeout=5000
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Dba.get_cluster: tid=259: CONNECTED: squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Group Replication 'group_name' value: 673502ca-fd60-11ee-931d-dec0be28ec20
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Metadata 'group_name' value: 673502ca-fd60-11ee-931d-dec0be28ec20
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Connecting to MySQL at: mysql://clusteradmin@squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306?connect-timeout=5000
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Dba.get_cluster: tid=260: CONNECTED: squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Connecting to MySQL at: mysql://clusteradmin@squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306?connect-timeout=5000
2024-04-18T08:55:29.833Z [container-agent] verbose: 2024-04-18T08:55:29Z: Dba.get_cluster: tid=261: CONNECTED: squirrel-mysql-1.squirrel-mysql-endpoints.rodents.svc.cluster.local:3306
2024-04-18T08:55:29.833Z [container-agent] Traceback (most recent call last):
2024-04-18T08:55:29.833Z [container-agent]   File "<string>", line 4, in <module>
2024-04-18T08:55:29.833Z [container-agent] ValueError: Cluster.add_instance: An instance with label 'squirrel-mysql-0' is already part of this InnoDB cluster
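
The traceback suggests that the cluster metadata still holds an entry labelled 'squirrel-mysql-0' from before the eviction, so Cluster.add_instance refuses to register the label a second time. As a rough illustration only (this is not the charm's code and not necessarily the fix that will land), the sketch below shows how a join step could inspect a status document first. It assumes the document has the shape usually returned by MySQL Shell's cluster.status(), with members keyed by label under defaultReplicaSet.topology. The helper names, the action strings, the sample data and the '(MISSING)' state shown for the evicted unit are all hypothetical.

```python
from typing import Any, Dict, Optional

# States treated as healthy in this sketch; any other state is assumed stale.
HEALTHY_STATES = {"ONLINE", "RECOVERING"}


def find_member(status: Dict[str, Any], label: str) -> Optional[Dict[str, Any]]:
    """Return the topology entry registered under `label`, or None."""
    topology = status.get("defaultReplicaSet", {}).get("topology", {})
    return topology.get(label)


def plan_join(status: Dict[str, Any], label: str) -> str:
    """Pick a hypothetical action for a unit that wants to (re)join.

    "add"             -> label unknown, a plain add_instance should work
    "noop"            -> label present and healthy, nothing to do
    "remove-then-add" -> label present but stale; with MySQL Shell this
                         might map to cluster.remove_instance(address,
                         {"force": True}) followed by add_instance
    """
    member = find_member(status, label)
    if member is None:
        return "add"
    if member.get("status") in HEALTHY_STATES:
        return "noop"
    return "remove-then-add"


# Hypothetical status document, trimmed to the fields used above.
sample_status = {
    "defaultReplicaSet": {
        "topology": {
            "squirrel-mysql-0": {"status": "(MISSING)"},
            "squirrel-mysql-1": {"status": "ONLINE"},
        }
    }
}

print(plan_join(sample_status, "squirrel-mysql-0"))  # remove-then-add
print(plan_join(sample_status, "squirrel-mysql-1"))  # noop
print(plan_join(sample_status, "squirrel-mysql-2"))  # add
```

Whether a forced removal is safe depends on the state of the evicted unit's data, so treat the "remove-then-add" branch as a starting point for discussion rather than a recommendation.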

Additional context

This is part of ongoing work in Sunbeam to support removing a node from a k8s cluster. MySQL units might be hosted on that node, and we should be able to reschedule them to other nodes.

The reproduction script is a bit convoluted, but it closely resembles a Sunbeam installation.

github-actions[bot] commented 2 months ago

https://warthogs.atlassian.net/browse/DPE-4118

shayancanonical commented 2 weeks ago

Update: I was able to reproduce the issue and test a potential fix. I will refine the fix over the coming week and open a PR with tests.