cscetbon / casskop

This Kubernetes operator automates Cassandra operations such as deploying rack-aware clusters, scaling up and down, configuring C* and its JVM, upgrading the JVM and C*, backups and restores, and more.
https://cscetbon.github.io/casskop/
Apache License 2.0

Cassandra operator doesn't properly work if launched in cluster scope #42

Closed (therapy-lf closed this issue 2 years ago)

therapy-lf commented 2 years ago

Bug Report

What did you do? I ran the operator with cluster scope. In that mode it cannot resolve pod hostnames in other namespaces.

What did you expect to see? The pod hostname should contain the namespace (e.g. cassandra-<rack>.<cluster-name>.<namespace> instead of cassandra-<rack>.<cluster-name>).

What did you see instead? Under which circumstances? The pods are running, but the operator apparently cannot fetch information about the clusters/racks:

2022-04-11T19:43:59.156Z    INFO    controller_cassandracluster Reconciling CassandraCluster    {"Request.Namespace": "therapy", "Request.Name": "cassandra"}
[reconcile.go:774::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).ListCassandraClusterPods()] Apr 11 19:43:59.157 [D] [cluster:cassandra] [dc-rack:dc06-sandbox] List available pods
[reconcile.go:726::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).CheckPodsState()] Apr 11 19:43:59.157 [D] [cluster:cassandra] [err:<nil>] Get first available pod
[reconcile.go:736::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).CheckPodsState()] Apr 11 19:43:59.158 [I] [cluster:cassandra] [err:<nil>] We will request : cassandra-dc06-sandbox-0.cassandra to catch hostIdMap
[node_operations.go:60::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.NewJolokiaClient()] Apr 11 19:43:59.158 [D] [host:cassandra-dc06-sandbox-0.cassandra] [namespace:therapy] [port:8778] [secretRef:{}] Creating Jolokia connection
[reconcile.go:746::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).CheckPodsState()] Apr 11 19:43:59.262 [E] [cluster:cassandra] [err:Cannot get host id map: HTTP Request Failed: Post "http://cassandra-dc06-sandbox-0.cassandra:8778/jolokia/": dial tcp: lookup cassandra-dc06-sandbox-0.cassandra on 10.11.0.10:53: no such host] Failed to call cassandra-dc06-sandbox-0.cassandra to get hostIdMap
[cassandracluster_controller.go:122::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).Reconcile()] Apr 11 19:43:59.262 [E] [cluster:cassandra] CheckPodsState Error: Cannot get host id map: HTTP Request Failed: Post "http://cassandra-dc06-sandbox-0.cassandra:8778/jolokia/": dial tcp: lookup cassandra-dc06-sandbox-0.cassandra on 10.11.0.10:53: no such host
[reconcile.go:774::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).ListCassandraClusterPods()] Apr 11 19:43:59.262 [D] [cluster:cassandra] [dc-rack:dc06-sandbox] List available pods
[node_operations.go:60::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.NewJolokiaClient()] Apr 11 19:43:59.262 [D] [host:cassandra-dc06-sandbox-0.cassandra] [namespace:therapy] [port:8778] [secretRef:{}] Creating Jolokia connection
[reconcile.go:490::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).ReconcileRack()] Apr 11 19:43:59.275 [E] [cluster:cassandra] [dc-rack:dc06-sandbox] [err:Cannot check if there are joining nodes: HTTP Request Failed: Post "http://cassandra-dc06-sandbox-0.cassandra:8778/jolokia/": dial tcp: lookup cassandra-dc06-sandbox-0.cassandra on 10.11.0.10:53: no such host] Executing pod operation failed
[reconcile.go:510::github.com/Orange-OpenSource/casskop/controllers/cassandracluster.(*CassandraClusterReconciler).ReconcileRack()] Apr 11 19:43:59.275 [W] [LastActionName:Initializing] [LastActionStatus:Done] [Phase:Running] [cluster:cassandra] [dc-rack:dc06-sandbox] Should Not see this message ;) Waiting Rack to be running before continuing, we loop on Next Rack, maybe we don't want that

Environment

Possible Solution In https://github.com/cscetbon/casskop/blob/a66f86eeca832876281b40b5d3ed820b4599a361/pkg/k8s/util.go#L210-L212

pod.Spec.Subdomain does not include the namespace suffix. Add the ability to handle cluster scope by appending the namespace to the pod hostname when the pod is not running in the operator's namespace.
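A minimal sketch of the idea, not the actual casskop code: the helper name podHostname, its parameters, and the operator namespace "casskop" are hypothetical and used only for illustration. It relies on standard Kubernetes DNS behaviour, where <hostname>.<subdomain> resolves only from within the pod's own namespace, while <hostname>.<subdomain>.<namespace> also resolves from other namespaces.

```go
package main

import "fmt"

// podHostname is a hypothetical helper (not part of casskop). It builds the
// DNS name used to reach a pod, given the pod's hostname, its subdomain
// (the headless service name), the pod's namespace, and the namespace the
// operator itself runs in.
func podHostname(hostname, subdomain, podNamespace, operatorNamespace string) string {
	name := hostname + "." + subdomain
	// Append the namespace only when the pod lives outside the operator's
	// namespace, so the single-namespace behaviour stays unchanged.
	if podNamespace != "" && podNamespace != operatorNamespace {
		name += "." + podNamespace
	}
	return name
}

func main() {
	// Operator in a different namespace (assumed here to be "casskop").
	fmt.Println(podHostname("cassandra-dc06-sandbox-0", "cassandra", "therapy", "casskop"))
	// Operator in the same namespace as the cluster.
	fmt.Println(podHostname("cassandra-dc06-sandbox-0", "cassandra", "therapy", "therapy"))
}
```

With the values from the logs above, the first call yields cassandra-dc06-sandbox-0.cassandra.therapy and the second keeps the current short form cassandra-dc06-sandbox-0.cassandra.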

Additional context We're trying to launch multiple Cassandra clusters, each in its own namespace, but want to avoid the overhead of running an operator in every namespace. Moreover, the operators might conflict with each other.

cscetbon commented 2 years ago

Unfortunately, we designed it to run with one operator per namespace. You can propose an argument that changes this behavior for an operator working at the cluster level, but you would need to add a kuttl test that validates the expected behavior.

cscetbon commented 2 years ago

@therapy-lf did you see my comment?

cscetbon commented 2 years ago

Closing this issue as I haven't received any feedback so far.