arangodb / kube-arangodb

ArangoDB Kubernetes Operator - Start ArangoDB on Kubernetes in 5min
https://arangodb.github.io/kube-arangodb/
Apache License 2.0

Simple Deployment Option (non-operator / CRD) #1673

Closed TJM closed 2 months ago

TJM commented 2 months ago

Is there a simple deployment method that does not use CRDs or operators? We are on a shared cluster, and creating CRDs is not permitted. We can only create resources within our designated namespace. We really just want to deploy the StatefulSets directly with a Helm chart. I found this one, https://github.com/sarahhenkens/arangodb-helm-chart, but it hasn't been updated in a couple of years, so I am not sure it is viable. We manage our service state with Terraform, and we also don't want anything modifying state outside of TF, especially finalizers that break the shutdown of review environments.

Thanks in advance, Tommy

ajanikow commented 2 months ago

Hello!

The Operator has a namespaced mode, designed to confine all RBAC to a single namespace. It still requires the CRDs to be created, but only once (this can even be done by a Cluster Admin, which should be possible in most ecosystems).

Otherwise, the only solution is a plain StatefulSet (STS), but we do not provide one. It would not be scalable and would not use any of the Graceful Shutdown functionality.

Best Regards, Adam.

TJM commented 2 months ago

What happens when the CRDs need to be updated? Since they are cluster-wide, they affect all environments for all tenants. We can't upgrade or test changes in the dev environment before stage without having separate GKE clusters. We would rather have a simple, StatefulSet-based deployment. We could look at using a HorizontalPodAutoscaler, but we don't need it.

Honestly, the graceful shutdown is what alerted the operations team to this. It has been costing us money, since it won't let the pods terminate in a review environment and has thus caused our GKE cluster to scale up to its maximum size. It needs some sort of timeout on the graceful shutdown, or it may need to skip the shutdown steps if the cluster is not running properly.

ajanikow commented 2 months ago

Hello!

CRDs are fully backward compatible. Additionally, the Operator is able to work without the CRD schema enabled, so you do not have to upgrade CRDs (you only need to add new CRDs, or new versions with major updates).

Our Cloud solution uses the Enterprise Operator to manage ArangoDB. In this case we handle many Deployments on a single Kubernetes cluster (with namespace and network isolation). We do not roll out new Operator versions to all of them at the same time, only in batches, so we ensure CRD compatibility.

Did you observe any issues with GracefulShutdown? The Operator handles it with timeouts (15 min) and supports corner cases (for example, when you remove the Deployment, it skips the graceful shutdown steps).

I think the issue you are observing (if it is related to the KubeArangoDB Operator) is the case where the Operator is removed before the ArangoDeployment. In this situation the finalizers prevent deletion of the pods. This is enforced at the Kubernetes level, much like removing your cloud storage operator (for example, the AWS EBS CSI driver) before removing all of its volumes. In this situation, removing the finalizers will allow the Kubernetes API to delete the Pod.

Best Regards, Adam.
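
The finalizer removal described in the comment above could be sketched roughly as follows. This is only an illustration, not something taken from the operator's documentation: it builds a JSON merge patch that clears `metadata.finalizers` and the matching `kubectl patch` command line, without running anything. The namespace and pod name are hypothetical placeholders. Clearing finalizers skips whatever cleanup the operator would have performed, so it probably only makes sense for pods orphaned after the operator has already been removed.

```python
import json
import shlex


def finalizer_removal_patch():
    """Return a JSON merge patch that clears metadata.finalizers."""
    # In a JSON merge patch, setting a key to null removes that key.
    return {"metadata": {"finalizers": None}}


def kubectl_patch_command(namespace, pod_name):
    """Build (but do not run) a kubectl command that applies the patch to one pod."""
    patch = json.dumps(finalizer_removal_patch())
    return [
        "kubectl", "patch", "pod", pod_name,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", patch,
    ]


if __name__ == "__main__":
    # "review-env" and "example-pod-0" are hypothetical names for illustration.
    print(shlex.join(kubectl_patch_command("review-env", "example-pod-0")))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; the output can be reviewed before it is applied to a stuck pod.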

TJM commented 2 months ago

It is possible that they did not have proper "depends_on" statements between their resources. They may also not be configuring the operators to be locked into a namespace, which, with 5+ operators on the cluster, may be causing pain. The more I look, the more it seems there may not actually be a "proper" Helm deployment either, even with the operator, so there's that too. I just wanted to check whether there was a simple, Helm-based deployment that uses built-in Kubernetes types. They need a "KISS" (keep it super simple) method :)
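
For reference, a deployment built only from built-in Kubernetes types, as asked about in this thread, might look roughly like the sketch below. It is not provided or endorsed by the project; it is a minimal single-server StatefulSet generated as JSON (which `kubectl apply -f` accepts), under several assumptions: the public `arangodb` container image, its default port 8529 and data directory `/var/lib/arangodb3`, a pre-existing headless Service named after the StatefulSet, and a hypothetical Secret `<name>-root` holding the root password. As noted earlier in the thread, this approach would not be scalable and would not get the operator's graceful shutdown handling.

```python
import json


def single_server_statefulset(name, namespace, image="arangodb", storage="10Gi"):
    """Build a minimal single-server StatefulSet manifest as a plain dict."""
    labels = {"app": name}
    container = {
        "name": "arangodb",
        "image": image,  # assumption: pin a specific tag in real use
        "ports": [{"name": "http", "containerPort": 8529}],
        "env": [{
            "name": "ARANGO_ROOT_PASSWORD",
            # Hypothetical Secret; it must be created separately.
            "valueFrom": {"secretKeyRef": {"name": f"{name}-root", "key": "password"}},
        }],
        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/arangodb3"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        # No finalizers are set, so deleting the namespace is not blocked by this object.
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "serviceName": name,  # assumes a headless Service with this name exists
            "replicas": 1,        # single server only; no cluster roles
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "terminationGracePeriodSeconds": 60,
                    "containers": [container],
                },
            },
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # "arango" and "team-namespace" are hypothetical names for illustration.
    print(json.dumps(single_server_statefulset("arango", "team-namespace"), indent=2))
```

The same structure could be templated in a Helm chart or expressed as a Terraform resource; the Python form is used here only to keep the example self-contained.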