Open huyhg opened 5 years ago
From my testing with the etcd-operator, I found that the main problem right now is the deployer. It doesn't wait for the operator to install its CRD, and therefore fails because the operator-installed resource types don't exist yet (but are referenced in the app manifest). If I install the CRD manually first, then deployment succeeds and GC works as expected when the application is deleted (the operator and everything it created are deleted). The CRD is the only thing that isn't GC'ed.
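One way to picture the missing step is a polling loop that blocks until the operator has registered its CRD. The sketch below is only an illustration in Python, not the deployer's actual code: `wait_for_crd` and its `crd_is_established` callback are hypothetical names, and how the callback queries the cluster (for example by shelling out to `kubectl get crd <name>`) is left as an assumption.

```python
import time
from typing import Callable


def wait_for_crd(
    crd_is_established: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the (hypothetical) check reports the CRD exists.

    Returns True as soon as the check succeeds, or False once the
    timeout has elapsed without a successful check.
    """
    deadline = clock() + timeout_s
    while True:
        # Ask the caller-supplied check whether the operator has
        # installed its CRD yet.
        if crd_is_established():
            return True
        # Give up once the deadline has passed.
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A deployer following this pattern would apply the operator's Deployment first, call `wait_for_crd(...)`, and only then apply the manifests that reference the operator-installed resource types.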
How is it that operators are currently supported (e.g. https://github.com/GoogleCloudPlatform/click-to-deploy/tree/master/k8s/spark-operator)?
Kubernetes GC doesn't work for non-namespaced objects, or objects outside of the owner's namespace. Thus, it's not possible to expect deletion of an Application object to trigger GC of a CRD.
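The scoping rule behind this can be written down as a small predicate. This is a simplified Python model of the owner-reference constraint, not Kubernetes code; `owner_reference_is_valid` is a hypothetical helper, and `None` is used here to stand for a cluster-scoped (non-namespaced) object.

```python
from typing import Optional


def owner_reference_is_valid(
    owner_namespace: Optional[str],
    dependent_namespace: Optional[str],
) -> bool:
    """Model of which owner/dependent pairs GC can act on.

    A namespace of None means the object is cluster-scoped.
    """
    if owner_namespace is None:
        # A cluster-scoped owner may own namespaced or cluster-scoped dependents.
        return True
    if dependent_namespace is None:
        # A cluster-scoped dependent (such as a CRD) cannot have a namespaced owner.
        return False
    # Both are namespaced: they must live in the same namespace.
    return owner_namespace == dependent_namespace
```

With an Application in namespace `default`, `owner_reference_is_valid("default", None)` is `False` for the CRD, while `owner_reference_is_valid("default", "default")` is `True` for the operator's Deployment in the same namespace. That matches the behaviour described above: everything namespaced is collected, and the CRD is left behind.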
The deployer installs the Spark operator deployment, giving it a service account with proper permissions. This deployment manages the Spark CRD. The deployment would need some deletion hook to remove the CRD. Note that deleting the CRD will also delete all installed Spark clusters, so it's safer to leave the CRD behind.
Note that this paradigm works for a very specific case of operator: the marketplace application is the operator itself, not the instances that the operator spins up.
Interestingly, Helm has a similar problem and doesn't appear to have fixed it yet for operators. It has added a "crd-install" hook for installing CRDs before resources that depend on them, but that doesn't help much for operators that install the CRD themselves.