RamenDR / ramen

Ensure required CRDs are part of the cluster before starting the reconciler #271

Open ShyamsundarR opened 3 years ago

ShyamsundarR commented 3 years ago

For example, dr-cluster should ensure that the VolumeReplication and VolumeReplicationClass CRDs, along with the rest of its dependencies, are present as APIs in the cluster before starting the container, to avoid CrashLoopBackOff (CLBO) errors like the following:

2021-09-13T21:11:41.610Z    ERROR   controller-runtime.source   source/source.go:128    if kind is a CRD, it should be installed before calling Start   {"kind": "VolumeReplication.replication.storage.openshift.io", "error": "no matches for kind \"VolumeReplication\" in version \"replication.storage.openshift.io/v1alpha1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0-beta.4/pkg/source/source.go:128
2021-09-13T21:11:41.711Z    ERROR   controller-runtime.manager.controller.volumereplicationgroup    controller/controller.go:195    Could not wait for Cache to sync    {"reconciler group": "ramendr.openshift.io", "reconciler kind": "VolumeReplicationGroup", "error": "failed to wait for volumereplicationgroup caches to sync: no matches for kind \"VolumeReplication\" in version \"replication.storage.openshift.io/v1alpha1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0-beta.4/pkg/internal/controller/controller.go:195
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0-beta.4/pkg/internal/controller/controller.go:221
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0-beta.4/pkg/manager/internal.go:697
2021-09-13T21:11:41.711Z    ERROR   controller-runtime.manager  runtime/asm_amd64.s:1371    error received after stop sequence was engaged  {"error": "leader election lost"}
2021-09-13T21:11:41.711Z    ERROR   setup   app/main.go:170 problem running manager {"error": "failed to wait for volumereplicationgroup caches to sync: no matches for kind \"VolumeReplication\" in version \"replication.storage.openshift.io/v1alpha1\""}
main.main
    /remote-source/app/main.go:170
runtime.main
    /usr/lib/golang/src/runtime/proc.go:225

The fix can be based on: https://github.com/csi-addons/volume-replication-operator/pull/108

ShyamsundarR commented 1 year ago

Also, from a bundling perspective, it is possible to prevent the install if we define the nativeAPIs that we need in order to function correctly. This could be an intermediate step to ensure that the environment is as required before the bundle is installed.

nirs commented 1 year ago

It is not clear what `Ensure` means for this issue: fail if the CRDs do not exist, or install them if they do not exist?

What is the expected behavior of the system when the CRDs are missing?

ShyamsundarR commented 1 year ago

> It is not clear what `Ensure` means for this issue: fail if the CRDs do not exist, or install them if they do not exist?

Initially, per the issue description, the thought was to perform a runtime check for the dependent CRDs and fail/exit the reconciler/controller if any are found missing.

We should not attempt to install the CRDs at present; at least, that is not the expectation.
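A minimal sketch of such a runtime check, written against a hypothetical `kindLister` interface rather than a real cluster so that it stays self-contained. The names `requiredAPI`, `kindLister`, `missingAPIs`, `checkRequiredAPIs`, and `fakeLister` are illustrative and not part of Ramen; in an actual operator the lister could be backed by the API discovery client. It also assumes VolumeReplicationClass is served from the same group/version as VolumeReplication. The check only reports what is missing and never tries to install anything.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredAPI identifies one kind the operator depends on (hypothetical type).
type requiredAPI struct {
	GroupVersion string // e.g. "replication.storage.openshift.io/v1alpha1"
	Kind         string // e.g. "VolumeReplication"
}

// kindLister is a hypothetical seam over API discovery. It returns the kinds
// served for a group/version, or an empty slice if that group/version is not
// served at all.
type kindLister interface {
	KindsFor(groupVersion string) ([]string, error)
}

// missingAPIs returns the required APIs that the cluster does not serve.
func missingAPIs(l kindLister, required []requiredAPI) ([]requiredAPI, error) {
	var missing []requiredAPI
	cache := map[string]map[string]bool{} // one discovery call per group/version
	for _, r := range required {
		served, ok := cache[r.GroupVersion]
		if !ok {
			kinds, err := l.KindsFor(r.GroupVersion)
			if err != nil {
				return nil, fmt.Errorf("discovering %s: %w", r.GroupVersion, err)
			}
			served = map[string]bool{}
			for _, k := range kinds {
				served[k] = true
			}
			cache[r.GroupVersion] = served
		}
		if !served[r.Kind] {
			missing = append(missing, r)
		}
	}
	return missing, nil
}

// checkRequiredAPIs returns a single error naming every missing API, so the
// caller can log it once and exit instead of crashing later in cache sync.
func checkRequiredAPIs(l kindLister, required []requiredAPI) error {
	missing, err := missingAPIs(l, required)
	if err != nil {
		return err
	}
	if len(missing) == 0 {
		return nil
	}
	names := make([]string, 0, len(missing))
	for _, m := range missing {
		names = append(names, m.Kind+"."+m.GroupVersion)
	}
	return fmt.Errorf("required APIs not installed: %s", strings.Join(names, ", "))
}

// fakeLister stands in for a real cluster in this sketch.
type fakeLister map[string][]string

func (f fakeLister) KindsFor(gv string) ([]string, error) { return f[gv], nil }

func main() {
	const gv = "replication.storage.openshift.io/v1alpha1"
	required := []requiredAPI{{gv, "VolumeReplication"}, {gv, "VolumeReplicationClass"}}

	// Simulated cluster where only VolumeReplicationClass is installed.
	cluster := fakeLister{gv: {"VolumeReplicationClass"}}

	if err := checkRequiredAPIs(cluster, required); err != nil {
		// A real operator would exit here, before starting the manager.
		fmt.Println("refusing to start manager:", err)
	}
}
```

Running the check before the manager starts would surface one clear message listing all missing kinds, rather than the cache-sync failure and lost leader election shown in the log above.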

> What is the expected behavior of the system when the CRDs are missing?

Later, since the CSV can call out dependent APIs, the thought is to add the dependent CRDs there, which will prevent Ramen itself from being installed on a cluster where dependent CRDs are missing.