csi-addons / volume-replication-operator

Apache License 2.0

Handle error when deleting CR with Secondary Image #62

Closed sp98 closed 3 years ago

sp98 commented 3 years ago
  1. Set replicationState to Secondary in the CR
  2. Delete the CR
  3. The following error is observed:
    
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:263
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:235
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:198
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:155
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:156
    k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:133
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185
    k8s.io/apimachinery/pkg/util/wait.UntilWithContext
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:99
    2021-03-30T08:10:25.552Z        ERROR   controller-runtime.manager.controller.volumereplication Reconciler error        {"reconciler group": "replication.storage.openshift.io", "reconciler kind": "VolumeReplication", "name": "volumereplication-sample", "namespace": "default", "error": "rpc error: code = InvalidArgument desc = image is in non-primary state"}
    github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:267
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:235
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:198
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:155
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:156
    k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:133
    k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185
    k8s.io/apimachinery/pkg/util/wait.UntilWithContext
        /go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:99

rexagod commented 3 years ago

Hey, can you paste in the CR YAML for the first step? I can't seem to find replicationImage in any of the CRDs. Thanks!

sp98 commented 3 years ago

Missed your message. It's been changed to replicationState recently.

Madhu-1 commented 3 years ago

I think we need to think about this. If the state is secondary and the user tries to delete the CR, we can either delete the CR without doing anything or let the CR deletion fail, as the storage is not allowing replication to be disabled.

sp98 commented 3 years ago

I would suggest continuing with the CR deletion (as that's what the user wanted to do). If there is an error about the image being Secondary, we can ignore it.

ShyamsundarR commented 3 years ago

The garbage collection of the secondary image would be done when the primary VR is deleted, as that would progress to disabling replication, and then, when the PVC/PV is deleted, the required image would be deleted.

Hence, on a VR deletion when secondary, we can just allow the deletion of the VR resource and not take any action against storage.

The one corner case would be to ensure we are actually secondary and not in split-brain, as I am unsure whether the image would be garbage collected in that condition when replication is disabled on the primary. If we can test this, we should be fine either leaving the image in split-brain or resyncing it and then deleting the VR resource, leaving the image behind for the primary workflow to delete.
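The "delete without touching storage when secondary" idea could be sketched roughly as below. This is an illustration only: `ReplicationState`, `needsStorageCleanup`, and `handleDelete` are hypothetical names, and the split-brain check is deliberately not modeled because the thread leaves that behaviour untested.

```go
package main

import "fmt"

// ReplicationState is a hypothetical stand-in for the CR's
// replicationState field.
type ReplicationState string

const (
	Primary   ReplicationState = "Primary"
	Secondary ReplicationState = "Secondary"
)

// needsStorageCleanup reports whether deleting a VR in the given state
// should call the storage backend. Assumption: the recorded state is
// trusted as-is; confirming the image is truly secondary (and not in
// split-brain) is out of scope for this sketch.
func needsStorageCleanup(state ReplicationState) bool {
	return state != Secondary
}

// handleDelete (hypothetical) skips the storage call for a secondary VR
// and otherwise returns whatever the disable call returns.
func handleDelete(state ReplicationState, disable func() error) error {
	if !needsStorageCleanup(state) {
		return nil // leave the image for the primary workflow to clean up
	}
	return disable()
}

func main() {
	called := false
	disable := func() error { called = true; return nil }

	err := handleDelete(Secondary, disable)
	fmt.Println(err, called) // prints "<nil> false"
}
```

Compared with ignoring the error after the fact, this variant never issues the storage request for a secondary VR, so no error is logged at all.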

Madhu-1 commented 3 years ago

@sp98 is it still the case?

sp98 commented 3 years ago

We can close this.