Closed nirs closed 1 week ago
Works for VR conditions other than Validated. We propagate the messages from the VR conditions:
VR:
status:
  conditions:
  - lastTransitionTime: "2024-11-05T14:32:43Z"
    message: failed to promote volume
    observedGeneration: 1
    reason: FailedToPromote
    status: "False"
    type: Completed
  - lastTransitionTime: "2024-11-05T14:32:43Z"
    message: failed to enable volume replication
    observedGeneration: 1
    reason: Error
    status: "True"
    type: Degraded
  - lastTransitionTime: "2024-11-05T14:32:43Z"
    message: volume is not resyncing
    observedGeneration: 1
    reason: NotResyncing
    status: "False"
    type: Resyncing
  - lastTransitionTime: "2024-11-05T14:32:43Z"
    message: 'failed to meet prerequisite: rpc error: code = FailedPrecondition
      desc = system is not in a state required for the operation''s execution: failed
      to enable mirroring on image "replicapool/csi-vol-f4737b6e-eeff-4137-8248-301cf37a3368":
      parent image "replicapool/csi-snap-e7c91292-a272-4278-9ee9-6be7a4c8bfe0" is
      not enabled for mirroring'
    observedGeneration: 1
    reason: PrerequisiteNotMet
    status: "False"
    type: Validated
VRG:
protectedPVCs:
- accessModes:
  - ReadWriteOnce
  conditions:
  - lastTransitionTime: "2024-11-05T14:32:43Z"
    message: failed to promote volume
    observedGeneration: 1
    reason: Error
    status: "False"
    type: DataReady
  - lastTransitionTime: "2024-11-05T14:32:44Z"
    message: PV cluster data already protected for PVC restored-pvc
    observedGeneration: 1
    reason: Uploaded
    status: "True"
    type: ClusterDataProtected
  - lastTransitionTime: "2024-11-05T14:32:44Z"
    message: failed to promote volume
    observedGeneration: 1
    reason: Error
    status: "False"
    type: DataProtected
Missing change: when the Validated condition is False, we want to set the DataReady and DataProtected conditions using the error message from the Validated condition. Currently we use the Validated condition only to check whether the VR is finished and can be removed.
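A minimal sketch of the missing change, assuming simplified stand-in types rather than the real Ramen or Kubernetes ones. The `condition` type and the `setCondition` and `propagateValidationFailure` helpers are hypothetical names used only for illustration. The `Error` reason mirrors the reason shown in the protected PVC conditions above.

```go
package main

// condition is a simplified, hypothetical stand-in for a status condition.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// setCondition replaces the condition with the same type, or appends it.
func setCondition(conds []condition, c condition) []condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// propagateValidationFailure copies the message of a failed Validated VR
// condition to the DataReady and DataProtected protected PVC conditions.
// Any other input leaves the protected PVC conditions unchanged.
func propagateValidationFailure(pvcConds []condition, validated condition) []condition {
	if validated.Type != "Validated" || validated.Status != "False" {
		return pvcConds
	}
	for _, t := range []string{"DataReady", "DataProtected"} {
		pvcConds = setCondition(pvcConds, condition{
			Type:    t,
			Status:  "False",
			Reason:  "Error",
			Message: validated.Message,
		})
	}
	return pvcConds
}
```

With a failed Validated condition, both DataReady and DataProtected end up carrying the VR message, while unrelated conditions such as ClusterDataProtected are left alone.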
Propagation of the message to the protected PVCs now works for all VR conditions:
protectedPVCs:
- accessModes:
  - ReadWriteOnce
  conditions:
  - lastTransitionTime: "2024-11-05T16:36:15Z"
    message: 'failed to meet prerequisite: rpc error: code = FailedPrecondition
      desc = system is not in a state required for the operation''s execution:
      failed to enable mirroring on image "replicapool/csi-vol-348f65fd-c658-4764-b7e7-85c45974e97e":
      parent image "replicapool/csi-snap-1ef6bed0-57e3-458f-8a99-413b823dde59"
      is not enabled for mirroring'
    observedGeneration: 1
    reason: Error
    status: "False"
    type: DataReady
  - lastTransitionTime: "2024-11-05T16:36:16Z"
    message: PV cluster data already protected for PVC restored-pvc
    observedGeneration: 1
    reason: Uploaded
    status: "True"
    type: ClusterDataProtected
  - lastTransitionTime: "2024-11-05T16:36:15Z"
    message: 'failed to meet prerequisite: rpc error: code = FailedPrecondition
      desc = system is not in a state required for the operation''s execution:
      failed to enable mirroring on image "replicapool/csi-vol-348f65fd-c658-4764-b7e7-85c45974e97e":
      parent image "replicapool/csi-snap-1ef6bed0-57e3-458f-8a99-413b823dde59"
      is not enabled for mirroring'
    observedGeneration: 1
    reason: Error
    status: "False"
    type: DataProtected
  csiProvisioner: rook-ceph.rbd.csi.ceph.com
  labels:
    appname: busybox
    ramendr.openshift.io/owner-name: flatten-drpc
    ramendr.openshift.io/owner-namespace-name: ramen-ops
  name: restored-pvc
  namespace: flatten
  replicationID:
    id: ""
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
  storageID:
    id: rook-ceph-dr1-1
But we have 25 failing unit tests; we need to understand why they fail.
We don't propagate the protected PVC conditions to the DRPC, so on the hub this does not help to debug the issue.
Maybe we can add a list of error messages from the protected PVCs to make it easier to debug.
status:
  actionDuration: 23.105201755s
  actionStartTime: "2024-11-05T17:02:08Z"
  conditions:
  - lastTransitionTime: "2024-11-05T17:02:01Z"
    message: Initial deployment completed
    observedGeneration: 1
    reason: Deployed
    status: "True"
    type: Available
  - lastTransitionTime: "2024-11-05T17:02:01Z"
    message: Ready
    observedGeneration: 1
    reason: Success
    status: "True"
    type: PeerReady
  - lastTransitionTime: "2024-11-05T17:02:02Z"
    message: VolumeReplicationGroup (ramen-ops/flatten-drpc) on cluster dr1 is reporting
      errors (All PVCs of the VolumeReplicationGroup are not ready) readying workload
      data, retrying till DataReady condition is met
    observedGeneration: 1
    reason: Error
    status: "False"
    type: Protected
  lastKubeObjectProtectionTime: "2024-11-05T17:02:04Z"
  lastUpdateTime: "2024-11-05T17:02:31Z"
  observedGeneration: 1
  phase: Deployed
  preferredDecision:
    clusterName: dr1
    clusterNamespace: dr1
  progression: Completed
  resourceConditions:
    conditions:
    - lastTransitionTime: "2024-11-05T17:02:02Z"
      message: All PVCs of the VolumeReplicationGroup are not ready
      observedGeneration: 1
      reason: Error
      status: "False"
      type: DataReady
    - lastTransitionTime: "2024-11-05T17:02:02Z"
      message: All PVCs of the VolumeReplicationGroup are not ready
      observedGeneration: 1
      reason: Error
      status: "False"
      type: DataProtected
    - lastTransitionTime: "2024-11-05T17:02:01Z"
      message: Nothing to restore
      observedGeneration: 1
      reason: Restored
      status: "True"
      type: ClusterDataReady
    - lastTransitionTime: "2024-11-05T17:02:04Z"
      message: Cluster data of all PVs are protected. Kube objects protected. Kube
        objects protected
      observedGeneration: 1
      reason: Uploaded
      status: "True"
      type: ClusterDataProtected
    resourceMeta:
      generation: 1
      kind: VolumeReplicationGroup
      name: flatten-drpc
      namespace: ramen-ops
      protectedpvcs:
      - restored-pvc
      resourceVersion: "15650"
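The idea of listing error messages from the protected PVCs could look roughly like the sketch below. It is not Ramen code: the `condition` and `protectedPVC` types and the `pvcErrors` helper are hypothetical, and the output format is an assumption. Messages are deduplicated per PVC because DataReady and DataProtected currently carry the same text.

```go
package main

// condition and protectedPVC are simplified, hypothetical stand-ins for
// the VRG status types.
type condition struct {
	Type    string
	Status  string
	Message string
}

type protectedPVC struct {
	Namespace  string
	Name       string
	Conditions []condition
}

// pvcErrors returns one "namespace/name: message" entry for every distinct
// message found in a protected PVC condition whose status is "False".
func pvcErrors(pvcs []protectedPVC) []string {
	var errs []string
	for _, pvc := range pvcs {
		seen := map[string]bool{}
		for _, c := range pvc.Conditions {
			if c.Status != "False" || seen[c.Message] {
				continue
			}
			seen[c.Message] = true
			errs = append(errs, pvc.Namespace+"/"+pvc.Name+": "+c.Message)
		}
	}
	return errs
}
```

A list like this could be surfaced on the hub next to the generic "All PVCs of the VolumeReplicationGroup are not ready" message, so the root cause is visible without inspecting the VRG on the managed cluster.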
@nirs I am not sure how the final VRG will look. Can you please point me to the relevant comment above?
This comment shows the change in the VRG: https://github.com/RamenDR/ramen/pull/1639#issuecomment-2457352330
LGTM, except that currently we are only setting the error message on DataProtected and DataReady; the message should be populated based on the various VR conditions and shown accordingly. @nirs If I am not wrong, you are planning to bring this change later as a bug fix, right?
Setting the error message is the purpose of this change. In normal conditions we know the exact state, so using the message from the VR is not very useful. We may simplify the code later to just use the message from the VR.
Another issue is that the content of the DataReady and DataProtected conditions is duplicated, which does not seem right, but this is not a new issue, and changing it is not in the scope of this change.
@nirs can we get one more approval on this and merge it?
We don't need more approvals. We kept it to give time for more reviewers.
When a VR condition is not met, we set the protected PVC condition message using the error message returned from isVRConditionMet(). When using csi-addons > 0.10.0, we now use the message from the condition instead of the default message.
Since the Validated condition is not reported by older versions of csi-addons, and we must wait until the Validated condition status is known when the VRG is deleted, isVRConditionMet() now also returns the state of the condition, which can be:
When we check the Validated condition we have these cases:
Condition is missing: continue to the next condition.
Condition is met: continue to the next condition.
Condition is not met and its status is False: this VR will never complete, and it is safe to delete since replication will never start. If the VRG is deleted, we return true since the VR reached the desired state. Otherwise we return false. In this case we update the protected PVC condition with the message from the VR condition.
Condition is not met and is stale or unknown: we need to check again later. There is no point in checking the Completed condition since a VR cannot complete without validation. In this case we update the protected PVC condition with the message generated by isVRConditionMet() for stale or unknown conditions.
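The cases above can be sketched as a small decision function. This is an illustration only, not the code in this change. The `vrCondition` and `result` types and the `checkValidated` function are hypothetical. Detecting staleness by comparing `ObservedGeneration` with the VR generation is an assumption, and the stale and unknown messages are placeholders for whatever isVRConditionMet() generates.

```go
package main

// vrCondition is a simplified, hypothetical stand-in for a VR status condition.
type vrCondition struct {
	Status             string // "True", "False" or "Unknown"
	ObservedGeneration int64
	Message            string
}

// result describes the outcome of checking the Validated condition.
type result struct {
	Continue bool   // move on to the next VR condition
	Done     bool   // value returned to the caller when Continue is false
	Message  string // message for the protected PVC conditions, if any
}

// checkValidated maps the Validated condition to one of the cases listed above.
func checkValidated(c *vrCondition, generation int64, vrgDeleted bool) result {
	switch {
	case c == nil:
		// Missing (older csi-addons): continue to the next condition.
		return result{Continue: true}
	case c.ObservedGeneration != generation:
		// Stale (assumed check): try again later. Placeholder message.
		return result{Message: "Validated condition is stale"}
	case c.Status == "True":
		// Met: continue to the next condition.
		return result{Continue: true}
	case c.Status == "False":
		// Will never complete: done only if the VRG is being deleted.
		return result{Done: vrgDeleted, Message: c.Message}
	default:
		// Unknown: try again later. Placeholder message.
		return result{Message: "Validated condition is unknown"}
	}
}
```

Returning a small struct keeps the three concerns separate: whether to keep checking other conditions, what to report to the caller, and which message to propagate.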
Example protected pvc DataReady condition with propagated message when VR validation failed: