Azure / OpenShift

Azure Red Hat OpenShift
https://docs.microsoft.com/azure/openshift/intro-openshift

Unable to create ODF/OCS storage cluster on ARO cluster v10 #286

Closed maulik-shah999 closed 1 year ago

maulik-shah999 commented 2 years ago

RedHat Case: https://access.redhat.com/support/cases/#/case/03267534/

What problem/issue/behavior are you having trouble with? What do you expect to see?

We are trying to add an ODF storage cluster to the ARO cluster, following the documentation at https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/deploying_openshift_data_foundation_using_microsoft_azure_and_azure_red_hat_openshift/deploying-openshift-data-foundation-on-microsoft-azure_azure

But it fails while trying to mount the PVC.

The rook-ceph-mon pods fail to initialize:

rook-ceph-mon-a-5559cf8ccb-79tsr 0/2 Init:0/2 0 5h43m
rook-ceph-mon-b-66b95854d-jc2j4 0/2 Init:0/2 0 5h32m
rook-ceph-mon-c-5c5958bb6-4bg69 0/2 Init:0/2 0 5h20m

We see these errors in the pod events:

Warning FailedMount 3m22s (x2 over 17m) kubelet Unable to attach or mount volumes: unmounted volumes=[ceph-daemon-data], unattached volumes=[ceph-daemon-data kube-api-access-jr746 rook-config-override rook-ceph-mons-keyring rook-ceph-log rook-ceph-crash]: timed out waiting for the condition

Warning FailedAttachVolume 65s (x16 over 23m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-126648a8-85e0-40da-b1c9-f3fe44e89557" : rpc error: code = NotFound desc = Volume not found, failed with error: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions//resourceGroups/aro-ms-ocs12-aro450/providers/Microsoft.Compute/disks/ms-ocs12-aro450-t4xcz-dynamic-pvc-126648a8-85e0-40da-b1c9-f3fe44e89557?api-version=2021-04-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body: Endpoint https://login.microsoftonline.com/oauth2/token

Warning FailedMount 64s (x6 over 19m) kubelet Unable to attach or mount volumes: unmounted volumes=[ceph-daemon-data], unattached volumes=[rook-ceph-crash ceph-daemon-data kube-api-access-jr746 rook-config-override rook-ceph-mons-keyring rook-ceph-log]: timed out waiting for the condition
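For anyone triaging similar events, here is a minimal Python sketch that pulls the key fields out of a warning line like the ones above. The function name `summarize_event` and the shortened sample text are illustrative assumptions, not part of any OpenShift tooling. The regular expressions only cover the message shape seen in this report.

```python
import re

# Sample event text copied from this report and shortened;
# the trailing "..." marks text omitted here for brevity.
EVENT = (
    'Warning FailedAttachVolume 65s (x16 over 23m) attachdetach-controller '
    'AttachVolume.Attach failed for volume "pvc-126648a8-85e0-40da-b1c9-f3fe44e89557" : '
    'rpc error: code = NotFound desc = Volume not found, failed with error: '
    'Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: '
    'azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to ...'
)


def summarize_event(line):
    """Return the event reason, volume name, HTTP status, and a token-failure flag.

    Hypothetical helper: fields that do not appear in the line come back as None.
    """
    reason = re.match(r"\s*Warning\s+(\S+)", line)
    volume = re.search(r'for volume "([^"]+)"', line)
    status = re.search(r"HTTPStatusCode:\s*(\d+)", line)
    return {
        "reason": reason.group(1) if reason else None,
        "volume": volume.group(1) if volume else None,
        "http_status": int(status.group(1)) if status else None,
        # The message text itself reports a failed token refresh.
        "token_refresh_failed": "Failed to refresh the Token" in line,
    }


summary = summarize_event(EVENT)
```

Separating the `token_refresh_failed` flag from the HTTP status makes it easier to see that the 404 in this message is reported alongside a token refresh failure, which may be worth mentioning in a support case.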

What is the business impact? Please also provide timeframe information.

We are not able to install Cloud Pak for Data on ARO.

Where are you experiencing the behavior? What environment?

Azure Red Hat OpenShift managed service.

When does the behavior occur? Frequency? Repeatedly? At certain times?

Always.

maulik-shah999 commented 2 years ago

I discussed this with the Red Hat team, and they need more logs from the Microsoft team to debug this issue. I would really appreciate any help you can provide. Thanks.

jboutaud commented 2 years ago

Hi @maulik-shah999
I recommend working through the support case for this issue; that will get you to a resolution the quickest.

Thanks, Jerome

maulik-shah999 commented 2 years ago

@jboutaud Thanks for your response. I created a support case: 2208160010006487 on the Azure Portal. Can you please update the support team to look into this? Thanks

bartek-lopatka commented 1 year ago

Is there any progress or information on this issue? We recently ran into the same problem while trying to get azure-files-csi to work on ARO 4.10.40.

nastacio commented 1 year ago

For what it is worth, and although it is not clear whether this is supported, I was able to create an ODF cluster inside ARO 4.10.54.

maulik-shah999 commented 1 year ago

Yes, it is supported. The Red Hat team resolved this issue in ARO OpenShift version 4.10.23 and later, so you should be able to install ODF on any 4.10.x version from 4.10.23 onward. I have not seen any issues on ARO versions 4.10.40 and 4.10.54 so far.
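As a rough illustration of the version rule stated above, the following Python sketch checks whether a cluster version falls in the 4.10.x series at or above 4.10.23. The helper names `parse_version` and `has_odf_fix` are hypothetical. The thread says nothing about other minor series, so the sketch returns False for them only because they are outside what was reported here.

```python
def parse_version(text):
    """Split a version string such as '4.10.40' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


# First 4.10.x release reported in this thread to contain the fix.
FIXED_IN = (4, 10, 23)


def has_odf_fix(version):
    """True if version is in the 4.10.x series and at or above 4.10.23.

    Assumption: other minor series (for example 4.11.x) are not covered by
    the statement in this thread, so they return False here.
    """
    parsed = parse_version(version)
    return parsed[:2] == FIXED_IN[:2] and parsed >= FIXED_IN
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "4.10.9" would sort after "4.10.23", which would give the wrong answer for single-digit patch releases.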