NetApp / trident

Storage orchestrator for containers
Apache License 2.0

Trident 24.06.0 PODs CrashLoopBackoff when backend is not available during cluster boot #912

Closed dklian68 closed 2 weeks ago

dklian68 commented 4 months ago

On Trident 24.02.0, when any or all of the configured backends are unavailable, the Trident service still comes up during OpenShift cluster boot.

Console logs:

[dlian@okd4-services ~]$ oc get po -n trident
NAME                                 READY   STATUS    RESTARTS         AGE
trident-controller-df94f8cbd-gr7zh   6/6     Running   3 (2m4s ago)     16m
trident-node-linux-6tqgd             2/2     Running   18 (2m10s ago)   91d
trident-node-linux-8x9ps             2/2     Running   23 (2m16s ago)   91d
trident-node-linux-f6t42             2/2     Running   26 (2m33s ago)   91d
trident-node-linux-pn6hd             2/2     Running   25 (97s ago)     91d
trident-node-linux-rbmr7             2/2     Running   23 (2m18s ago)   91d
trident-node-linux-sqmrj             2/2     Running   22 (2m2s ago)    91d
trident-node-linux-vmspp             2/2     Running   25 (91s ago)     91d
trident-node-linux-wnvj4             2/2     Running   22 (2m4s ago)    91d
trident-node-linux-zfjnr             2/2     Running   25 (115s ago)    91d

[dlian@okd4-services ~]$ tridentctl -n trident get backend
+-------------------+----------------+--------------------------------------+--------+------------+---------+
| NAME              | STORAGE DRIVER | UUID                                 | STATE  | USER-STATE | VOLUMES |
+-------------------+----------------+--------------------------------------+--------+------------+---------+
| be-otsc4-ssd      | ontap-nas      | f0a74c00-28cd-4938-b01f-2ed66c3a3422 | failed | normal     | 0       |
| be-site3-ots-svm1 | ontap-nas      | 17501492-bc77-44d5-b6f8-1cfa37a67951 | failed | normal     | 0       |
| be-site2-ots-svm1 | ontap-nas      | 5a33f2ac-d673-4a76-ba2a-0c09eb55b127 | failed | normal     | 0       |
| be-site2-ots-svm2 | ontap-nas      | 0e7208ae-2b31-4750-a5e4-e9aaabe5e151 | failed | normal     | 0       |
| be-site3-ots-svm2 | ontap-nas      | 909c493d-144b-40e6-b190-b5ac6d9a7c78 | failed | normal     | 0       |
| be-site1-ots-svm1 | ontap-nas      | 62883211-67e2-478e-b99e-89832f768a91 | failed | normal     | 0       |
| be-site1-ots-svm2 | ontap-nas      | 11ca4bd4-95a6-49eb-812b-9a6f8bf1dca4 | failed | normal     | 0       |
+-------------------+----------------+--------------------------------------+--------+------------+---------+

[dlian@okd4-services ~]$ tridentctl -n trident version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 24.02.0        | 24.02.0        |
+----------------+----------------+

[dlian@okd4-services ~]$ oc version
Client Version: 4.13.0-0.okd-2023-06-04-080300
Kustomize Version: v4.5.7
Server Version: 4.13.0-0.okd-2023-08-18-135805
Kubernetes Version: v1.26.4-2927+0ef5eae6ff8657-dirty

However, with Trident 24.06.0, all the pods cycle between "Running" and "CrashLoopBackOff".

[dlian@okd415-services ~]$ oc get po -n trident
NAME                                  READY   STATUS             RESTARTS         AGE
trident-controller-675d45f4fb-qnfv7   6/6     Running            25 (4m52s ago)   23m
trident-node-linux-26jm5              1/2     CrashLoopBackOff   12 (4m11s ago)   74m
trident-node-linux-6f978              1/2     CrashLoopBackOff   11 (4m49s ago)   74m
trident-node-linux-7csfj              2/2     Running            9 (5m29s ago)    74m
trident-node-linux-gjxbx              2/2     Running            9 (5m9s ago)     74m
trident-node-linux-rlpq7              1/2     CrashLoopBackOff   12 (4m15s ago)   74m
trident-node-linux-v6dsr              1/2     CrashLoopBackOff   12 (4m39s ago)   74m
trident-node-linux-vsjzg              1/2     CrashLoopBackOff   10 (4m29s ago)   74m
trident-node-linux-wddph              2/2     Running            9 (5m28s ago)    74m
trident-node-linux-xt27b              1/2     CrashLoopBackOff   11 (4m57s ago)   74m

[dlian@okd415-services ~]$ tridentctl -n trident get backend
+--------------+----------------+--------------------------------------+--------+------------+---------+
| NAME         | STORAGE DRIVER | UUID                                 | STATE  | USER-STATE | VOLUMES |
+--------------+----------------+--------------------------------------+--------+------------+---------+
| be-otsc4-ssd | ontap-nas      | 41d5a938-fd79-4d7e-8287-4ceaf7bb5e8e | failed | normal     | 7       |
+--------------+----------------+--------------------------------------+--------+------------+---------+

[dlian@okd415-services ~]$ tridentctl -n trident version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 24.06.0        | 24.06.0        |
+----------------+----------------+

[dlian@okd415-services ~]$ oc version
Client Version: 4.15.0-0.okd-2024-02-10-035534
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.0-0.okd-2024-02-10-035534
Kubernetes Version: v1.28.2-3572+f1618d54a81f9f-dirty
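
For anyone triaging a similar listing, here is a minimal sketch of how the crash-looping pods could be picked out of captured `oc get po` text. It is not part of Trident or `oc`. The function name `crashlooping_pods` is hypothetical, the sample rows are copied from the 24.06.0 listing above, and the sketch assumes the default whitespace-separated column layout (NAME, READY, STATUS, RESTARTS, AGE).

```python
def crashlooping_pods(listing: str) -> list[tuple[str, int]]:
    """Return (pod name, restart count) for rows whose STATUS is CrashLoopBackOff.

    Hypothetical helper: assumes the default `oc get po` column order
    NAME, READY, STATUS, RESTARTS, AGE, separated by whitespace.
    """
    results = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        name, _ready, status, restarts = fields[:4]
        if status == "CrashLoopBackOff":
            # RESTARTS looks like "12 (4m11s ago)"; the first token is the count.
            results.append((name, int(restarts)))
    return results


# Sample rows copied from the 24.06.0 listing in this issue.
sample = """\
NAME READY STATUS RESTARTS AGE
trident-controller-675d45f4fb-qnfv7 6/6 Running 25 (4m52s ago) 23m
trident-node-linux-26jm5 1/2 CrashLoopBackOff 12 (4m11s ago) 74m
trident-node-linux-6f978 1/2 CrashLoopBackOff 11 (4m49s ago) 74m
trident-node-linux-7csfj 2/2 Running 9 (5m29s ago) 74m
"""

for pod, restarts in crashlooping_pods(sample):
    print(pod, restarts)
```

Note that a pod caught in its "Running" phase (such as the controller here, with 25 restarts) is not reported by this filter, so the restart count is worth checking separately.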

dklian68 commented 2 weeks ago

Tested with latest 24.10 - problem appears to be resolved.

[dlian@okd4-services ~]$ tridentctl -n trident version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 24.10.0        | 24.10.0        |
+----------------+----------------+

[dlian@okd4-services ~]$ oc get po -n trident
NAME                                  READY   STATUS    RESTARTS        AGE
trident-controller-7dfd886bb5-ct5k6   6/6     Running   0               8m40s
trident-node-linux-6hvb2              2/2     Running   1 (7m17s ago)   8m40s
trident-node-linux-6vsxl              2/2     Running   2 (6m48s ago)   8m40s
trident-node-linux-b5fnq              2/2     Running   2 (6m46s ago)   8m39s
trident-node-linux-btw5d              2/2     Running   1 (7m19s ago)   8m39s
trident-node-linux-dp8tw              2/2     Running   1 (7m19s ago)   8m39s
trident-node-linux-mw4wr              2/2     Running   1 (7m18s ago)   8m40s
trident-node-linux-n6hm9              2/2     Running   2 (6m50s ago)   8m39s
trident-node-linux-w4pwp              2/2     Running   1 (7m17s ago)   8m39s
trident-node-linux-wvc9f              2/2     Running   1 (7m17s ago)   8m39s

[dlian@okd4-services ~]$ tridentctl -n trident get backend
+--------------+----------------+--------------------------------------+---------+------------+---------+
| NAME         | STORAGE DRIVER | UUID                                 | STATE   | USER-STATE | VOLUMES |
+--------------+----------------+--------------------------------------+---------+------------+---------+
| be-otsc4-ssd | ontap-nas      | f0a74c00-28cd-4938-b01f-2ed66c3a3422 | offline | normal     | 0       |
+--------------+----------------+--------------------------------------+---------+------------+---------+

sjpeeris commented 2 weeks ago

Thanks @dklian68. Closing this issue