NetApp / trident

Storage orchestrator for containers
Apache License 2.0

Restore backend state automatically after disconnection #801

Closed: Fedibnp closed this issue 10 months ago

Fedibnp commented 1 year ago

Describe the solution you'd like

To manage NetApp Trident persistent volumes, the Trident backend must be online.

If the connection between the Trident controller and the NetApp SVM is lost for any reason, even for a few seconds, the backend goes into a failed state and never comes back online by itself, even after the connection is restored.

To bring the backend back online, we have to evacuate or recreate the controller replica set so that the configuration is refreshed.

We would like some sort of scheduled job that runs every x seconds to check whether the SVM is reachable. If it is, the backend should automatically go back online; if not, it remains failed.
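To make the request concrete, here is a minimal sketch of the kind of periodic check I have in mind. It is only an illustration, not Trident code: the names `svm_reachable`, `next_state`, and `monitor` are hypothetical, and probing the SVM management LIF with a plain TCP connection on port 443 is an assumption about what "reachable" should mean.

```python
import socket
import time
from typing import Callable, List

ONLINE = "online"
FAILED = "failed"


def svm_reachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Assumption: a successful TCP connect to the management LIF is a
    good enough signal that the SVM is reachable again.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def next_state(reachable: bool) -> str:
    """Map the probe result to the backend state requested in this issue."""
    return ONLINE if reachable else FAILED


def monitor(
    probe: Callable[[], bool],
    interval: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run the probe `rounds` times, waiting `interval` seconds between runs.

    Returns the backend state computed after each probe. A real controller
    would loop forever and update the backend instead of collecting a list.
    """
    history: List[str] = []
    for i in range(rounds):
        history.append(next_state(probe()))
        if i < rounds - 1:
            sleep(interval)
    return history
```

With a placeholder hostname, a call could look like `monitor(lambda: svm_reachable("svm.example.com"), interval=30, rounds=10)`. The important part is that the state is recomputed on every run, so a backend that failed during a short outage goes back online on the first successful probe, without restarting the controller.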

mravi-na commented 10 months ago

This issue is fixed by https://github.com/NetApp/trident/commit/458db05e6475ab095351848a849d4a5a8de3f937, and the fix is included in the 23.07 release.