Open arndt-netapp opened 3 months ago
Yes, this would be ideal. A way to specify which backends to avoid, so that Trident starts working as soon as possible, would really help. When those unreachable backends subsequently come back online, we still want Trident to be able to connect to them.
Consider a single K8s cluster deployed across multiple physical datacenters, with the storage backends in each datacenter served by that location's own NetApp storage cluster. If one of the datacenters goes offline, Trident currently provides no way to ignore its offline backends and proceed with only the backends that are expected to be online. In the current implementation, Trident attempts to connect to all backends when the controller starts. If there are many backends in the offline location, this causes a long delay before Trident is functional for the online locations.
The ask is an option for the Trident controller to ignore the offline backends, come up quickly, and allow management of the backends that are expected to be online. The "suspend backend" feature introduced in 23.10 only works for backends that are still online. A similar option to temporarily ignore backends that are not online during disaster scenarios would be ideal.
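To make the requested behavior concrete, here is a rough sketch of the startup logic being asked for. It is an illustration only, not Trident code: the function names `bootstrap_backends` and `retry_deferred`, the `ignored` parameter, and the `connect` callback are all hypothetical, and Python is used purely for brevity.

```python
from __future__ import annotations

from typing import Callable, Iterable


def bootstrap_backends(
    backends: Iterable[str],
    connect: Callable[[str], bool],
    ignored: frozenset[str] = frozenset(),
) -> tuple[list[str], list[str]]:
    """Connect to every backend that is not on the ignore list.

    Returns (online, deferred). Backends on the (hypothetical) ignore list
    are deferred without any connection attempt, so they add no startup delay.
    """
    online: list[str] = []
    deferred: list[str] = []
    for name in backends:
        if name in ignored:
            # Skip the connection attempt entirely for ignored backends.
            deferred.append(name)
            continue
        if connect(name):
            online.append(name)
        else:
            # A backend that fails unexpectedly is also kept for a later retry.
            deferred.append(name)
    return online, deferred


def retry_deferred(
    deferred: Iterable[str],
    connect: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Retry deferred backends once; return (recovered, still_deferred)."""
    recovered: list[str] = []
    still_deferred: list[str] = []
    for name in deferred:
        (recovered if connect(name) else still_deferred).append(name)
    return recovered, still_deferred
```

The important property is that ignored backends are remembered rather than dropped, so a later call to something like `retry_deferred` can bring them back once the offline datacenter recovers.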