NetAppDocs / ontap-systems-switches

https://docs.netapp.com/us-en/ontap-systems-switches/

Request to add steps for EFOS 3.4.4.6 to 3.7.x.x and higher versions to avoid ISL issue/outage #159

Closed ArronKlem closed 1 month ago

ArronKlem commented 4 months ago

Page URL

https://docs.netapp.com/us-en/ontap-systems-switches/switch-bes-53248/configure-efos-software.html

Page title

Install the EFOS software

Summary

The documentation for upgrading EFOS does not include any steps to migrate cluster traffic away from the switch being upgraded. At the same time, it notes that when upgrading from 3.4.4.6 to 3.7.x.x or higher the ISLs will go down, and it links to the KB: https://kb.netapp.com/onprem/Switches/Broadcom/BES-53248_ISL_down_when_upgrading_to_EFOS_3_7_0_4_or_later

As expected, and as the KB indicates, if cluster traffic is not isolated to a single switch and the ISLs are down, the cluster will have a data outage because it will no longer be in quorum.

With auto-revert turned on, any customer who upgrades from 3.4.4.6 to a higher version will have their ISLs go down on the first switch reboot, and the cluster LIFs will revert to the ports connected to the upgraded switch while the ISLs are still down, creating an outage.

The EFOS upgrade page even notes that an upgrade from 3.4.4.6 to 3.7.x.x or higher must follow the basic "Method 1: Install EFOS" steps. Those steps include a note that the ISLs will go down, and they link to the KB above. Nowhere does the page state that traffic needs to be moved to avoid an outage. Instead, the documentation seems to walk the customer into an outage and then provide a KB to resolve it.

Is our expectation that customers will automatically know to migrate the cluster traffic (even though it is not stated), or do we expect them to have the outage and then follow the KB? It seems we need to add steps to the EFOS upgrade procedure to avoid this issue when upgrading from 3.4.4.6 to a higher EFOS version.


netapp-pcarriga commented 4 months ago

Hi @ArronKlem, thank you for bringing this to our attention. We'll investigate and, if required, make the necessary documentation updates.

netapp-maireadn commented 4 months ago

in progress

netapp-maireadn commented 3 months ago

Still in progress. Working with SMEs to add the required new content.

netapp-maireadn commented 2 months ago

in review

netapp-maireadn commented 1 month ago

@ArronKlem I'm closing this GitHub issue now. The documentation has been updated. The install and upgrade EFOS procedures have been separated into two topics, and new steps have been added to the upgrade procedure for disabling and enabling auto-revert on the cluster LIFs. The install topic has been streamlined and simplified, with improved links to the updated topics: https://docs.netapp.com/us-en/ontap-systems-switches/switch-bes-53248/upgrade-efos-software.html#prepare-for-upgrade https://docs.netapp.com/us-en/ontap-systems-switches/switch-bes-53248/configure-efos-software.html
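
For readers who want a rough idea of what the auto-revert steps mentioned above look like from the ONTAP CLI, here is a minimal sketch. It is not quoted from the updated topic. It assumes the cluster LIFs live in the default `Cluster` vserver, and the `cluster1::>` prompt is a hypothetical cluster name. The linked upgrade procedure remains the authoritative sequence.

```
cluster1::> network interface modify -vserver Cluster -lif * -auto-revert false
cluster1::> network interface show -vserver Cluster -fields auto-revert

(upgrade EFOS on the switch, following the linked procedure)

cluster1::> network interface modify -vserver Cluster -lif * -auto-revert true
```

The idea is that, with auto-revert disabled, the cluster LIFs should not move back to ports on the upgraded switch on their own while the ISLs are still down, which is the outage path described in the original report.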