ansible-collections / netapp.ontap

Ansible collection to support NetApp ONTAP configuration.
https://galaxy.ansible.com/netapp/ontap
GNU General Public License v3.0

Timeout value in netapp.ontap.na_ontap_software_update not used #81

Closed: david-sieg closed this issue 2 years ago

david-sieg commented 2 years ago

Summary

To update our NetApp clusters, I set an explicit timeout value because we have 6 nodes, but this value is not being used.

Component Name

netapp.ontap.na_ontap_software_update

Ansible Version

2.9.27

ONTAP Collection Version

21.19.1

ONTAP Version

NetApp Release 9.10.1P4

Playbook

---
- hosts: all
  gather_facts: no

  tasks:

    - name: "ONTAP software update on {{ inventory_hostname }}"
      netapp.ontap.na_ontap_software_update:
        state: present
        nodes: "{{ pr_netapp_nodes }}"
        package_url: "{{ pr_netapp_soft_url }}"
        package_version: "{{ pr_netapp_ver_name }}"
        ignore_validation_warning: true
        download_only: "{{ pr_netapp_download_only }}"
        hostname: "{{ pr_netapp_host }}"
        username: "{{ pr_netapp_admin_usr }}"
        password: "{{ pr_netapp_admin_pwd }}"
        stabilize_minutes: "{{ pr_netapp_time_to_wait|default(omit) }}"
        timeout: "{{ pr_netapp_timeout }}"
        https: "{{ pr_netapp_https }}"
        validate_certs: "{{ pr_netapp_validate_ssl }}"
      delegate_to: localhost
      when: pr_netapp_update|bool

Steps to Reproduce

Expected Results

The update runs to completion without a timeout.

Actual Results

{
    "msg": "Timeout error updating image: state: in_progress.  Should the timeout value be increased?  Current value is 7200 seconds.  The software update continues in background.",
    "validation_reports_after_download": [
        "only available if validate_after_download is true"
    ],
    "validation_reports_after_update": [
        {
            "update_check": "Manual checks",
            "status": "warning",
            "issue": {
                "message": "Manual validation checks need to be performed. Refer to the Upgrade Advisor Plan or \"Performing manual checks before an automated cluster upgrade\" section in the \"Clustered Data ONTAP Upgrade Express Guide\" for the remaining validation checks that need to be performed before update. Failing to do so can result in an update failure or an I/O disruption."
            },
            "action": {
                "message": "Refer to the Upgrade Advisor Plan or \"Performing manual checks before an automated cluster upgrade\" section in the \"Clustered Data ONTAP Upgrade Express Guide\" for the remaining validation checks that need to be performed before update."
            }
        },
        {
            "update_check": "NFS mounts",
            "status": "warning",
            "issue": {
                "message": "This cluster is serving NFS clients. If NFS soft mounts are used, there is a possibility of frequent NFS timeouts and race conditions that can lead to data corruption during the upgrade."
            },
            "action": {
                "message": "Use NFS hard mounts, if possbile."
            }
        },
        {
            "update_check": "SAN compatibility",
            "status": "warning",
            "issue": {
                "message": "Since this cluster is configured for SAN, manually confirm that the SAN configuration is fully supported."
            },
            "action": {
                "message": "All SAN components-including target Data ONTAP software version, host OS and patches, required Host Utilities software, and adapter drivers and firmware-should be supported."
            }
        }
    ],
    "invocation": {
        "module_args": {
            "state": "present",
            "nodes": [
                "node1",
                "node2"
            ],
            "package_url": "https://repo.domain.local/repository/netapp_repo/ontap/9101P4_q_image.tgz",
            "package_version": "9.10.1P4",
            "ignore_validation_warning": true,
            "download_only": false,
            "hostname": "ontap.domain.local",
            "username": "adminuser",
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "stabilize_minutes": 10,
            "timeout": 7200,
            "https": true,
            "validate_certs": false,
            "use_rest": "auto",
            "feature_flags": {},
            "force_update": false,
            "validate_after_download": false,
            "http_port": null,
            "ontapi": null,
            "cert_filepath": null,
            "key_filepath": null
        }
    },
    "_ansible_no_log": false,
    "changed": false,
    "_ansible_delegated_vars": {
        "ansible_host": "localhost",
        "ansible_port": null,
        "ansible_user": "root",
        "ansible_connection": "local"
    }
}
david-sieg commented 2 years ago

The task fails after ~30 minutes.

lonico commented 2 years ago

We thought the ONTAP REST API would wait according to the timeout value. We need to validate this. We have a second loop that is only set for 5 minutes.

DEVOPS-5241

lonico commented 2 years ago

Though the reported version is 21.19.1 for the collection, the error message "Timeout error updating image: state" is specific to REST, so the collection must be slightly more recent.

We verified that the module correctly loops on getting the update job status, so it's not clear at this point why there would be a timeout after 30 minutes or so. We will enforce an additional check with a proper timer.
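For illustration only, here is a minimal sketch of what a status loop with an explicit timer could look like. It is not the module's code: `wait_for_state`, `get_state` and the state names `completed` and `failed` are assumptions made for this example (`in_progress` is the only state shown in the error above). The point is that the deadline is computed once from a monotonic clock, so the total wait is bounded by `timeout` no matter how long each status call takes.

```python
import time


def wait_for_state(get_state, timeout, interval=30,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until a terminal state is seen or timeout elapses.

    get_state is a hypothetical callable returning a state string such as
    'in_progress', 'completed' or 'failed' (assumed names).
    Returns (last_state, elapsed_seconds).
    """
    start = clock()
    deadline = start + timeout
    while True:
        state = get_state()
        now = clock()
        # Stop on a terminal state, or once the deadline has passed.
        if state in ('completed', 'failed') or now >= deadline:
            return state, now - start
        # Never sleep past the deadline.
        sleep(min(interval, deadline - now))
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock instead of waiting for real time to pass.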

~~If you feel like it, you could collect a trace to help us better understand what is going on under the covers: https://github.com/ansible-collections/netapp.ontap/wiki/Debugging#tracing-zapi-and-rest-api-calls~~

lonico commented 2 years ago

I observed a difference in behavior between ONTAP 9.8 and 9.9. With 9.8, I see what is described above: PATCH returns a long-lived job, though it may not survive the reboot. With 9.9, PATCH returns a short-lived job, and we need to poll the update status.
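A rough sketch of how both behaviors could be handled under one overall deadline, assuming two hypothetical callables: `get_job_state` for the job returned by PATCH and `get_update_state` for the update status. The function name and the state names (`success`, `failure`, `completed`, `failed`) are assumptions for this example, not the collection's actual fix. Errors raised while polling, for instance while a node reboots, are treated as an unknown state and retried.

```python
import time


def wait_for_update(get_job_state, get_update_state, timeout, interval=30,
                    clock=time.monotonic, sleep=time.sleep):
    """Wait for the PATCH job, then for the update itself, under one deadline.

    Returns the last update state seen, or 'failed' if the job failed.
    """
    deadline = clock() + timeout

    def poll(getter, done_states):
        while True:
            try:
                state = getter()
            except Exception:
                # Assumption: any error (e.g. node rebooting) is transient.
                state = 'unknown'
            if state in done_states:
                return state
            remaining = deadline - clock()
            if remaining <= 0:
                return state
            sleep(min(interval, remaining))

    # Phase 1: the job may be long-lived (9.8) or short-lived (9.9).
    if poll(get_job_state, ('success', 'failure')) == 'failure':
        return 'failed'
    # Phase 2: a finished job does not imply a finished update, so keep
    # polling the update status with whatever time is left.
    return poll(get_update_state, ('completed', 'failed'))
```

If the job disappears during a reboot, phase 1 simply runs until the deadline and phase 2 makes one last status check before returning.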

lonico commented 2 years ago

Fixed in 21.22.0.