ansible-collections / azure

Development area for Azure Collections
https://galaxy.ansible.com/azure/azcollection
GNU General Public License v3.0

azure.azcollection.azure_rm_galleryimageversion module times out if gallery image version creation takes more than 10 minutes #1721

Closed. sourav-citrix closed this issue 3 weeks ago

sourav-citrix commented 2 months ago
SUMMARY

azure.azcollection.azure_rm_galleryimageversion module times out if gallery image version creation takes more than 10 minutes

From the code:

i = 0
while response['properties']['provisioningState'] == 'Creating':
    time.sleep(60)
    response = self.get_resource()
    i = i + 1
    if i == 10:
        self.fail("Create or Updating encountered an exception, wait 10 minutes when the status is still 'creating'")

Ref Link: https://github.com/ansible-collections/azure/blob/dev/plugins/modules/azure_rm_galleryimageversion.py#L863

It looks like the module waits up to 10 minutes for the image version to be created. However, I recently ran into a case where creation takes roughly 15 minutes, so the task fails with the following error message:

fatal: [localhost]: FAILED! => {"changed": false, "msg": "Create or Updating encountered an exception, wait 10 minutes when the status is still 'creating'"}

It would be nice to have a timeout field that users can set themselves, rather than a hardcoded value of 10 minutes.
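A user-configurable timeout of the kind requested here could look roughly like the sketch below. It is a minimal illustration, not the module's code: `wait_for_provisioning`, `timeout`, and `interval` are hypothetical names, and `get_resource` stands in for the module's `self.get_resource()`. The `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting.

```python
import time


def wait_for_provisioning(get_resource, timeout=600, interval=60,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll get_resource() until provisioningState leaves 'Creating'.

    timeout and interval are in seconds. Both are hypothetical
    parameters, not options of the real module.
    Raises TimeoutError if the state is still 'Creating' at the deadline.
    """
    deadline = clock() + timeout
    response = get_resource()
    while response['properties']['provisioningState'] == 'Creating':
        # Give up only when the resource is still being created
        # and the caller-supplied time budget is used up.
        if clock() >= deadline:
            raise TimeoutError(
                "Still 'Creating' after waiting {0} seconds".format(timeout))
        sleep(interval)
        response = get_resource()
    return response
```

With the default `timeout=600` this waits about as long as the hardcoded loop, while a caller whose image takes about 15 minutes could pass something like `timeout=1200`.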

ISSUE TYPE
COMPONENT NAME

azure.azcollection.azure_rm_galleryimageversion

ANSIBLE VERSION
ansible [core 2.17.4]
  config file = None
  configured module search path = ['/home/ansibleuser/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansibleuser/.local/lib/python3.10/site-packages/ansible
  ansible collection location = /home/ansibleuser/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/ansibleuser/.local/bin/ansible
  python version = 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True
COLLECTION VERSION
Collection         Version
------------------ -------
azure.azcollection 2.7.0
CONFIGURATION
OS / ENVIRONMENT
STEPS TO REPRODUCE
EXPECTED RESULTS

ok: [localhost]

ACTUAL RESULTS
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Create or Updating encountered an exception, wait 10 minutes when the status is still 'creating'"}

ggeorge-pros commented 1 month ago

Also experiencing this issue for an image that simply takes longer than 10 minutes.

ggeorge-pros commented 1 month ago

A mitigation is to add the following to your task so that it retries on its own:

register: image_status
until: image_status.failed == false
retries: 10
delay: 60

Fred-sun commented 1 month ago

> A mitigation is to add the following to your task so that it retries on its own:
>
> register: image_status
> until: image_status.failed == false
> retries: 10
> delay: 60

@sourav-citrix This is an effective approach, and one that is currently supported. Thanks!

Fred-sun commented 1 month ago

@sourav-citrix @ggeorge-pros However, not every user will write their tasks this way. We added the wait to the module only to avoid errors caused by the resource being in an incorrect state right after it is created, as in #1584. In addition, once the resource has been created and its status retrieved, the module exits immediately and does not keep waiting here. Thank you!

sourav-citrix commented 1 month ago

@ggeorge-pros Thanks! I am relatively new to Ansible, but I used a similar workaround: an additional task that checks the state of the image creation. Even though I ignore the errors in the first task, its output still shows up in red, which I would have liked to avoid. Adding the snippet below:

- name: Create gallery image version
  azure_rm_galleryimageversion:
    resource_group: "{{ rg_name }}"
    gallery_name: "{{ cg_name }}"
    gallery_image_name: "{{ img_name }}"
    name: "{{ img_version }}"
    location: "{{ rg_location }}"
    storage_profile:
      os_disk:
        source: "{{ snapshot.id }}"
  ignore_errors: true

- name: Checking for the gallery image version state
  azure_rm_galleryimageversion_info:
    resource_group: "{{ rg_name }}"
    gallery_name: "{{ cg_name }}"
    gallery_image_name: "{{ img_name }}"
    name: "{{ img_version }}"
  register: img_state
  until: img_state.versions.provisioning_state == 'Failed' or img_state.versions.provisioning_state == 'Succeeded'
  retries: 30
  delay: 30

@Fred-sun Thanks for the response. However, I think a better way would be to add an optional parameter called retry_count, with a default value of 10, and plug it into the code. I think that should solve the problem:

i = 0
while response['properties']['provisioningState'] == 'Creating':
    time.sleep(60)
    response = self.get_resource()
    i = i + 1
    if i == retry_count:
        self.fail("Create or Updating encountered an exception, wait 10 minutes when the status is still 'creating'")
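A slightly fuller sketch of this proposal follows. It assumes `retry_count` would be declared the way Ansible module options usually are, as an entry in the argument spec dictionary. `RETRY_ARG_SPEC`, `wait_while_creating`, and the `fail` callback are hypothetical stand-ins for the module's own argument spec, polling code, and `self.fail`. Unlike the snippet above, the failure message here reports the configured retry count instead of a fixed "10 minutes", and the state is checked before giving up.

```python
import time

# Hypothetical argument-spec entry; retry_count is the name proposed in
# this thread, not an existing option of the module.
RETRY_ARG_SPEC = dict(
    retry_count=dict(type='int', default=10),
)


def wait_while_creating(get_resource, fail, retry_count=10, delay=60,
                        sleep=time.sleep):
    """Re-check the resource up to retry_count times, delay seconds apart."""
    response = get_resource()
    retries = 0
    while response['properties']['provisioningState'] == 'Creating':
        if retries >= retry_count:
            # Report the configured limit rather than a hardcoded duration.
            fail("Create or update is still 'Creating' after {0} retries "
                 "({1} seconds)".format(retry_count, retry_count * delay))
            break
        sleep(delay)
        response = get_resource()
        retries += 1
    return response
```

With the default of 10 the wait stays close to the current 10 minutes, so existing playbooks would behave much the same unless they set a larger value.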
Fred-sun commented 1 month ago

@sourav-citrix This wait exists only to avoid errors caused by resource creation timeouts. The resource is usually created quickly and its state is ready, so there is no need to add a parameter. Thank you!