OpenNebula / one

The open source Cloud & Edge Computing Platform bringing real freedom to your Enterprise Cloud 🚀
http://opennebula.io
Apache License 2.0

Deleting/recreating image fails (e.g. with terraform) - possible race condition #5359

Open Yenya opened 3 years ago

Yenya commented 3 years ago

Description

When trying to create an image just after an image of the same name has been deleted (e.g. when using Terraform), creating the image fails with "NAME is already taken by IMAGE <id>". Waiting a few seconds and retrying succeeds. So it seems that oneimage delete finishes and reports success before the image has actually been removed.

To Reproduce

$ oneimage clone 1229 test-yenya
ID: 1290
(wait for the image to become READY)
$ oneimage delete 1290 ; until oneimage clone 1229 test-yenya; do sleep 1; done
[one.image.clone] Error allocating a new image. NAME is already taken by IMAGE 1290.
[one.image.clone] Error allocating a new image. NAME is already taken by IMAGE 1290.
ID: 1291

Expected behavior

The image name should be available as soon as the image delete finishes.

Additional context

I discovered this when trying to use the OpenNebula Terraform provider. As soon as Terraform decides it needs to recreate the image, it issues a delete immediately followed by a clone, which fails. I reported this as an issue in OpenNebula/terraform-provider-opennebula: https://github.com/OpenNebula/terraform-provider-opennebula/issues/116 but the maintainer suggests that it is a problem with core OpenNebula.

paczerny commented 3 years ago

This is not a bug, this is a feature :-)

On oneimage delete, oned checks whether the image can be deleted: it is not in use, not locked, and so on. If all checks pass, it sets the image state to Image::DELETE, sends an asynchronous command to the storage driver to delete the file, and returns success immediately without waiting for the command to execute. oned removes the image from the DB only when it receives the answer from the storage driver. If the driver fails to delete the file, the error is reported only in oned.log. There is no option to force synchronous execution.

Yenya commented 3 years ago

So, the correct way to solve this would be for Terraform to poll until the image is actually deleted? Is the OpenNebula API sufficiently rich for that?

paczerny commented 3 years ago

Something like this should work:

$ oneimage delete test; while oneimage show test &>/dev/null; do sleep 1; done; oneimage clone 0 test
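
A polling loop of this kind never exits if the image record never disappears, for example when the storage driver fails to delete the file. The sketch below wraps the same `oneimage show` check in a function with a timeout. The function name `wait_for_image_gone` and the 60-second default are hypothetical choices for illustration and are not part of OpenNebula.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_image_gone and its default timeout are
# illustrative assumptions, not an OpenNebula command or option.
#
# Polls `oneimage show NAME` once per second until it fails (taken here
# to mean the image record is gone) or until TIMEOUT seconds have passed.
wait_for_image_gone() {
    local name="$1"
    local timeout="${2:-60}"
    local waited=0

    while oneimage show "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for image '$name' to be removed" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

It could then be used as `oneimage delete test-yenya && wait_for_image_gone test-yenya 120 && oneimage clone 1229 test-yenya`. Note that `oneimage show` can also fail for unrelated reasons (for instance, if oned is unreachable), so a failing `show` is only a rough signal that the image is gone.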