Endless loop in snapshots::Client::create_or_update::into_future()

andlr commented 8 months ago

Probably this applies to some other LROs as well. Example code:

        let snapshot = self
            .client
            .snapshots_client()
            .create_or_update(
                self.subscription_id.clone(),
                resource_group_name,
                snapshot_name,
                Snapshot {
                    resource: Resource::new(location),
                    managed_by: None,
                    sku: Some(sku),
                    extended_location: None,
                    properties: Some(SnapshotProperties::new(CreationData {
                        create_option: CreateOption::Copy,
                        storage_account_id: None,
                        image_reference: None,
                        gallery_image_reference: None,
                        source_uri: None,
                        source_resource_id: Some(source_resource_id),
                        source_unique_id: None,
                        upload_size_bytes: None,
                        logical_sector_size: None,
                        security_data_uri: None,
                        performance_plus: None,
                        elastic_san_resource_id: None,
                    })),
                },
            )
            .await?;

Instead of making one request to create a snapshot and checking provisioning state in a loop, it keeps sending endless requests to create a snapshot in a loop. After each request, it immediately gets provisioning state Running, and on the next iteration it sends a new snapshot request (generated code from azure_mgmt_compute):

            fn into_future(self) -> Self::IntoFuture {
                Box::pin(async move {
                    use azure_core::{
                        error::{Error, ErrorKind},
                        lro::{body_content::get_provisioning_state, get_retry_after, LroStatus},
                        sleep::sleep,
                    };
                    use std::time::Duration;
                    loop {
                        let this = self.clone();
                        let response = this.send().await?;
                        let retry_after = get_retry_after(response.as_raw_response().headers());
                        let status = response.as_raw_response().status();
                        let body = response.into_body().await?;
                        let provisioning_state = get_provisioning_state(status, &body)?;
                        log::trace!("current provisioning_state: {provisioning_state:?}");
                        match provisioning_state {
                            LroStatus::Succeeded => return Ok(body),
                            LroStatus::Failed => return Err(Error::message(ErrorKind::Other, "Long running operation failed".to_string())),
                            LroStatus::Canceled => {
                                return Err(Error::message(ErrorKind::Other, "Long running operation canceled".to_string()))
                            }
                            _ => {
                                sleep(retry_after).await;
                            }
                        }
                    }
                })
            }

As a result, I got a lot of snapshots (overwritten with the same name), until I cancelled it:

I.e. for some reason this operation uses original URL instead of azure-asyncoperation

andlr commented 8 months ago

Hm, I guess it's because x-ms-long-running-operation-options isn't specified in the spec and default is original-uri, so it's probably not an issue of the SDK. I've looked through dotnet client SDK, and it seems that LRO related code there doesn't take into account spec file, it's implemented in common code and not generated separately for each API method. It looks through headers in response to decide which polling URL to use.

I may be wrong, but it seems that the problem is that x-ms-long-running-operation-options should be specified in the spec, but it isn't. And in .NET SDK it works because LRO code isn't generated and handles all possible scenarios depending on the response.

I've manually replaced into_future implementation in generated code to the implementation which uses AzureAsyncOperation and it works as expected.

cataggar commented 8 months ago

If x-ms-long-running-operation-options is not specified in the spec, we should handle all possible scenarios like the other SDKs. I think x-ms-long-running-operation-options will allow optimization.

Azure / azure-sdk-for-rust

Endless loop in snapshots::Client::create_or_update::into_future() #1525