aiven / terraform-provider-aiven

Aiven Terraform Provider
https://registry.terraform.io/providers/aiven/aiven/latest/docs
MIT License
126 stars 69 forks source link

fix(flink): poll application deployment deletion #1602

Closed byashimov closed 6 months ago

byashimov commented 6 months ago

About this change—what it does

Tests run fast and fail to delete Flink Application Deployment in transitional statuses. The state machine for this type of resource is quite complicated. Instead of trying to replicate the transition flow this PR proposes different approach: it calls deletion over and over until succeeds or exceeds context timeout.

byashimov commented 6 months ago

Please fix it using a different approach. The fix is quite simple, we need to wait until the resource reaches RUNNING state before attempting to delete it. So we need to add a wait in the Create method, and wait until it transitions from INITIALIZING into RUNNING. The same fix needs to be applied over here, where we need to include INITIALIZING as one of the possible options for the Pending state.

It might not reach running state. On my tests the resource stuck in RESTARTING. I strongly believe that implementing the state machine here is a bad idea.

Serpentiel commented 6 months ago

On my tests the resource stuck in RESTARTING.

Would you mind sharing how you were testing it? Because I think this might happen if there's some kind of a misconfiguration that took place in the resource.

byashimov commented 6 months ago

On my tests the resource stuck in RESTARTING.

Would you mind sharing how you were testing it? Because I think this might happen if there's some kind of a misconfiguration that took place in the resource.

Here you go https://github.com/aiven/terraform-provider-aiven/actions/runs/7865584033/job/21458721920#step:6:19

Serpentiel commented 6 months ago

Here you go https://github.com/aiven/terraform-provider-aiven/actions/runs/7865584033/job/21458721920#step:6:19

It says that the service was in the INITIALIZING state, I don't see any mentions of RESTARTING there. As I mentioned, a fix is to add INITIALIZING state over here, and this issue won't occur anymore: https://github.com/aiven/terraform-provider-aiven/blob/main/internal/schemautil/wait.go#L34.

Either way, let’s get the hotfix applied to the other Flink resources besides the application deployment, and let’s get it merged.

byashimov commented 6 months ago

Here you go https://github.com/aiven/terraform-provider-aiven/actions/runs/7865584033/job/21458721920#step:6:19

It says that the service was in the INITIALIZING state, I don't see any mentions of RESTARTING there. As I mentioned, a fix is to add INITIALIZING state over here, and this issue won't occur anymore: https://github.com/aiven/terraform-provider-aiven/blob/main/internal/schemautil/wait.go#L34.

Either way, let’s get the hotfix applied to the other Flink resources besides the application deployment, and let’s get it merged.

Please make sure you are checking application deployment status and not the service's.

Serpentiel commented 6 months ago

Here you go https://github.com/aiven/terraform-provider-aiven/actions/runs/7865584033/job/21458721920#step:6:19

It says that the service was in the INITIALIZING state, I don't see any mentions of RESTARTING there. As I mentioned, a fix is to add INITIALIZING state over here, and this issue won't occur anymore: https://github.com/aiven/terraform-provider-aiven/blob/main/internal/schemautil/wait.go#L34. Either way, let’s get the hotfix applied to the other Flink resources besides the application deployment, and let’s get it merged.

Please make sure you are checking application deployment status and not the service's.

When I click on your link I see this:

Error: error cancelling Flink Application Deployment: 4[22](https://github.com/aiven/terraform-provider-aiven/actions/runs/7865584033/job/21458721920#step:6:23): {"errors":[{"message":"Impossible transition from status INITIALIZING to status CANCELLING_REQUESTED for deployment 3f59043c-a2d5-4c21-b15a-26419d6b2f4e","status":422}],"message":"Impossible transition from status INITIALIZING to status CANCELLING_REQUESTED for deployment 3f59043c-a2d5-4c21-b15a-26419d6b2f4e"} -

Am I missing something?

byashimov commented 6 months ago

Here you go https://github.com/aiven/terraform-provider-aiven/actions/runs/7865584033/job/21458721920#step:6:19

It says that the service was in the INITIALIZING state, I don't see any mentions of RESTARTING there. As I mentioned, a fix is to add INITIALIZING state over here, and this issue won't occur anymore: https://github.com/aiven/terraform-provider-aiven/blob/main/internal/schemautil/wait.go#L34. Either way, let’s get the hotfix applied to the other Flink resources besides the application deployment, and let’s get it merged.

Please make sure you are checking application deployment status and not the service's.

When I click on your link I see this:

Error: error cancelling Flink Application Deployment: 4[22](https://github.com/aiven/terraform-provider-aiven/actions/runs/7865584033/job/21458721920#step:6:23): {"errors":[{"message":"Impossible transition from status INITIALIZING to status CANCELLING_REQUESTED for deployment 3f59043c-a2d5-4c21-b15a-26419d6b2f4e","status":422}],"message":"Impossible transition from status INITIALIZING to status CANCELLING_REQUESTED for deployment 3f59043c-a2d5-4c21-b15a-26419d6b2f4e"} -

Am I missing something?

Go check yourself. Running right now.

image
Serpentiel commented 6 months ago

That's a great find, @byashimov! Looks like the SQL in the test is invalid, and it causes Flink to crash. I will adjust this behavior, and will add a separate test for this scenario.