env0 / terraform-provider-env0

Terraform Provider for env0
https://env0.com
Mozilla Public License 2.0
39 stars 14 forks source link

Fix: do not retry for 4xx errors #950

Closed TomerHeber closed 1 month ago

TomerHeber commented 1 month ago

Issue & Steps to Reproduce / Feature Request

fixes #949

Solution

Removed the default error handler.

TomerHeber commented 1 month ago

Can we add a test for this please?

@RLRabinowitz

I have checked... right now there are no tests for the provider http client. Adding these is an effort, but I'm not not sure we would gain anything from it.

There are these tests: client/http/client_test.go - but they don't really test the provider client... (which is created in providor.go).

I guess the question is: what are we trying to test?

we could test this specific function, by passing responses with different status codes:

            AddRetryCondition(func(r *resty.Response, err error) bool {
                if r == nil {
                    // No response. Possibly a networking issue (E.g. DNS lookup failure).
                    tflog.SubsystemWarn(subCtx, "env0_api_client", "No response, retrying request")
                    return true
                }

                // When running integration tests 404 may occur due to "database eventual consistency".
                // Retry when there's a 5xx error. Otherwise do not retry.
                if r.StatusCode() >= 500 || (isIntegrationTest && r.StatusCode() == 404) {
                    tflog.SubsystemWarn(subCtx, "env0_api_client", "Received a failed or not found response, retrying request", map[string]interface{}{"method": r.Request.Method, "url": r.Request.URL, "status code": r.StatusCode()})
                    return true
                }

                if r.StatusCode() == 200 && isIntegrationTest && r.String() == "[]" {
                    tflog.SubsystemWarn(subCtx, "env0_api_client", "Received an empty list , retrying request", map[string]interface{}{"method": r.Request.Method, "url": r.Request.URL})
                    return true
                }

                return false
            })

move this code to a dedicated, function... then write unit tests that each test case passes a different response object to this function.

Let me know if it's something you would like to me to do.

avnerenv0 commented 1 month ago

Didn't we add this to make the integration tests more stable? Like we had issues with weak consistency and we needed the retries for them to pass?

TomerHeber commented 1 month ago

Didn't we add this to make the integration tests more stable? Like we had issues with weak consistency and we needed the retries for them to pass?

@avnerenv0

We did... The question is: if you expect customers to run into the same eventual consistency issues? If the answer is 'yes', I may want to make some additional (minor) modifications.

TomerHeber commented 1 month ago

Didn't we add this to make the integration tests more stable? Like we had issues with weak consistency and we needed the retries for them to pass?

@avnerenv0

We did... The question is: if you expect customers to run into the same eventual consistency issues? If the answer is 'yes', I may want to make some additional (minor) modifications.

I removed the line that would retry for any error... and kept the one that retries only on specific use-cases. But if 404 'false-positive' is something customers may encounter as well... I would like to expend it not just for integration tests.

avnerenv0 commented 1 month ago

no that's ok. The integration tests sometimes create a resource and try to read it in the same file, that's not a pattern we need to support in real life.

@RLRabinowitz I'm good with this PR but I saw you also commented, so please approve if you are fine with the response.

RLRabinowitz commented 1 month ago

I guess the question is: what are we trying to test?

Well, we want to make sure that we are not retrying any 4xx HTTP errors, right? Make sure the scenario mentioned in the issue will not happen again. In this specific example - if a deploy fails with 4xx, we shouldn't retry it

Adding a UT on the retry condition doesn't really help, as the fix was unrelated to that code.

Ideally we'd have a test that actually makes sure that this doesn't happen again, as right now we have no guardrails saving us from this happening again

Couldn't such a test simply be added to client_test.go?

TomerHeber commented 1 month ago

I guess the question is: what are we trying to test?

Well, we want to make sure that we are not retrying any 4xx HTTP errors, right? Make sure the scenario mentioned in the issue will not happen again. In this specific example - if a deploy fails with 4xx, we shouldn't retry it

Adding a UT on the retry condition doesn't really help, as the fix was unrelated to that code.

Ideally we'd have a test that actually makes sure that this doesn't happen again, as right now we have no guardrails saving us from this happening again

Couldn't such a test simply be added to client_test.go?

Ok sure. I'll have to modify client_test.go since it tests an empty resty instance and not the one used by the provider. This will require some minor refactoring but nothing too big. (I hope).

TomerHeber commented 1 month ago

I guess the question is: what are we trying to test?

Well, we want to make sure that we are not retrying any 4xx HTTP errors, right? Make sure the scenario mentioned in the issue will not happen again. In this specific example - if a deploy fails with 4xx, we shouldn't retry it Adding a UT on the retry condition doesn't really help, as the fix was unrelated to that code. Ideally we'd have a test that actually makes sure that this doesn't happen again, as right now we have no guardrails saving us from this happening again Couldn't such a test simply be added to client_test.go?

Ok sure. I'll have to modify client_test.go since it tests an empty resty instance and not the one used by the provider. This will require some minor refactoring but nothing too big. (I hope).

@RLRabinowitz I have added unit tests as requested.