atlassian / terraform-provider-artifactory

Terraform provider to manage Artifactory

Failure to apply a lot of changes in one pass #30

Open Code0x58 opened 5 years ago

Code0x58 commented 5 years ago

Affected Resource(s)

All during Create/Update/Delete operations

Terraform Configuration Files

pastebin link - 390 lines, 3 variables

Debug Output

5 error(s) occurred:

* artifactory_local_repository.maven_release: 1 error(s) occurred:

* artifactory_local_repository.maven_release: PUT https://code0x58test.jfrog.io/code0x58test/api/repositories/x-libs-release-local: 400 [{Status:400 Message:Could not merge and save new descriptor [org.jfrog.common.ExecutionFailed: Last retry failed: exceeded number of retries 5. Not trying again (Should update revision 252)]
}]
* artifactory_local_repository.maven_snapshot: 1 error(s) occurred:

* artifactory_local_repository.maven_snapshot: PUT https://code0x58test.jfrog.io/code0x58test/api/repositories/x-libs-snapshot-local: 400 [{Status:400 Message:Could not merge and save new descriptor [org.jfrog.common.ExecutionFailed: Last retry failed: exceeded number of retries 5. Not trying again (Should update revision 252)]
}]
* artifactory_local_repository.rpm: 1 error(s) occurred:

* artifactory_local_repository.rpm: GET https://code0x58test.jfrog.io/code0x58test/api/repositories/x-rpm-local: 400 [{Status:400 Message:Bad Request}]
* artifactory_remote_repository.npm: 1 error(s) occurred:

* artifactory_remote_repository.npm: PUT https://code0x58test.jfrog.io/code0x58test/api/repositories/x-npm-remote: 400 [{Status:400 Message:Could not merge and save new descriptor [org.jfrog.common.ExecutionFailed: Last retry failed: exceeded number of retries 5. Not trying again (Should update revision 252)]
}]
* artifactory_virtual_repository.pypi: 1 error(s) occurred:

* artifactory_virtual_repository.pypi: GET https://code0x58test.jfrog.io/code0x58test/api/repositories/x-pypi: 400 [{Status:400 Message:Bad Request}]

Expected Behavior

I'd expect the apply to succeed, as it did when the configuration was smaller, or as it eventually does after a couple of repeated applies.

Actual Behavior

Artifactory can't keep up. It looks like there's a race to save the central configuration descriptor, which the server works around with retries, but those retries are exhausted when too many changes arrive at once.

Steps to Reproduce

  1. terraform apply

Important Factoids

I suspect it's possible to do something like setting MaxConnsPerHost to 1 on the HTTP client's transport; that way a single instance of the Terraform provider shouldn't introduce the races that it otherwise would.

Workarounds include:

There is a server-side config option mentioned here that sets the number of retries; while not a solution, it should be a useful lead for further reading if needed.

Code0x58 commented 5 years ago

I tried a crude patch to limit MaxConnsPerHost, but it didn't fix it:

diff --git a/pkg/artifactory/provider.go b/pkg/artifactory/provider.go
index cb41084..6f10fcd 100644
--- a/pkg/artifactory/provider.go
+++ b/pkg/artifactory/provider.go
@@ -62,16 +62,21 @@ func providerConfigure(d *schema.ResourceData) (interface{}, error) {
        password := d.Get("password").(string)
        token := d.Get("token").(string)

+       t := http.DefaultTransport.(*http.Transport)
+       t.MaxConnsPerHost = 1
+
        var client *http.Client
        if username != "" && password != "" {
                tp := artifactory.BasicAuthTransport{
-                       Username: username,
-                       Password: password,
+                       Username:  username,
+                       Password:  password,
+                       Transport: http.DefaultTransport,
                }
                client = tp.Client()
        } else if token != "" {
                tp := &artifactory.TokenAuthTransport{
-                       Token: token,
+                       Token:     token,
+                       Transport: http.DefaultTransport,
                }
                client = tp.Client()
        } else {
dillon-giacoppo commented 5 years ago

Duplicate of #9. The suggested workaround is to set parallelism to 1.
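
For anyone finding this later, the parallelism limit is passed on the command line when applying, e.g.:

terraform apply -parallelism=1

This serializes resource operations, so only one repository change hits Artifactory's config descriptor at a time.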

JFrog also provided an alternative solution: they recommended increasing the property artifactory.central.config.save.number.of.retries=20 in artifactory.system.properties. With this you can keep Terraform multithreaded; however, we have noticed that infrequent errors with access-related resources (such as users, groups, and permissions) can still occur during batch operations. The ticket you linked is to fix those errors.
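
For reference, that means adding a line like the following to artifactory.system.properties (the file normally lives under the Artifactory etc directory; the exact path depends on the version, and a restart is typically needed for it to take effect):

artifactory.central.config.save.number.of.retries=20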

I have looked at client-side throttling in the past, but I think the ideal solution would be retries with exponential backoff, which would have to be added to every resource. This is not a priority, however, since the issue is easily worked around.
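
For anyone who wants to experiment, here is a minimal sketch of what that could look like, assuming a wrapping http.RoundTripper injected in providerConfigure; retryTransport and its fields are illustrative names, not anything that exists in the provider today:

package artifactory

import (
	"bytes"
	"io/ioutil"
	"net/http"
	"time"
)

// retryTransport wraps another RoundTripper and retries requests that come
// back with Artifactory's "Could not merge and save new descriptor" 400,
// sleeping with exponential backoff between attempts.
type retryTransport struct {
	base       http.RoundTripper
	maxRetries int
}

func (t *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	backoff := 500 * time.Millisecond
	for attempt := 0; ; attempt++ {
		resp, err := t.base.RoundTrip(req)
		if err != nil || resp.StatusCode != http.StatusBadRequest {
			return resp, err
		}
		// Inspect the body to see if this is the descriptor-merge conflict,
		// then restore it so the caller can still read it if we give up.
		body, readErr := ioutil.ReadAll(resp.Body)
		resp.Body.Close()
		resp.Body = ioutil.NopCloser(bytes.NewReader(body))
		conflict := readErr == nil &&
			bytes.Contains(body, []byte("Could not merge and save new descriptor"))
		// Requests with a body can only be replayed if GetBody is available.
		if !conflict || attempt >= t.maxRetries || (req.Body != nil && req.GetBody == nil) {
			return resp, nil
		}
		if req.GetBody != nil {
			if req.Body, err = req.GetBody(); err != nil {
				return resp, nil
			}
		}
		time.Sleep(backoff)
		backoff *= 2
	}
}

Wiring it in would just mean wrapping the transport that providerConfigure hands to BasicAuthTransport/TokenAuthTransport, similar to the patch above, e.g. Transport: &retryTransport{base: http.DefaultTransport, maxRetries: 5}.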

kad-meedel commented 3 years ago

Look at https://www.jfrog.com/jira/browse/RTFACT-16638. I still have an issue with that.