cloudamqp / terraform-provider-cloudamqp

Terraform Provider for CloudAMQP
https://registry.terraform.io/providers/cloudamqp/cloudamqp
Mozilla Public License 2.0
35 stars 39 forks source link

Update to sdk v2 #261

Closed tete17 closed 5 months ago

tete17 commented 8 months ago

Most of the heavy lifting is done by the tool and I just had to change the signature of some lambdas to include the newly added context.

Som deprecations have been added as a result of this but we can be dealt with it other MR.

This drops support for terraform <= 0.11.

My testing capabilities are very limited as my company owns an cloudamqp account but I personally don't. I have checked through the update documentation and nothing pops up that we should communicate to users other than the old terraform version support.

Closes #140

dentarg commented 8 months ago

Nice! What is the tool you used?

This will be an excellent test for https://github.com/cloudamqp/terraform-provider-cloudamqp/pull/257 once finished

tete17 commented 8 months ago

I used tf-skd-migrator v2upgrade.

It is in the recommended steps

tete17 commented 7 months ago

Hey @dentarg is the plan of #257 to make use of the terraform provided testing harness?

This update includes a testing ground we may want to look at to avoid creating a testing system only to migrate to the "official" one

tbroden84 commented 7 months ago

@tete17 we are using Terraforms Acceptance test together with go-vcr. Note that each test require some tweaking. With Go VCR we can record requests/responses to the API backend into fixtures, then do playbacks of previous tests. This significantly speed up the test times to ms instead of minutes for each test.

tbroden84 commented 6 months ago

Note about merged https://github.com/cloudamqp/terraform-provider-cloudamqp/pull/257 and our changes to how testing is done.

From the conflict list, the biggest impact should be the test files.

Minor impact

tbroden84 commented 6 months ago

Have a rebased version of this branch locally. However some issues with the new tests. Now with SDK v2, terraform-json dependency require a newer version. Found some strange mismatches in the recorded fixtures. Each test also have a huge increase in running time. Will continue to investigate this.

tete17 commented 6 months ago

Hey @tbroden84 give me a few hours I am on it :)

tete17 commented 6 months ago

The branch should be updated

@tbroden84 I am unable to run the test on my own but happy to step in and help with the fixtures if you share them somehow

dentarg commented 6 months ago

What's the problem with running the tests? The fixtures should be in the repo already

tete17 commented 6 months ago

Dont i need a cloudamqp token?

The documentation says something about loading a .enc file.

What is the colmand?

dentarg commented 6 months ago

Should not be need. The README has instructions.

tbroden84 commented 6 months ago

Just to mention, still need some tweaking for the test to work for SDK v2. Noticed all tests claimed there was missing interactions, which are there. There is also a deprecated function on how to chose the provider for each test case and some overall changes to be overlooked before they can be used. So it was not as straight forward that I had hoped.

tete17 commented 6 months ago

Silly me now I am running the acceptance test locally.

I am facing the missing interactions you guys mention. It seems the provider is being configured multiple times but I have no idea why.

@tbroden84 what deprecated function to select the provider does it still use I can't find it.

tete17 commented 6 months ago

It seems to be calling the ValidateChange("plan") function for an instance more times than before.

Not sure why but it would be interesting to record the fixtures again @tbroden84 @dentarg

tete17 commented 6 months ago

In fact if you remove the diff functions that execute the api calls the test work.

I am also now seeing the long delay you mention. It seems to be happening right at the end of the test after all the api calls have been completed.

tete17 commented 6 months ago

Further news about the delay. Running the test withTF_ACC_LOG=TRACE & TF_LOG=TRACE shows the reason for most of the delays.

There is a lot of artificial waits from the https://github.com/84codes/go-api package like these ones:

This is causing all test to last at least 10 seconds if not more .

I think they only last that long because terraform has slightly changed the order on which the requests are launched and therefore the fixtures are not correct anymore.

I am not able to recreate the fixtures as that requires a proper amqp api. My suggestion is to recreate the fixtures and we can double check whether they make sense or not.

dentarg commented 6 months ago

Would be great to make those all those sleeps in go-api configurable.

We have a plan to import go-api into this repository. Maybe we should to do that before continuing with this PR?

tbroden84 commented 6 months ago

My findings for today.

  1. terraform-json still get error with v0.12.0, just used latest version. Only did some quick searches and bumped it yesterday. The error I get with v0.12.0.
    Got error running Terraform: unsupported state format version: expected ["0.1" "0.2"], got "1.0"
  2. It seems to be calling the ValidateChange("plan") function for an instance more times than before.

Yes, doing a re-recording both ValidateChanges are called three times now, before two. Even with two, it's one time more than needed. Unless it has to do with the customdiff used and both are triggered. The extra call that is missing, is the reason for the error about missing interactions.

  1. There is a lot of artificial waits

Yes, more and more of them we have made configurable. For the test where wait been an issue, there is a save hook removing these requests from the fixtures. https://github.com/cloudamqp/terraform-provider-cloudamqp/blob/main/cloudamqp/provider_test.go#L79-L154.

In case of the basic alarm test, that wait is not being waited for. Finished in 11.8 ms according to the fixture. We also use the flag SkipRequestLatency for the recorder. https://github.com/cloudamqp/terraform-provider-cloudamqp/blob/main/cloudamqp/provider_test.go#L59. Should not simulate the durations.

Compared the time for TestAccAlarm_Basic between the SDK versions, before about 0.2 s now 6.4 s.

  1. what deprecated function to select the provider does it still use I can't find it.

Saw you found it here: https://github.com/tete17/terraform-provider-cloudamqp/blob/Update-to-sdk-v2/cloudamqp/provider_test.go#L158-L161 :)

tete17 commented 6 months ago

Hey @tbroden84 have you tried my branch?

I think it is using 0.12.0 and I am not seeing any of the "0.1", "0.2" errors. I wonder why.

I think the waits are triggered because of the wrong fixtures. The order in which terraform does requests has clearlly changed and that probablly leads to the go-api package to get results that trigger the artificial wait.

Can you regenerate the fixtures using my code please? I think that will fix all the issues we have.

dentarg commented 6 months ago

The order in which terraform does requests has clearlly changed

It would be good to fully understand this (what changed in Terraform?)

tbroden84 commented 6 months ago

Yes running your branch.

I think it is using 0.12.0 and I am not seeing any of the "0.1", "0.2" errors. I wonder why.

Which Terraform version are you using? Running latest myself (1.8.2). Tried switching version to 1.4.+, same error. Downgraded even further pre 1.0 (0.15.5) now I can run it without state version error. IIRC Terraform version went from 0.13 and got released as 1.0.0 a couple of years back.

The order in which terraform does requests has clearly changed

It would be good to fully understand this (what changed in Terraform?)

Yeah agree. From the previous link tete17 posted about migration. There is one mention about acceptance test driver being changed and more information about removed and added functions. Will dig deeper to get more knowledge about the changes.

Can you regenerate the fixtures using my code please? I think that will fix all the issues we have.

So far just re-recorded 2 fixtures and it helps them run and no longer have to wait on removing the instance. Was from there I noticed the increased replay time. Switch computer to a faster one and got Alarms down to 1 s, see what the rest will give.

tbroden84 commented 6 months ago

Bumped Terraform plugin SDK v2 to latest (2.4.3 -> 2.33.0), rearranged go.mod and deleted go.sum then go mod tidy command.

go.mod before tidy:

module github.com/cloudamqp/terraform-provider-cloudamqp

go 1.21

toolchain go1.21.3

require github.com/84codes/go-api v1.16.2

require (
    github.com/hashicorp/terraform-plugin-sdk/v2 v2.33.0
    github.com/tidwall/gjson v1.17.0
    github.com/tidwall/sjson v1.2.5
    gopkg.in/dnaeon/go-vcr.v3 v3.1.2
)

// replace github.com/84codes/go-api => ../../84codes/go-api

This did

This also increase the DEBUG logging when running the tests. Couldn't see any improvements in the test, hopefully more insight what actually happening.

Lots of version to keep in mind.

tete17 commented 5 months ago

Hey @tbroden84

Sorry for the delay.

Indeed I was running an ancient version of terraform. Updating revealed the problem. I updated the dependencies and the go.sum version file indeed shrinked quite a bit.

Setting TF_LOG=TRACE also revealed now more info. It seems the acceptance test run an initial plan before attempting to apply the changes which results in 2 runs of plans. That explains the 2 calls to the /api/plans api.

I am able to run the failed test of alarms in 1,8s with trace logging enabled. I think this is plenty fast as we also don't have any way to tweak them to make them faster in my opinion.

You mentioned you were recreating the fixtures. Can you re-create them and add them as commits to this branch?

I think that is the way forward.

tbroden84 commented 5 months ago

@tete17

No problem. It is even a third ´/api/plans´ call compared to only two before. They are all triggered when CustomDiff detecting changed for plan/region. Plan then apply sounds logic and should trigger this. However didn't find any good explanation last week why the third one being called. The state value for both are empty (similar to the other two), which causes the third request.

Also enough to run with TF_LOG=DEBUG to get the extra output. One big change is multiple providers can be used, how they are loaded and the behaviour. Multiple times during a replay, Terraform start and stop all providers (think this is the root cause that each test take longer time) resulting in the provider will be re-configured each time and the test configuration loaded again. The warning printed looks like this.

[WARN]  sdk.helper_schema: Previously configured provider being re-configured. This can cause issues in concurrent testing if the configurations are not equal.: tf_req_id=811d8029-a811-9f87-9ccd-d1abb025edfc tf_provider_addr=registry.terraform.io/hashicorp/cloudamqp tf_rpc=Configure

Re-recorded about 1/3 of the test last week while doing my investigation and testing. Didn't find any good solution but will record the rest during the day.

tbroden84 commented 5 months ago

Had to remove the test TestAccInstance_Upgrade from instance test due to internal backend changes. Otherwise they are now re-recorded. All tests takes about 40-80s depending on the computer I use to run them all.

tbroden84 commented 5 months ago

Thanks for the contribution, much appreciate. Will release this in the next release, together with some more changes we have planned.

Next iteration for SDK v2 will be to look into updating the ProviderFactory to ProtoV5ProviderFactory version. From what I have seen quite the re-write is needed. This will unlock prio, proposed and config states. Gives the provider full control over argument and attribute changes. From my understanding will also allow null values in configuration (not support with v1.+ version).