Snowflake-Labs / terraform-provider-snowflake

Terraform provider for managing Snowflake accounts
https://registry.terraform.io/providers/Snowflake-Labs/snowflake/latest
MIT License
552 stars 420 forks source link

[Bug]: Excessive output for resource "snowflake_user" slows down Terraform #3118

Open rd-thomas-uhren opened 1 month ago

rd-thomas-uhren commented 1 month ago

Terraform CLI Version

1.5.7

Terraform Provider Version

0.96.0

Terraform Configuration

We manage approximately 2000 users via Terraform using the resource "snowflake_user".

Starting with 0.96.0 the state contains "show_output" and "parameters" including the default value and description; for each individual user.

This makes the state very large and slows down further operations.

This prevents us from upgrading to the latest version.

Is it realy necessary to store the same parameter default and description for each user?

Category

category:resource

Object type(s)

No response

Expected Behavior

State doesn't contain the description and default value for each individual resource.

Actual Behavior

State contains the description and default value for each individual resource.

Steps to Reproduce

Upgrade from 0.94.0 to 0.96.0

How much impact is this issue causing?

Low

Logs

No response

Additional Information

No response

Would you like to implement a fix?

AaronCoquet-Easypark commented 1 month ago

We don't have quite that many users, but share the same concern. The new structure is also difficult to navigate (both visually [in a plan output] and in code [when trying to access properties, etc.]). This structure is common to most / all of the V1 Candidate resources, and it's making migration unpleasant.

sfc-gh-asawicki commented 1 month ago

Hey @rd-thomas-uhren @AaronCoquet-Easypark. Thanks for reaching out to us.

I will prepare a longer answer with explanations, motivations, and recommendations tomorrow. For now, I have the following questions:

  1. @rd-thomas-uhren what is the scale of the performance drop you mention? What was the planning execution time in 0.94 and what is in 0.96 for your deployment? How much of this time does state download/upload take? Have you run the plan with TF_LOG=DEBUG environment variable to check the execution time of Snowflake queries (I ask this because there are more queries run in the new user resource, e.g., parameters are fetched for each user, and these parameters were not handled in the previous versions of the resource, so some performance drop is expected, I will add more context tomorrow)?
  2. @AaronCoquet-Easypark, can you elaborate on the problems in code navigation for these new structures? Have you read https://github.com/Snowflake-Labs/terraform-provider-snowflake/issues/3084 (it touches on the new structures' topic, in a slightly different context, but may still be helpful)?
rd-thomas-uhren commented 1 month ago

Hi @sfc-gh-asawicki,

The build step took twice as long, but we were unable to apply these changes since we ran into a timeout; after 12 hours.

We observed that it took approximately five minutes per user to write to the state file.

Regards, Thomas

sfc-gh-asawicki commented 1 month ago

Hey. A longer answer and recommendation below.

show_output and parameters were added as a result of https://github.com/Snowflake-Labs/terraform-provider-snowflake/blob/main/v1-preparations/CHANGES_BEFORE_V1.md#raw-snowflake-output. In short, the show_output (or some other backing fields implementation) is needed to properly handle changes to attributes that can have default values using the SDKv2. This was the technical decision and not the esthetic one. As for the outputs of the plan, we can't modify them, we hope that with plan modifiers in Terraform Plugin Framework we will be able to address this behavior too (we currently plan to migrate to plugin framework next year).

You are right, that description attribute is not logically needed. We can remove/conditionally remove it from the parameters output. We have even better news, it's possible that we will be able to conditionally remove the parameters output overall; We still have to validate the current logic, but it seems it's no longer needed for the proper handling of parameters. We can't do this with the show_output, though. We will schedule a task to verify and implement the conditional removal of the parameters output.

The plan performance drop is expected because the resource handles more Snowflake communication now: parameters were not handled directly inside the snowflake_user resource in earlier versions of the provider and they are now (check here). The motivation for such a change was explained here. Keep in mind how the Terraform providers work, each object is responsible for being able to fetch its state from the target API and modify it. Because of that, currently each user resource have to refresh/read its state (SHOW, DESCRIBE, and SHOW PARAMETERS), run any changes in apply (DROP/CREATE/ALTERs), and fetch the Snowflake state again. We have plans to target the performance of running similar/subsequent queries. The options currently on the table are: cacheing, batching requests, and/or not reading the state initially at all. They all have their pros, cons, and challenges. However, we will address them no earlier than early next year. For now, you can safely assume that every resource is solely responsible for its state management.

That leads to the topic of performance in the current provider version (and also upcoming V1).

First of all, we plan to do performance tests on our side in early November. We will investigate then the performance for variety amounts of user resources managed in a single deployment. We will document the outcomes publicly. We plan to address the problems then. We will also document our guidelines for the single deployment size for users.

The performance drop you mention is a Terraform limitation. It's not only about the file size and the need to download/upload it with every change but also the performance of Terraform building the object graph. You can check for example this thread https://discuss.hashicorp.com/t/remote-state-file-size-limit/46324. Our recommendation is to split your deployment into smaller ones. That way, you make the planning/apply faster for the single deployment and reduce the state file size at the same time. It also helps to put logical boundaries between the deployments but this is use-case specific.

liamjamesfoley commented 4 weeks ago

We're also seeing a big incease in Plan time since upgrading to either version 0.93 or 0.94 image

sfc-gh-asawicki commented 4 weeks ago

Hey @liamjamesfoley, some drop is expected; please check the third paragraph above (starting with The plan performance drop is expected because...).

rd-thomas-uhren commented 2 weeks ago

Hi @liamjamesfoley,

Do you have any timeline for the removal of the parameters output?

Regards, Thomas

liamjamesfoley commented 6 days ago

Hi @liamjamesfoley,

Do you have any timeline for the removal of the parameters output?

Regards, Thomas

Hi @rd-thomas-uhren I think you tagged the wrong user 😅