cagov / data-infrastructure

CalData infrastructure
https://cagov.github.io/data-infrastructure
MIT License
7 stars 0 forks source link

Changes for upgrading terraform snowflake provider from 0.92 to 0.97 #413

Closed ram-kishore-odi closed 2 weeks ago

ram-kishore-odi commented 4 weeks ago

After the initial setup and validation, the main change needed was to change snowflake_role to snowflake_account_role. Here is the partial output from terraform plan, Please review it and share your feedback.


module.elt.module.transforming["XL"].snowflake_warehouse.this will be updated in-place

~ resource "snowflake_warehouse" "this" { ~ enable_query_acceleration = "false" -> "default" id = "TRANSFORMING_XL_DEV"

Plan: 24 to add, 20 to change, 24 to destroy.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now. (infra) ram.kishore@180 dev %


ian-r-rose commented 4 weeks ago

The changes look good to me, I think we still need to:

  1. Update the config for the prd environment
  2. Update the terraform lockfiles as per this
ram-kishore-odi commented 4 weeks ago

Hi @ian-r-rose, Thank you for your feedback and specifying the key next steps. Can you please let me know if you want me to go ahead apply the terraform changes to dev first and then proceed to the prd after taking care of the above ?

ian-r-rose commented 4 weeks ago

We don't really have a strong process here. I agree it's best to apply in dev first, but not sure what the sequence should be. This is what I have been doing so far:

  1. Make changes, apply in dev, ensure things are working as expected
  2. Update versions and lockfiles for both dev and prd, but don't apply in prd yet.
  3. Open pull request, request review from someone
  4. Merge PR
  5. After merge, apply in prd from main.

But because it's mostly been me doing terraform, the process isn't very formalized. What do you think a good set of steps would be for terraform-related PRs @ram-kishore-odi ?

ram-kishore-odi commented 3 weeks ago

Thank you for your additional feedback @ian-r-rose !

The process you described above looks good to me.

Here are some minor updates that are good in my view for the overall terraform related PRs -

I also think testing should be an integrated in the dev and prd deployment process So automating the deployment using a CI/CD pipeline would be a good value add to the overall process. Also including an "approval gate" in the production deployment step is commonly used Since building footprints job is going to take long to run, including validation steps that run quickly would also help I think.

ian-r-rose commented 3 weeks ago

Here are some minor updates that are good in my view for the overall terraform related PRs -

* Make changes, apply in dev, ensure things are working as expected (current process)  - I assume you are referring to building foot prints job here for validation

This sounds good thought I think validation will depend on what the terraform PR does. In some cases, the building footprints pipeline won't be a good test. For instance, in #122 we might just check for the appropriate warehouses in the Snowflake UI.

* Update versions and lock files for dev _(slight modification)_

Sure thing! Can you update the dev lockfile here?

* Open pull request, request review from someone (current process)

* Merge PR (current process)

* Update versions and lock files for prd

Do you mean making a second PR at this point? Or committing to main?

ram-kishore-odi commented 3 weeks ago

Yes, I 100% agree validation depends on what the PR does. This is the reason why having smaller validation tests to perform the checks would be help.

Yes I will to update dev lock file before proceeding further

Not a second PR. One would suffice. I meant committing to main.

ram-kishore-odi commented 3 weeks ago

Hello @ian-r-rose,

I removed all the 8 remaining resources after the first one was successful.Can you please review and let me know if you see any issues with these imports below ? I am just trying to avoid any manual rework. So your feedback is greatly appreciated. Also once these are done, I plan to apply the changes to dev.

Transform

terraform import 'module.elt.module.transform.snowflake_account_role.this["READWRITE"]' TRANSFORM_DEV_READWRITE terraform import 'module.elt.module.transform.snowflake_account_role.this["READWRITECONTROL"]' TRANSFORM_DEV_READWRITECONTROL

Analytics

terraform import 'module.elt.module.analytics.snowflake_account_role.this["READ"]' ANALYTICS_DEV_READ terraform import 'module.elt.module.analytics.snowflake_account_role.this["READWRITE"]' ANALYTICS_DEV_READWRITE terraform import 'module.elt.module.analytics.snowflake_account_role.this["READWRITECONTROL"]' ANALYTICS_DEV_READWRITECONTROL

Raw

terraform import 'module.elt.module.raw.snowflake_account_role.this["READ"]' RAW_DEV_READ terraform import 'module.elt.module.raw.snowflake_account_role.this["READWRITE"]' RAW_DEV_READWRITE terraform import 'module.elt.module.raw.snowflake_account_role.this["READWRITECONTROL"]' RAW_DEV_READWRITECONTROL

ian-r-rose commented 3 weeks ago

Those all look correct to me. With those imported, the remaining things you see in terraform apply should mostly be role grants, correct?

ram-kishore-odi commented 3 weeks ago

hi @ian-r-rose, All the imports were successful. We started with the following Plan: 50 to add, 11 to change, 9 to destroy and now the current situation is Plan: 41 to add, 11 to change, 0 to destroy. To answer your question about remaining things, yes they all look like role grants in the plan output.

ian-r-rose commented 3 weeks ago

Let's give it a shot!

ram-kishore-odi commented 3 weeks ago

hi @ian-r-rose, I felt it would be successful this time, but ran into an issue - Just pasting the full log for easy reference - Looks like it is failing while revoke privileges -

│ Error: Failed to revoke privileges to add │ │ with module.elt.snowflake_grant_privileges_to_account_role.imported_privileges_to_logger, │ on ../../modules/elt/roles.tf line 187, in resource "snowflake_grant_privileges_to_account_role" "imported_privileges_to_logger": │ 187: resource "snowflake_grant_privileges_to_account_role" "imported_privileges_to_logger" { │ │ Id: "LOGGER_DEV"|false|false|IMPORTED PRIVILEGES|OnAccountObject|DATABASE|"SNOWFLAKE" │ Privileges to add: [IMPORTED PRIVILEGES] │ Error: 001003 (42000): SQL compilation error: │ syntax error line 1 at position 24 unexpected 'IMPORTED'.


ram-kishore-odi commented 3 weeks ago

Hi @ian-r-rose, Looks like most of the changes went through. Just one change related to imported privileges needs to be applied based on the following plan log. Also I can see that the Transformer_Dev role has all the grants on Snowflake. Yesterday it did not have any granted roles. Looks we may have to explicitly list the privileges based on the online recommendations to fix the issue -

Terraform will perform the following actions:

module.elt.snowflake_grant_privileges_to_account_role.imported_privileges_to_logger will be updated in-place

~ resource "snowflake_grant_privileges_to_account_role" "imported_privileges_to_logger" { id = "\"LOGGER_DEV\"|false|false|IMPORTED PRIVILEGES|OnAccountObject|DATABASE|\"SNOWFLAKE\"" ~ privileges = [

Plan: 0 to add, 1 to change, 0 to destroy.

I am wondering if we can perform this change manually and import it into terraform (if we have no way to fix as "imported privileges" is not being recognized).

ian-r-rose commented 3 weeks ago

Yeah, I seem to recall that the "imported privileges" resource has caused trouble before (probably a bug in the provider that doesn't account for that permission). I think it's okay to do it manually and import if necessary

ram-kishore-odi commented 3 weeks ago

I compared the Logger_Dev and Logger_Prd roles on the snowflake side and they look identical (even without making changes related to imported privileges) Please see below.

So I think no additional updates are needed as we are only upgrading provider version and not adding new functionality

Can you please let me know if I am missing anything here ?

Logger_Dev Logger_Prd

ian-r-rose commented 3 weeks ago

Can you elaborate on where you are and what remains to be done here? I'm confused, are there changes left to apply?

ram-kishore-odi commented 3 weeks ago

I can explain the above comments and discuss the next steps with you in quick call. That may be easier. 5-10 mins would be sufficient.

ram-kishore-odi commented 3 weeks ago

Hi @ian-r-rose, Unfortunately the approach we discussed did not work. I encountered the exact same at the end. Please see the detailed steps below. I do not think there is a workaround for this known limitation of the snowflake provider. I feel it is first trying to revoke before adding - │ Error: Failed to revoke privileges to add

Here are the steps I followed

1. Revoke imported privileges on snowflake side

USE ROLE ACCOUNTADMIN; REVOKE IMPORTED PRIVILEGES ON DATABASE RAW_DEV FROM ROLE LOADER_DEV;

3. Remove the resource from terraform state

(infra) ram.kishore@180 dev % terraform state rm 'module.elt.snowflake_account_role.loader' Removed module.elt.snowflake_account_role.loader Successfully removed 1 resource instance(s).

5. Import resource into terraform

(infra) ram.kishore@180 dev % terraform import 'module.elt.snowflake_account_role.loader' LOADER_DEV module.elt.snowflake_account_role.loader: Importing from ID "LOADER_DEV"... module.elt.snowflake_account_role.loader: Import prepared! Prepared snowflake_account_role for import module.elt.snowflake_account_role.loader: Refreshing state... [id=LOADER_DEV] Import successful!

6. Run plan to see the changes to be applied. - *No difference at all from previous runs

module.elt.snowflake_grant_privileges_to_account_role.imported_privileges_to_logger will be updated in-place ~ resource "snowflake_grant_privileges_to_account_role" "imported_privileges_to_logger" { id = "\"LOGGER_DEV\"|false|false|IMPORTED PRIVILEGES|OnAccountObject|DATABASE|\"SNOWFLAKE\"" ~ privileges = [

Plan: 0 to add, 1 to change, 0 to destroy.

7. Apply the changes - Encountered the same error

module.elt.snowflake_grant_privileges_to_account_role.imported_privileges_to_logger: Modifying... [id="LOGGER_DEV"|false|false|IMPORTED PRIVILEGES|OnAccountObject|DATABASE|"SNOWFLAKE"] ╷ │ Error: Failed to revoke privileges to add │ │ with module.elt.snowflake_grant_privileges_to_account_role.imported_privileges_to_logger, │ on ../../modules/elt/roles.tf line 187, in resource "snowflake_grant_privileges_to_account_role" "imported_privileges_to_logger": │ 187: resource "snowflake_grant_privileges_to_account_role" "imported_privileges_to_logger" { │ │ Id: "LOGGER_DEV"|false|false|IMPORTED PRIVILEGES|OnAccountObject|DATABASE|"SNOWFLAKE" │ Privileges to add: [IMPORTED PRIVILEGES] │ Error: 001003 (42000): SQL compilation error: │ syntax error line 1 at position 24 unexpected 'IMPORTED'.

ian-r-rose commented 3 weeks ago

I see two problems with the steps you laid out:

  1. We need to revoke imported privileges from the logger role (LOGGER_DEV) on the database SNOWFLAKE. Your statement has a revocation from the loader role on the raw database.
  2. You removed the wrong resource from the state, we needed to remove the privilege grant, not the account role.

I did the following and it worked without issue:

  1. Remove the privilege grant from the state: terraform state rm module.elt.snowflake_grant_privileges_to_account_role.imported_privileges_to_logger
  2. Revoke the privilege in Snowflake:
    USE ROLE ACCOUNTADMIN;
    REVOKE IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE FROM ROLE LOGGER_DEV;
  3. Apply the plan: terraform apply -parallelism=1.

After those steps, a subsequent plan showed no changes to be made.

ram-kishore-odi commented 2 weeks ago

Thank you @ian-r-rose for the corrections ! The steps worked for me too. Once this PR is approved, I can merge into main and proceed with prd deployment.

ian-r-rose commented 2 weeks ago

Looks like CI is broken, possible due to some role assignments being destroyed for service users. Can you take a look at whether the github service accounts have access to the transformer dev roles?

ram-kishore-odi commented 2 weeks ago

Hi @ian-r-rose, I compared the role assignments of DEV & PRD for all service accounts (just to validate though we do not use all of them in CI) and noticed that GITHUB_ACTIONS_SVC_USER_DEV is missing the role grant of TRANSFORMER_DEV. It only has READER_DEV

Do you want me to grant TRANSFORMER_DEV to GITHUB_ACTIONS_SVC_USER_DEV ?

I also noticed that MWAA_SVC_USER_DEV also has a grant of REPORTER_PRD. I know we are reusing the Airflow DAG setup for DEV & PRD. Is this the reason why his grants exists ? Please confirm

I also noticed another difference - Transformer_DEV is missing Usage privilege on the following two schemas but Transformer_PRD has usage privilege on these.

RAW_DEV.PUBLIC RAW_DEV.STATE_ENTITIES

ian-r-rose commented 2 weeks ago

Ah, I had remembered incorrectly. The reader_dev role is all that github should need in this repo (it's not enough in our CARB project, where we actually ran test builds through github).

Hi @ian-r-rose, I compared the role assignments of DEV & PRD for all service accounts (just to validate though we do not use all of them in CI) and noticed that GITHUB_ACTIONS_SVC_USER_DEV is missing the role grant of TRANSFORMER_DEV. It only has READER_DEV

Do you want me to grant TRANSFORMER_DEV to GITHUB_ACTIONS_SVC_USER_DEV ?

I was wrong above, I don't think it's necessary here.

I also noticed that MWAA_SVC_USER_DEV also has a grant of REPORTER_PRD. I know we are reusing the Airflow DAG setup for DEV & PRD. Is this the reason why his grants exists ? Please confirm

I think this role grant is used in the building footprints pipeline to unload data from the marts db to S3. It's not really well-documented right now.

I also noticed another difference - Transformer_DEV is missing Usage privilege on the following two schemas but Transformer_PRD has usage privilege on these.

RAW_DEV.PUBLIC RAW_DEV.STATE_ENTITIES

That might be an unintended consequence of deleting and recreating those roles. We mostly use future grants to ensure that the access roles have the appropriate usage on all the schemas, but now that the roles are newer than the schemas, it is missing the appropriate permissions. All of which makes me pretty sure that the right way forward for production is to make sure that none of the roles are dropped, we should be able to do most things through terraform state rm and terraform state import.

ram-kishore-odi commented 2 weeks ago

Thank you for the feedback. Very helpful ! Please see my notes below

  1. If GITHUB_ACTIONS_SVC_USER_DEV does not need additional grants, then I guess CI should not be broken. I am thinking may be Britt ran into issues when the dev environment changes were in flight. I can confirm with her

  2. I agree with your production upgrade approach. Thank you for notes. I will come back with the of list of items for terraform rm and import shortly for review

ian-r-rose commented 2 weeks ago

CI is also broken here, so something is still amiss, though I'm not sure what

ram-kishore-odi commented 2 weeks ago

Can you please review the production upgrade plan below and let me know if any changes are needed ?

Initial steps

1 . To run before a terraform upgrade from 0.92 to 0.97 terraform init -upgrade terraform validate (double checking)

Lock file and merging steps

  1. terraform providers lock -platform=linux_amd64 -platform=darwin_amd64 # include Mac and Linux binaries
  2. push the changes to branch
  3. Merge into main

Removing resources. -- Resource and corresponding ID listed below

Resources that will be removed (20) from terraform state

module.elt.snowflake_role.loader LOADER_DEV module.elt.snowflake_role.logger LOGGER_DEV module.elt.snowflake_role.reader READER_DEV module.elt.snowflake_role.reporter REPORTER_DEV module.elt.snowflake_role.transformer TRANSFORMER_DEV module.elt.module.analytics.snowflake_role.this["READ"] ANALYTICS_DEV_READ module.elt.module.analytics.snowflake_role.this["READWRITE"] ANALYTICS_DEV_READWRITE module.elt.module.analytics.snowflake_role.this["READWRITECONTROL"] ANALYTICS_DEV_READWRITECONTROL module.elt.module.loading["4XL"].snowflake_role.this LOADING_4XL_DEV_WH_MOU module.elt.module.loading["XS"].snowflake_role.this LOADING_XS_DEV_WH_MOU module.elt.module.raw.snowflake_role.this["READ"] RAW_DEV_READ module.elt.module.raw.snowflake_role.this["READWRITE"] RAW_DEV_READWRITE module.elt.module.raw.snowflake_role.this["READWRITECONTROL"] RAW_DEV_READWRITECONTROL module.elt.module.reporting["4XL"].snowflake_role.this REPORTING_4XL_DEV_WH_MOU module.elt.module.reporting["XS"].snowflake_role.this REPORTING_XS_DEV_WH_MOU module.elt.module.transform.snowflake_role.this["READ"] TRANSFORM_DEV_READ module.elt.module.transform.snowflake_role.this["READWRITE"] TRANSFORM_DEV_READWRITE module.elt.module.transform.snowflake_role.this["READWRITECONTROL"] TRANSFORM_DEV_READWRITECONTROL module.elt.module.transforming["4XL"].snowflake_role.this TRANSFORMING_4XL_DEV_WH_MOU module.elt.module.transforming["XL"].snowflake_role.this TRANSFORMING_XL_DEV_WH_MOU

Importing resources - Resource and corresponding ID listed below

Resources that will be imported (20) into terraform after removal

module.elt.snowflake_account_role.loader LOADER_DEV module.elt.snowflake_account_role.logger LOGGER_DEV module.elt.snowflake_account_role.reader READER_DEV module.elt.snowflake_account_role.reporter REPORTER_DEV module.elt.snowflake_account_role.transformer TRANSFORMER_DEV module.elt.module.analytics.snowflake_account_role.this["READ"] ANALYTICS_DEV_READ module.elt.module.analytics.snowflake_account_role.this["READWRITE"] ANALYTICS_DEV_READWRITE module.elt.module.analytics.snowflake_account_role.this["READWRITECONTROL"] ANALYTICS_DEV_READWRITECONTROL module.elt.module.loading["4XL"].snowflake_account_role.this LOADING_4XL_DEV_WH_MOU module.elt.module.loading["XS"].snowflake_account_role.this LOADING_XS_DEV_WH_MOU module.elt.module.raw.snowflake_account_role.this["READ"] RAW_DEV_READ module.elt.module.raw.snowflake_account_role.this["READWRITE"] RAW_DEV_READWRITE module.elt.module.raw.snowflake_account_role.this["READWRITECONTROL"] RAW_DEV_READWRITECONTROL module.elt.module.reporting["4XL"].snowflake_account_role.this REPORTING_4XL_DEV_WH_MOU module.elt.module.reporting["XS"].snowflake_account_role.this REPORTING_XS_DEV_WH_MOU module.elt.module.transform.snowflake_account_role.this["READ"] TRANSFORM_DEV_READ module.elt.module.transform.snowflake_account_role.this["READWRITE"] TRANSFORM_DEV_READWRITE module.elt.module.transform.snowflake_account_role.this["READWRITECONTROL"] TRANSFORM_DEV_READWRITECONTROL module.elt.module.transforming["4XL"].snowflake_account_role.this TRANSFORMING_4XL_DEV_WH_MOU module.elt.module.transforming["XL"].snowflake_account_role.this TRANSFORMING_XL_DEV_WH_MOU

Imported privileges

  1. Remove the privilege grant from the state: terraform state rm module.elt.snowflake_grant_privileges_to_account_role.imported_privileges_to_logger
  2. Revoke the privilege in Snowflake:USE ROLE ACCOUNTADMIN;REVOKE IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE FROM ROLE LOGGER_DEV;


    Apply changes

Apply the plan: terraform apply -parallelism=1.

ian-r-rose commented 2 weeks ago

That all looks correct to me! We also need to figure out why CI is failing, I suspect that the github actions service account lost some of the role grants it needs

ram-kishore-odi commented 2 weeks ago

Hello @ian-r-rose, The first two steps, running init upgrade and the file locking have been completed.

Can you please review merge the branch into main so that I can proceed with the production upgrade from main branch ?

No failures in CI as well

*log output for reference ***

Step 1.

(base) ram.kishore@180 prd % conda activate infra (infra) ram.kishore@180 prd % terraform init -upgrade Initializing the backend... Upgrading modules...

Terraform has been successfully initialized!

Step 2.

(infra) ram.kishore@180 prd % terraform providers lock -platform=linux_amd64 -platform=darwin_amd64

Success! Terraform has updated the lock file.

Review the changes in .terraform.lock.hcl and then commit to your version control system to retain the new checksums.