Snowflake-Labs / terraform-provider-snowflake

Terraform provider for managing Snowflake accounts
https://registry.terraform.io/providers/Snowflake-Labs/snowflake/latest
MIT License
528 stars 410 forks source link

Allow configuring `storage_aws_external_id` for `snowflake_storage_integration` #2624

Open AleksaC opened 5 months ago

AleksaC commented 5 months ago

Terraform CLI and Provider Versions

Terraform v1.7.5 on darwin_arm64

Use Cases or Problem Statement

There is a cycle between the storage integration and role used for the storage integration since the role needs Snowflake AWS user arn as well as external id in its trust policy. Since user is the same for all storage integrations in the same Snowflake account it can be provided as a variable, but that still leaves external id.

Fortunately Snowflake allows setting external id when creating the storage integration so I could just generate the storage integration and pass it to both the role as well as the storage integration removing the cycle.

Proposal

Make the storage_aws_external_id field Optional instead of Computed since it's supported in the underlying API.

How much impact is this issue causing?

Medium

Additional Information

No response

sfc-gh-jcieslak commented 5 months ago

Hey @AleksaC. Thanks for reaching out to us.

We are currently working on redesigning existing resources as part of https://github.com/Snowflake-Labs/terraform-provider-snowflake/blob/main/ROADMAP.md#supporting-all-snowflake-ga-features. We will consider it when we will be going over snowflake_storage_integration.

AleksaC commented 5 months ago

Thanks for the response @sfc-gh-jcieslak.

Do you have any ETA? Would you accept a PR to expedite this, since the change seems relatively straightforward to implement and shouldn't impact the redesign?

sfc-gh-jcieslak commented 5 months ago

I think it would be better If you could provide a PR for it, because I'm not sure when we would take care of the snowflake_storage_integration. When implementing it just remember to add a test case in the storage_integration_acceptance_test.go with the ticket number (in the comment or in the function name) and use this parameter. Our contribution guide is kind of outdated (we'll be soon updating it to the latest standards), but for now just focus on changing the param to Optional, using it in the Create function, and writing a test for it. We'll guide you with comments, if anything is missing.

AleksaC commented 5 months ago

I should be able to pick it up in the next couple of days. I'll link the PR to this issue once I make progress on the stuff you mentioned.

parsoj commented 5 months ago

I'm also seeing this issue (a cyclic dependency between the AWS IAM role, and the Snowflake storage integration) -

However I believe the fix is a bit more involved than just allowing the user to set external_id

AWS IAM roles require a valid assume role policy to be defined when the role is created. And AWS doesn't allow for external policies to grant assume_role access to additional principals/users outside of that inline policy document. So there is no way to perform these steps in this order:

  1. create the role
  2. create the storage integration
  3. create the assume_role_policy

While its in technically possible to convince AWS to relax that constraint - I'd anticipate that we won't make much headway there, and we'd have a better outcome refactoring the Snowflake provider to be compatible with AWS.

I think the ideal pattern would be to split the snowflake_storage_integration into two resources: one resource that creates the IAM user that the integration will use (and makes it's ARN available as a resource attribute) - and another resource that creates (or finishes creating) the storage integration itself.

This way you get a clean, linear resolution order of: snowflake_storage_integration_iam_user -> aws_iam_role -> snowflake_storage_integration

AleksaC commented 4 months ago

@parsoj What you describe is not possible to achieve at the provider level because Snowflake API doesn't support creation of storage integration users.

Currently a single user is used for all storage integrations:

Snowflake provisions a single IAM user for your entire Snowflake account. All S3 storage integrations in your account use that IAM user.

I'm not sure how big of a change it would be to allow for creation of users for storage integrations, but even with the current situation it's easy to work around the issue by simply grabbing the user from a previous storage integration and passing it as a variable.

Since the user is the same for all storage integrations it would be nice if it was possible to obtain its ARN independently of any storage integration. In this case it would be possible to introduce a data source that could get it, solving the problem completely.

parsoj commented 4 months ago

@AleksaC Thats a good point, and good idea to do a data-source.

If there is only one IAM user for a Snowflake account, and it's static, that sounds really promising. I'm hoping that the IAM user for an account will always exist as long as the account does (and we don't have to create a storage integration to ensure the IAM user gets created)?

If so - a data source sounds like just the ticket!

@sfc-gh-jcieslak I'm digging through the docs now, but I'm not finding anything regarding the lifecycle of the IAM user for Snowflake accounts. Can we safely assume that IAM user will already exist?

Also - from the docs for creating storage integrations, it looks like DESC INTEGRATION is the documented way to get that IAM user's ARN from Snowflake. Are there any other API calls we can use to get that information?

parsoj commented 4 months ago

I saw that Snowflake is looking to do some broader re-design on it's Terraform provider. I think that's a great idea! If you'd indulge me a bit - I'd like to offer some broad suggestions there.

Any API or SQL-based workflow that you currently have in the docs that requires an ALTER statement to get everything working - is going to have a really hard time translating to Terraform. Terraform has a major design philosophy of "immutable infrastructure" - not just as a nice-to-have, but as a firm pre-requisite that enables the language to be fully declarative. Terraform offers very little in the way of imperative "escape-hatches" in the language (and they strongly discourage using any of them).

You may actually find it necessary to re-work some of your API workflows to support "immutability" in order to really get full Terraform support across the finish line. Everything has to be achievable with CREATE and read/SELECT statements only - otherwise there may always be some half-manual/half-terraform workflow that your TF provider requires.

Thanks for listening 🙏

pfnsec commented 3 months ago

Hi all, I too had this issue! I resolved it by removing the dependency from the IAM role ARN to the storage integration. I managed this by computing the IAM role ARN from the AWS account ID and role name instead of reading it from the IAM role's outputs. This is a bit of a hack, I know, but it works in the meantime! The role name -> ARN function is at least simple enough that you can hopefully adapt my basic version to your environment.

data "aws_caller_identity" "current" {}

locals {
  new_storage_aws_role_name = "${var.integration_details.name}-role-${random_id.iam_slug[0].hex}"

  storage_aws_role_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${local.new_storage_aws_role_name}"

}

resource "aws_iam_role" "snowpipe_role" {

  name = local.new_storage_aws_role_name
  // And now assume_role_policy can depend on the storage integration's output (IAM user & external_id) 
  // without a circular dependency!
}