cagov / data-infrastructure

CalData infrastructure
https://cagov.github.io/data-infrastructure
MIT License
7 stars 0 forks source link

Upgrade to dbt Versionless #423

Open summer-mothwood opened 3 days ago

summer-mothwood commented 3 days ago

We want to upgrade our dbt Cloud projects to start using dbt Versionless. Here is my write up on the changes we should be aware of from dbt 1.7 to the current state of Versionless, and what to expect w/r/t adapting to new changes in Versionless moving forward: https://docs.google.com/document/d/1klIhkavJX8ljictkFLgvddsEKZNQX_m_b7kqpapUfqc/edit?usp=sharing

Luckily the upgrade itself should be relatively low-lift -- there are very few configuration or syntax changes that are likely to break anything in our current set ups. With that in mind, here is the plan for upgrading:

This sprint:

  1. Upgrade Dev env in DSE Infrastructure project
  2. Test and observe new features
  3. Upgrade Prd env in DSE infrastructure project
  4. test and observe new features

Next sprint:

  1. Upgrade Dev env in Caltrans project and inform Caltrans team
  2. Test and observe new features, incl. microbatching
  3. (If all goes well) Upgrade Prd env in Caltrans project
  4. test and observe new features
summer-mothwood commented 3 days ago

@ram-kishore-odi I don't seem to have developer access to the ODI Snowflake account. The only role I have access to is Public, is this something I can ask your help on? Image

ram-kishore-odi commented 3 days ago

Hi @summer-mothwood, Yes sure. You should now have developer access ! Please let me know if you encounter any issues.

summer-mothwood commented 2 days ago

@ram-kishore-odi that worked, thanks Ram!

summer-mothwood commented 2 days ago

@ram-kishore-odi Hey Ram! I can now view data in the ODI snowflake account in both Snowflake and dbt Cloud, so I was able to succesfully view and make edits to the repo in both Snowflake and dbt Cloud. But there still seems to be something off with my permissions in Snowflake, because I can't create a new schema to run any code while in dbt Cloud:

Image

And when I try to run or build the schema from dbt Cloud, it looks like I get stuck because my role doesn't have any access to RAW_PRD. Image

Not sure if this is something I'm not doing right in dbt, or if the permissions for TRANSFORMER_DEV need some updating. cc @britt-allen

ram-kishore-odi commented 1 day ago

Hi @summer-mothwood,

Thank you for providing the details about the issue. It appears that the problem is due to a missing privilege – CREATE SCHEMA. None of the roles assigned to your user currently have this grant.

Hi @ian-r-rose and @britt-allen,

Can you please let me know if you have encountered this issue before? If we need to move forward, shall I add the CREATE SCHEMA privilege to the Transformer_Dev role via Terraform ?

ian-r-rose commented 1 day ago

TRANSFORMER_DEV already has access to CREATE SCHEMA in TRANSFORM_DEV, so the issue is likely something else: https://github.com/cagov/data-infrastructure/blob/ebd59ff8c5559a54b19fa90488219ed65e2ad6e2/terraform/snowflake/modules/database/main.tf#L42

Not sure what the problem is just yet, @summer-mothwood would you be able to do a quick screenshare?

summer-mothwood commented 1 day ago

It looks like that second issue might be the real issue then -- needing access to RAW_PRD. TRANSFORMER_DEV does not have any read access to that database, and it's trying to access it when I run any of the models. The env variables I have set up look like this -- should I go ahead and change that RAW_PRD to RAW_DEV, or should we give TRANSFORMER_DEV read access to RAW_PRD?

image
ian-r-rose commented 1 day ago

Yes, that is almost certainly the issue! In some of the terraform work, the grant to production data was lost. There is documentation of how it is set up and what the appropriate grant should be here

ram-kishore-odi commented 1 day ago

Hi @ian-r-rose, The link in your comment is suggesting that the RAW_PRD_READ TO ROLE TRANSFORMER_DEV grant is managed outside of terraform. Unless it is explicitly granted after the terraform upgrade, it is not going to be present. Can you please let me know if I am missing anything here ? My point is that there is no fix / action needed on the terraform side as the behavior being encountered is expected. Do you agree ?

ian-r-rose commented 1 day ago

The reason it was lost in the terraform upgrade is that the TRANSFORMER_DEV role was accidentally destroyed and recreated. When we take the path of migrating the terraform state using rm/import, the grant should be maintained.

I agree, no action is required in the terraform configuration (unless we want to move this grant into the configuration!).

ram-kishore-odi commented 1 day ago

Please note that the RAW_PRD_READ has been granted to ROLE TRANSFORMER_DEV in the CalTrans snowflake instance.

ian-r-rose commented 1 day ago

@ram-kishore-odi that one should have already been there, the one that was missing was in the main snowflake instance

ram-kishore-odi commented 1 day ago

@ian-r-rose Yes, I did it for the main ODI snowflake instance a short while ago. I am about to grant the CREATE FUNCTION' for CalTrans Transformer_Dev on the public schema. Can you please confirm I can go ahead ?

ian-r-rose commented 1 day ago

Confirmed!