astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

[Bug] GoogleCloudProfile dataset argument is not replaced by either model or yml config, but is instead used as a prefix #1334

Open vanAkim opened 1 day ago

vanAkim commented 1 day ago

Astronomer Cosmos Version

1.6.0

dbt-core version

1.8.8

Versions of dbt adapters

dbt-adapters 1.7.0 dbt-bigquery 1.8.3

LoadMode

AUTOMATIC

ExecutionMode

VIRTUALENV

InvocationMode

None

airflow version

2.5.3

Operating System

Ubuntu & Docker under WSL2 of windows 11

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Docker-Compose

Deployment details

No response

What happened?

I'm using GoogleCloudOauthProfileMapping and passing the required profile_args: a project, my_gcp_project, and a dataset, my_gcp_dataset.
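
For reference, the relevant part of my Cosmos set-up looks roughly like the sketch below (connection id, profile name and target name are illustrative placeholders; only profile_args matters here):

```python
from cosmos import ProfileConfig
from cosmos.profiles import GoogleCloudOauthProfileMapping

# Placeholder conn_id / profile / target names; project and dataset are the
# values discussed in this issue.
profile_config = ProfileConfig(
    profile_name="jaffle_shop",
    target_name="dev",
    profile_mapping=GoogleCloudOauthProfileMapping(
        conn_id="google_cloud_default",
        profile_args={
            "project": "my_gcp_project",  # BigQuery project
            "dataset": "my_gcp_dataset",  # BigQuery dataset (dbt "schema")
        },
    ),
)
```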

When I run the current set-up to build the simple my_first_dbt_model.sql model from the jaffle_shop project, the table is correctly created in the expected location, my_gcp_project.my_gcp_dataset.my_first_dbt_model.

Now, if I try to overwrite these parameters for this specific model, either in the model's config() block or in its yml config, the two settings do not behave the same.

The database/project and schema/dataset keyword pairs are interchangeable and work either way.

The database/project value is fully overwritten, following the expected config hierarchy (https://docs.getdbt.com/reference/model-configs#configuring-models), but the schema/dataset values are concatenated. Moreover, the concatenation only happens between the Cosmos argument and the dbt config that sits highest in the hierarchy: for example, if I set both options described above, only the model's config() value is concatenated with the Cosmos value.
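
To make the observed behaviour concrete, here is a purely illustrative sketch (hypothetical values and separator; this is neither Cosmos nor dbt code) of the difference between what happens and what I would expect:

```python
# Hypothetical values; the separator is shown as "_" only for illustration.
cosmos_dataset = "my_gcp_dataset"    # dataset passed through profile_args
model_override = "my_model_dataset"  # schema/dataset set on the specific model

observed = f"{cosmos_dataset}_{model_override}"  # Cosmos value used as a prefix
expected = model_override                        # model value fully replaces it

print(observed)  # my_gcp_dataset_my_model_dataset
print(expected)  # my_model_dataset
```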

Relevant log output

How to reproduce

  1. Use a GCP profile mapping function, in my case GoogleCloudOauthProfileMapping (see the DAG sketch below)
  2. Set a dataset value in profile_args
  3. Set a schema/dataset value for a specific model, either via config() or the yml config
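
A minimal DAG covering steps 1 and 2 could look roughly like the sketch below (paths, ids and scheduling values are placeholders; step 3 is the model-level config()/yml change on the dbt side):

```python
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ProfileConfig, ProjectConfig
from cosmos.constants import ExecutionMode
from cosmos.profiles import GoogleCloudOauthProfileMapping

# Same profile mapping as sketched above: project and dataset via profile_args.
profile_config = ProfileConfig(
    profile_name="jaffle_shop",
    target_name="dev",
    profile_mapping=GoogleCloudOauthProfileMapping(
        conn_id="google_cloud_default",
        profile_args={"project": "my_gcp_project", "dataset": "my_gcp_dataset"},
    ),
)

jaffle_shop_dag = DbtDag(
    dag_id="jaffle_shop",
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/jaffle_shop"),
    profile_config=profile_config,
    execution_config=ExecutionConfig(execution_mode=ExecutionMode.VIRTUALENV),
    # VIRTUALENV mode installs the adapter into a temporary virtualenv per task.
    operator_args={"py_requirements": ["dbt-bigquery==1.8.3"]},
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
)
```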

Anything else :)?

No response

Are you willing to submit PR?

Contact Details

akim.van-eersel@dailymotion.com

dosubot[bot] commented 1 day ago

Hey there, @vanAkim! I'm here to assist you with any bugs, questions, or contributions you might have. Let's work together to tackle this issue.

The issue you're experiencing seems to be related to how Cosmos handles schema/dataset configurations when using GoogleCloudOauthProfileMapping. Specifically, it appears that the schema/dataset values are being concatenated rather than replaced when you attempt to override them in a model-specific configuration.

In Cosmos, the GoogleCloudOauthProfileMapping extracts the project and dataset from the Airflow BigQuery connection's extra field, and these values are used in the dbt profile configuration [1]. When you then override these values in a model-specific configuration, however, the schema/dataset value ends up concatenated instead of replaced.

This behavior might be due to the way Cosmos merges configurations from different sources. The _config_selector_ooo method in the DbtModelConfig class is responsible for merging configurations, ensuring that SQL file configurations override those from properties.yml files [2]. However, it seems that this logic might not be correctly handling the replacement of schema/dataset values in your case.

To address this, you might need to look into how the _config_selector_ooo method is applied in your setup or consider submitting a pull request to adjust the logic for handling schema/dataset overrides.
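
As a rough, simplified illustration of that "SQL-file config overrides properties.yml config" precedence (this is not the actual implementation of _config_selector_ooo):

```python
# Simplified, hypothetical illustration of override-in-order config merging;
# not Cosmos's actual DbtModelConfig._config_selector_ooo implementation.
def merge_model_config(yml_config: dict, sql_config: dict) -> dict:
    merged = dict(yml_config)  # start from the properties.yml values
    merged.update(sql_config)  # values from the model's config() take precedence
    return merged

print(merge_model_config(
    {"schema": "from_yml", "materialized": "table"},
    {"schema": "from_sql_config"},
))
# {'schema': 'from_sql_config', 'materialized': 'table'}
```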

To continue talking to Dosu, mention @dosu.