Open alexr-lh opened 1 week ago
Hey there, @alexr-lh! I'm here to assist you with any bugs, questions, or contribution opportunities. Let's work together to tackle this issue!
The issue you're experiencing seems to be related to how Cosmos handles model selection when a model shares the same name as the dbt package. Unfortunately, there isn't specific information available in the Cosmos repository about handling model name clashes with the dbt package. This might be a limitation or an oversight in the current implementation.
Given the behavior you're observing, it seems that Cosmos is not correctly differentiating between a single model and the entire project when the model name matches the package name. This results in the entire package being included in the run statement instead of just the single model.
To address this, you might consider submitting a pull request to the Cosmos repository to handle this specific case. You could propose a change that ensures the run statement targets only the specific model, even when its name matches the package name. This would involve modifying the logic that determines which models to include in a run statement, possibly by adding a check to differentiate between a model and a package when their names clash.
If you're willing to submit a PR, as indicated, this could be a valuable contribution to the project.
To continue talking to Dosu, mention @dosu.
@dosu Where in the codebase is the logic for determining what to include in the run task kept?
The logic for determining what to include in the run task, specifically for dbt model selection, is located in the file cosmos/dbt/selector.py. This file contains classes and functions such as NodeSelector, GraphSelector, select_nodes, apply_select_filter, and apply_exclude_filter, which implement the logic for selecting dbt nodes based on various criteria like paths, tags, and configurations [1].
@dosu Is that used for the DBT_LS_FILE load method? That file seems to support the select/exclude options which are not relevant for the DBT_LS_FILE method.
The cosmos/dbt/selector.py file, which supports select and exclude options, is not directly relevant for the DBT_LS_FILE load method. The DBT_LS_FILE method does not support RenderConfig.select and RenderConfig.exclude options. Instead, it relies on the output of the dbt ls command from a file, without applying additional selection or exclusion logic [1]. The handling for DBT_LS_FILE is more likely implemented in the cosmos/dbt/graph.py file, which manages various load methods, including DBT_LS_FILE [2].
Hi @alexr-lh, Thank you very much for reporting this issue. We were not testing this use case, and I'm glad you identified and reported it.
Based on your description, the problem lies in how the dbt command is built when running model tasks. This does not seem to be an issue in LoadMethod.DBT_LS, because it does not affect how the DAG is converted from dbt from a topology perspective; it affects the task execution itself.
My impression is that if you were using another execution mode, you'd still face the same problem.
The commands run by Cosmos operators are built using a few parts of the code, using inheritance. In this particular case, two relevant parts are:
We'd love to receive a contribution. Please let us know if you'd like any support.
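One possible direction for such a contribution, sketched under assumptions: select the model by its file path rather than by its bare name, so the selector cannot be read as a package name. The `build_run_command` helper below is hypothetical and is not part of Cosmos; it only illustrates the shape of the command, using dbt's `path:` selection method. How Cosmos actually assembles its command flags is not shown in this thread.

```python
from pathlib import PurePosixPath
from typing import List


def build_run_command(model_path: str) -> List[str]:
    """Hypothetical helper (not a Cosmos API): build a dbt run command
    that targets one model file via dbt's `path:` selection method."""
    # A path-based selector names a file, so it cannot be confused
    # with a package that happens to share the model's name.
    selector = f"path:{PurePosixPath(model_path)}"
    return ["dbt", "run", "--select", selector]


print(build_run_command("models/name_clash.sql"))
```

Whether a path-based or a fully qualified selector fits better would depend on what node metadata is available at the point where the command is built.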
Astronomer Cosmos Version
1.6.0
dbt-core version
1.8.7
Versions of dbt adapters
dbt-bigquery==1.8.3
LoadMode
DBT_LS
ExecutionMode
LOCAL
InvocationMode
None
airflow version
2.7.3
Operating System
Cloud Composer
If you think it's a UI issue, what browsers are you seeing the problem on?
No response
Deployment
Google Cloud Composer
Deployment details
No response
What happened?
Models that have the same name as the dbt package will include the entire package in their run statement.
I have a model which shares a name with the project. Instead of the run task building a single model it's building the entire project. This in turn causes intermittent failures when temporary tables required by other models are being deleted mid run as they're being run in two tasks.
I'd expect this to be looking specifically for single models.
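The behaviour described above is consistent with how a bare selector can match either a node name or a package name. The following is a minimal sketch of that ambiguity, not dbt's actual implementation: `fqn_matches` and `select` are hypothetical helpers, and the matching rule (a bare selector matches the last element of a node's fully qualified name, or a dot-separated prefix of it) is a simplifying assumption.

```python
from typing import Dict, List


def fqn_matches(fqn: List[str], selector: str) -> bool:
    """Simplified stand-in for bare-name selection (an assumption, not dbt code).

    A node matches when the selector equals its name (last FQN element)
    or when the selector's dot-separated parts are a prefix of the FQN.
    """
    if fqn[-1] == selector:
        return True
    parts = selector.split(".")
    return fqn[: len(parts)] == parts


def select(nodes: Dict[str, List[str]], selector: str) -> List[str]:
    """Return the sorted names of all nodes matched by the selector."""
    return sorted(name for name, fqn in nodes.items() if fqn_matches(fqn, selector))


# Project "name_clash" with two models; each FQN is [package, model].
project = {
    "no_clash": ["name_clash", "no_clash"],
    "name_clash": ["name_clash", "name_clash"],
}

print(select(project, "no_clash"))               # ['no_clash']
print(select(project, "name_clash"))             # ['name_clash', 'no_clash']
print(select(project, "name_clash.name_clash"))  # ['name_clash']
```

In this simplified model, the bare selector `name_clash` also matches as a package prefix and so pulls in every model, while the qualified form narrows the match to the single model.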
Relevant log output
No response
How to reproduce
Create a project named name_clash.
Create a model named no_clash and set it to incremental mode.
Create a model also named name_clash and set to incremental mode (could probably be any materialization).
Make sure the query in the no_clash model takes a while to complete.
Trigger the DAG.
You'll see the Cosmos DAG task for no_clash contains just one model in the output while name_clash includes both of the models.
Anything else :)?
No response
Are you willing to submit PR?
Contact Details
alex.richardson@loveholidays.com