Open rightx2 opened 4 months ago
Hello @rightx2! 👋 I'm here to help you with any bugs, questions, or contributions. Let me know how I can assist you!
The issue you're encountering with the `data_interval_end` variable being undefined when using `dbt_vars` in Cosmos might be related to how the variable is templated and passed. The `yaml.dump` method is used to convert the `dbt_vars` dictionary to a YAML string, which might not handle Jinja templating as expected. To address this, you can try using `json.dumps` instead of `yaml.dump` to ensure that the variables are correctly passed to dbt models. Here is an example of how you can modify your configuration:
```python
import json
from pathlib import Path

from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig, RenderConfig, ExecutionConfig
from cosmos.constants import LoadMode, ExecutionMode, TestBehavior

project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dbt/data-dbt",
    project_name="data_dbt",
    partial_parse=True,
    dbt_vars=json.dumps({
        "start_dt": "{{ data_interval_end }}",
    }),
)

profile_config = ProfileConfig(
    profile_name="data_dbt",
    target_name="dev",
    profiles_yml_filepath=Path("/opt/airflow/dbt/data-dbt", "profiles", "profiles.yml"),
)

render_config = RenderConfig(
    load_method=LoadMode.DBT_LS,
    emit_datasets=False,
    test_behavior=TestBehavior.AFTER_EACH,
    select=[
        "models/my_model.sql",
    ],
)

execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.LOCAL,
)

operator_args = {
    "install_deps": True,
}

dbt_op = DbtTaskGroup(
    group_id="dbt_group",
    project_config=project_config,
    profile_config=profile_config,
    render_config=render_config,
    execution_config=execution_config,
    operator_args=operator_args,
)
```
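For illustration, `json.dumps` serializes the Jinja placeholder as a plain string literal, leaving the braces intact so the template can still be rendered later (a minimal stdlib-only sketch, independent of Cosmos):

```python
import json

# Serializing the vars with json.dumps leaves the Jinja placeholder
# untouched, so it survives as a literal string in the serialized payload.
dbt_vars = json.dumps({"start_dt": "{{ data_interval_end }}"})
print(dbt_vars)  # {"start_dt": "{{ data_interval_end }}"}

# Round-tripping shows the placeholder is preserved exactly.
assert json.loads(dbt_vars)["start_dt"] == "{{ data_interval_end }}"
```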
This change should help ensure that the `data_interval_end` variable is correctly passed and templated within your dbt models. If the issue persists, it might be worth checking whether there are any compatibility concerns between the versions of Cosmos and dbt-core you are using [1][2].
To continue talking to Dosu, mention @dosu.
Really important question: does this error occur during task execution or during DAG parsing/rendering? That is, does the error message appear on the Airflow homepage and prevent the DAG from loading at all?
I'm going ahead with the assumption that this occurs during DAG rendering, not task execution. I'm fairly confident I know what the problem is: back in January I knew this would one day bite someone, and I advocated for changes precisely because of the issue you just ran into.
Right now dbt vars and env are tightly coupled across both rendering and execution, but they should be looser, precisely because of what you are attempting to do.
In Airflow, `{{ templated_variables }}` are not normally resolved until after a DagRun is initiated. So when your DagRun initiates and the task runs, `{{ data_interval_end.strftime("%Y-%m-%d") }}` becomes (for example) `"2024-06-21"`.
During rendering of the DAG, Airflow's Jinja2 is not used at all. This means that the string literal `{{ data_interval_end.strftime("%Y-%m-%d") }}` is passed to dbt as-is. Because dbt also uses Jinja, dbt attempts to render it in its own Jinja2 environment, which doesn't have the same variables as Airflow's Jinja environment.
The reason it doesn't raise an error when you write `{{ data_interval_end }}` alone is that, by default, Jinja2 treats a variable missing from the namespace as undefined and silently renders it as an empty string. `{{ asdfjkl123456789 }}` (i.e. gibberish) will not raise an error in Jinja2. However, when you attempt to call a method of an undefined variable, that is where errors occur. E.g. `{{ fake_variable }}` works, but `{{ fake_variable.fake_method() }}` will raise an error.
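The behavior described above can be checked directly with the `jinja2` package, independent of Airflow or dbt (a small sketch using Jinja2's default `Undefined` handling):

```python
from jinja2 import Environment
from jinja2.exceptions import UndefinedError

env = Environment()  # default Undefined: renders silently as an empty string

# An unknown bare variable renders without error...
assert env.from_string("{{ fake_variable }}").render() == ""

# ...but calling a method on an unknown variable raises UndefinedError.
try:
    env.from_string("{{ fake_variable.fake_method() }}").render()
except UndefinedError as exc:
    print("raised:", exc)
```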
You should look into using `LoadMode.DBT_MANIFEST` instead of `LoadMode.DBT_LS`.
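A minimal sketch of what that switch might look like, assuming a `manifest.json` already generated by dbt (e.g. via `dbt parse` or `dbt compile`); the `manifest_path` argument and import paths here follow the Cosmos API as I understand it, so check your installed version:

```python
from cosmos import ProjectConfig, RenderConfig
from cosmos.constants import LoadMode

# With DBT_MANIFEST, Cosmos reads a pre-built manifest.json instead of
# shelling out to `dbt ls` at DAG-parse time, so templated vars are not
# needed during rendering.
project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dbt/data-dbt",
    manifest_path="/opt/airflow/dbt/data-dbt/target/manifest.json",
    project_name="data_dbt",
)

render_config = RenderConfig(
    load_method=LoadMode.DBT_MANIFEST,
)
```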
As per my comment in January, vars and the env should be allowed to be decoupled. Errors should not be raised when a user attempts to set `vars`.
Your assumption is right: it happened at rendering time. And the cause I suspected matches exactly what you mentioned. I think I'll take another render method. Thanks!
One more note I didn't mention is that your use case is not atypical. I think injecting DagRun variables like the data interval end should be supported. It's very natural to want to do that. And it clearly is not supported right now. I think we should make this a more explicitly supported pattern. So keep doing what you're doing and don't be discouraged!
of course i will : )
Astronomer Cosmos Version
Other Astronomer Cosmos version (please specify below)
If "Other Astronomer Cosmos version" selected, which one?
1.4.3
dbt-core version
1.7.16
Versions of dbt adapters
dbt-impala==1.4.3 (but I don't think this issue is related to the adapter)
LoadMode
DBT_LS
ExecutionMode
LOCAL
InvocationMode
None
airflow version
2.9.1
Operating System
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux" VERSION_ID="12" VERSION="12 (bookworm)"
If you think it's a UI issue, which browsers are you seeing the problem on?
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened?
dbt_vars can raise "This can happen when calling a macro that does not exist"
Relevant log output
How to reproduce
I need to pass a variable, `start_dt`, to dbt models using Airflow's `data_interval_end` macro. Below is my configuration for the Cosmos DAG, and it worked like a charm. However, when I tried to call a method of the macro as below, it raised an error:
dbt model:
error:
I think this is due to how `yaml.dump` works here (I think using `json.dumps` will work...). Is there any way I can pass a variable to dbt models with a macro method call?
Anything else :)?
written in above
Are you willing to submit PR?
Contact Details
rightx2@gmail.com