gladwincogoport opened 1 month ago
Hey there, @gladwincogoport! I'm here to help you with any questions or bugs you might have. Let's get to the bottom of this together!
To resolve the issue of Airflow not being able to find your `dbt_project.yml` file, ensure that your dbt project is correctly placed in the DAGs directory and that the `dbt_project_path` argument is set correctly when creating your DAG instance.

Here are the steps to follow:

1. **Ensure the dbt project is in the correct directory:** Your folder structure seems correct, but make sure the path in your `ProjectConfig` matches the actual location of your `dbt_project.yml` file.
2. **Set the `dbt_project_path` argument correctly:** Update your `DbtDag` configuration to point to the `dbt_project.yml` file within the Docker container.
```python
import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ExecutionConfig

dbt_postgres_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/datapulse_transform",
    ),
    profile_config=profile_config,  # your existing ProfileConfig
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 8, 8),
    catchup=False,
    dag_id="dbt_datapulse_dag",
)
```
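Note that `dbt_project_path` must be the path *inside* the Airflow container (on Astronomer images, your project folder under `/usr/local/airflow/dags`), not the path on your local machine.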
Since you are running with the Docker execution mode, also make sure the image contains a valid `profiles.yml` file. You might need to manage the Docker image to keep it up-to-date and handle secrets appropriately.

By following these steps, you should be able to resolve the issue of Airflow not being able to find your `dbt_project.yml` file [1][2].
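If you would rather ship a static `profiles.yml` inside the image than use a profile mapping, `ProfileConfig` also accepts a `profiles_yml_filepath`. A minimal sketch, assuming the file is copied next to the project (the exact path is an assumption; adjust it to wherever your Dockerfile places the file):

```python
from cosmos import ProfileConfig

# Assumed location: a profiles.yml baked into the image alongside the dbt project.
profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profiles_yml_filepath="/usr/local/airflow/dags/dbt/datapulse_transform/profiles.yml",
)
```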
To continue talking to Dosu, mention @dosu.
@dosu
That fixed the project config path, but now the code is unable to find the dbt executable:

```
cosmos.config.CosmosConfigException: Unable to find the dbt executable, attempted:
```
To resolve the issue of Airflow not being able to find the dbt executable, ensure that the `dbt_executable_path` is correctly set to the path of the dbt executable within the virtual environment you created. Here is an example of how to set this in your DAG file:
```python
import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

dbt_postgres_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/datapulse_transform",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 8, 8),
    catchup=False,
    dag_id="dbt_datapulse_dag",
)
```
Make sure that `dbt_executable_path` points to the correct location of the dbt executable within the virtual environment you created in your Dockerfile. The path should be something like `f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt"`.

Additionally, ensure that the `ExecutionConfig` is properly validated. If the `dbt_executable_path` is invalid, you will encounter a `CosmosConfigException` with a message indicating that the dbt executable could not be found. This validation is crucial for the proper execution of dbt tasks in Airflow.
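As a quick sanity check (plain Python, not part of the Cosmos API), you can fail fast at DAG parse time if the binary is missing from the image:

```python
import os

# Assumed path: the virtual environment created in the Dockerfile below.
dbt_executable = f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt"
if not os.path.exists(dbt_executable):
    raise FileNotFoundError(
        f"dbt executable not found at {dbt_executable}; "
        "rebuild the image or fix dbt_executable_path."
    )
```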
To set up the virtual environment in your Dockerfile, you can use the following example:

```Dockerfile
FROM my-image:latest

# install dbt into a virtual environment
RUN python -m venv dbt_venv && source dbt_venv/bin/activate && \
    pip install --no-cache-dir <your-dbt-adapter> && deactivate
```
Replace `<your-dbt-adapter>` with the actual adapter you need (e.g., `dbt-postgres`). This setup ensures that dbt and Airflow can coexist without dependency conflicts [1][2][3].
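Before wiring this into Cosmos, it can help to confirm the binary actually exists in the built image, for example with `docker run --rm <your-image> /usr/local/airflow/dbt_venv/bin/dbt --version` (the path assumes `AIRFLOW_HOME` is `/usr/local/airflow`, as on Astronomer images).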
To continue talking to Dosu, mention @dosu.
@Dosu Would I need to create separate dbt projects for different schedules?
Let's say that I need to run the models for the marketing vertical at an interval of 1 hour, but I need to run the models for the finance vertical every 6 hours. How can I do this?
To handle different schedules for separate dbt projects, you can create separate Airflow DAGs for each schedule. Here's how you can set up the schedules for your marketing and finance models:
```python
import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

marketing_models = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/marketing_project",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="0 * * * *",  # this cron expression means every hour
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="marketing_models_dag",
    default_args={"retries": 2},
)
```
```python
import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

finance_models = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/finance_project",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="0 */6 * * *",  # this cron expression means every six hours
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="finance_models_dag",
    default_args={"retries": 2},
)
```
By defining these DAGs, you can ensure that your marketing models run every hour and your finance models run every six hours. Make sure to place your dbt projects in the appropriate directories and configure the `dbt_executable_path` correctly as per the Cosmos documentation [1][2].
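Note that separate dbt projects are not strictly required. If both verticals live in the same project, you can keep one project and give each DAG its own subset of models using `RenderConfig` selectors. A sketch for the marketing DAG, assuming the models sit under `models/marketing` (the folder name, and the reuse of `profile_config` from above, are assumptions; adapt them to your layout):

```python
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, RenderConfig

# Sketch: one shared dbt project, selecting only the marketing subfolder.
# Tags work the same way, e.g. RenderConfig(select=["tag:marketing"]).
marketing_models = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/datapulse_transform",
    ),
    profile_config=profile_config,  # defined as above
    render_config=RenderConfig(select=["path:models/marketing"]),
    schedule_interval="0 * * * *",  # hourly
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="marketing_models_dag",
)
```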
To continue talking to Dosu, mention @dosu.
Astronomer Cosmos Version
Other Astronomer Cosmos version (please specify below)
If "Other Astronomer Cosmos version" selected, which one?
1.5.1
dbt-core version
1.8.4
Versions of dbt adapters
dbt-postgres 1.8.2
LoadMode
AUTOMATIC
ExecutionMode
DOCKER
InvocationMode
None
airflow version
apache-airflow 2.9.3
Operating System
MacOS 14.1.2
If you think it's a UI issue, what browsers are you seeing the problem on?
No response
Deployment
Astronomer
Deployment details
No response
What happened?
My Airflow can't find my `dbt_project.yml` file.
Following is my folder structure:

```
.
├── Dockerfile
├── README.md
├── airflow_settings.yaml
├── dags
│   ├── __pycache__
│   │   ├── dbt_dag.cpython-311.pyc
│   │   └── exampledag.cpython-311.pyc
│   ├── dbt
│   │   └── datapulse_transform
│   │       ├── README.md
│   │       ├── analyses
│   │       ├── dbt_project.yml
│   │       ├── logs
│   │       │   └── dbt.log
│   │       ├── macros
│   │       ├── models
│   │       │   ├── example
│   │       │   │   ├── my_first_dbt_model.sql
│   │       │   │   ├── my_second_dbt_model.sql
│   │       │   │   └── schema.yml
│   │       │   ├── marts
│   │       │   │   └── mart1.sql
│   │       │   └── staging
│   │       │       ├── stg2.sql
│   │       │       └── stg1.sql
│   │       ├── seeds
│   │       ├── snapshots
│   │       ├── target
│   │       │   ├── compiled
│   │       │   │   └── datapulse_transform
│   │       │   │       └── models
│   │       │   │           ├── example
│   │       │   │           │   ├── my_first_dbt_model.sql
│   │       │   │           │   └── my_second_dbt_model.sql
│   │       │   │           ├── marts
│   │       │   │           │   └── mart1.sql
│   │       │   │           └── staging
│   │       │   │               ├── stg1.sql
│   │       │   │               └── stg2.sql
│   │       │   ├── graph.gpickle
│   │       │   ├── graph_summary.json
│   │       │   ├── manifest.json
│   │       │   ├── partial_parse.msgpack
│   │       │   ├── run
│   │       │   │   └── datapulse_transform
│   │       │   │       └── models
│   │       │   │           ├── example
│   │       │   │           │   ├── my_first_dbt_model.sql
│   │       │   │           │   └── my_second_dbt_model.sql
│   │       │   │           ├── marts
│   │       │   │           │   └── mart1.sql
│   │       │   │           └── staging
│   │       │   │               ├── stg1.sql
│   │       │   │               └── stg2.sql
│   │       │   ├── run_results.json
│   │       │   └── semantic_manifest.json
│   │       └── tests
│   ├── dbt_dag.py
│   └── exampledag.py
├── include
├── logs
│   └── dbt.log
├── packages.txt
├── plugins
├── requirements.txt
└── tests
    └── dags
        └── test_dag_example.py
```
Relevant log output
How to reproduce
```python
dbt_postgres_dag = DbtDag(
    project_config=ProjectConfig(
        "/Users/gladwingracias_air/Gladwin/cogo-transform/datapulse_dag/dags/datapulse_transform"
    ),
)
```
Anything else :)?
No response
Are you willing to submit PR?
Contact Details
No response