astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

Airflow is unable to find dbt_project.yml #1145

Open gladwincogoport opened 1 month ago

gladwincogoport commented 1 month ago

Astronomer Cosmos Version

Other Astronomer Cosmos version (please specify below)

If "Other Astronomer Cosmos version" selected, which one?

1.5.1

dbt-core version

1.8.4

Versions of dbt adapters

dbt-postgres 1.8.2

LoadMode

AUTOMATIC

ExecutionMode

DOCKER

InvocationMode

None

airflow version

apache-airflow 2.9.3

Operating System

MacOS 14.1.2

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Astronomer

Deployment details

No response

What happened?

My Airflow deployment can't find my dbt_project.yml file.

The following is my folder structure:

.
├── Dockerfile
├── README.md
├── airflow_settings.yaml
├── dags
│   ├── __pycache__
│   │   ├── dbt_dag.cpython-311.pyc
│   │   └── exampledag.cpython-311.pyc
│   ├── dbt
│   │   └── datapulse_transform
│   │       ├── README.md
│   │       ├── analyses
│   │       ├── dbt_project.yml
│   │       ├── logs
│   │       │   └── dbt.log
│   │       ├── macros
│   │       ├── models
│   │       │   ├── example
│   │       │   │   ├── my_first_dbt_model.sql
│   │       │   │   ├── my_second_dbt_model.sql
│   │       │   │   └── schema.yml
│   │       │   ├── marts
│   │       │   │   └── mart1.sql
│   │       │   └── staging
│   │       │       ├── stg2.sql
│   │       │       └── stg1.sql
│   │       ├── seeds
│   │       ├── snapshots
│   │       ├── target
│   │       │   ├── compiled
│   │       │   │   └── datapulse_transform
│   │       │   │       └── models
│   │       │   │           ├── example
│   │       │   │           │   ├── my_first_dbt_model.sql
│   │       │   │           │   └── my_second_dbt_model.sql
│   │       │   │           ├── marts
│   │       │   │           │   └── mart1.sql
│   │       │   │           └── staging
│   │       │   │               ├── stg1.sql
│   │       │   │               └── stg2.sql
│   │       │   ├── graph.gpickle
│   │       │   ├── graph_summary.json
│   │       │   ├── manifest.json
│   │       │   ├── partial_parse.msgpack
│   │       │   ├── run
│   │       │   │   └── datapulse_transform
│   │       │   │       └── models
│   │       │   │           ├── example
│   │       │   │           │   ├── my_first_dbt_model.sql
│   │       │   │           │   └── my_second_dbt_model.sql
│   │       │   │           ├── marts
│   │       │   │           │   └── mart1.sql
│   │       │   │           └── staging
│   │       │   │               ├── stg1.sql
│   │       │   │               └── stg2.sql
│   │       │   ├── run_results.json
│   │       │   └── semantic_manifest.json
│   │       └── tests
│   ├── dbt_dag.py
│   └── exampledag.py
├── include
├── logs
│   └── dbt.log
├── packages.txt
├── plugins
├── requirements.txt
└── tests
    └── dags
        └── test_dag_example.py

Relevant log output

Broken DAG: [/usr/local/airflow/dags/dbt_dag.py]
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/cosmos/converter.py", line 214, in _init_
    project_config.validate_project()
  File "/usr/local/lib/python3.11/site-packages/cosmos/config.py", line 207, in validate_project
    raise CosmosValueError(f"Could not find {name} at {path}")
cosmos.exceptions.CosmosValueError: Could not find dbt_project.yml at /Users/gladwingracias_air/Gladwin/cogo-transform/datapulse_dag/dags/datapulse_transform/dbt_project.yml

How to reproduce

dbt_postgres_dag = DbtDag(
    project_config=ProjectConfig(
        "/Users/gladwingracias_air/Gladwin/cogo-transform/datapulse_dag/dags/datapulse_transform"
    ),
    # operator_args={"install_deps": True},
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 8, 8),  # yyyy-mm-dd
    catchup=False,
    dag_id="dbt_datapulse_dag",
)

Anything else :)?

No response

Are you willing to submit PR?

Contact Details

No response

dosubot[bot] commented 1 month ago

Hey there, @gladwincogoport! I'm here to help you with any questions or bugs you might have. Let's get to the bottom of this together!

To resolve the issue of Airflow not being able to find your dbt_project.yml file, ensure that your dbt project is correctly placed in the DAGs directory and that the dbt_project_path argument is set correctly when creating your DAG instance.

Here are the steps to follow:

  1. Ensure the dbt project is in the correct directory: Your folder structure seems correct, but make sure the path in your ProjectConfig matches the actual location of your dbt_project.yml file.

  2. Set the dbt_project_path argument correctly: Update your DbtDag configuration to correctly point to the dbt_project.yml file within the Docker container.

import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ExecutionConfig

dbt_postgres_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/datapulse_transform",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 8, 8),
    catchup=False,
    dag_id="dbt_datapulse_dag"
)
  3. Ensure Docker image contains necessary dbt setup: Make sure your Docker image has the dbt pipelines and a profiles.yml file. You might need to manage the Docker image to keep it up-to-date and handle secrets appropriately.

By following these steps, you should be able to resolve the issue of Airflow not being able to find your dbt_project.yml file [1][2].
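
If you would rather not hard-code the container path, you can also resolve the project relative to the DAG file itself; a minimal sketch, assuming dbt_dag.py sits next to the dbt/ folder as in the tree above (the DBT_PROJECT_PATH name is only illustrative):

from pathlib import Path

from cosmos import ProjectConfig

# Resolve the dbt project relative to this DAG file so the same code works
# both in a local checkout and inside the /usr/local/airflow container.
DBT_PROJECT_PATH = Path(__file__).parent / "dbt" / "datapulse_transform"

project_config = ProjectConfig(dbt_project_path=DBT_PROJECT_PATH)

This avoids the original error, where an absolute macOS path from the laptop was baked into the DAG and does not exist inside the container.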

To continue talking to Dosu, mention @dosu.

gladwincogoport commented 1 month ago

@dosu That fixed the project config path, but now the code is unable to find the dbt executable:

cosmos.config.CosmosConfigException: Unable to find the dbt executable, attempted: and </usr/local/airflow/dbt_venv/bin/dbt>.

dosubot[bot] commented 1 month ago

To resolve the issue of Airflow not being able to find the dbt executable, ensure that the dbt_executable_path is correctly set to the path of the dbt executable within the virtual environment you created. Here is an example of how to set this in your DAG file:

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping
import os
from datetime import datetime

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

dbt_postgres_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/datapulse_transform",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 8, 8),
    catchup=False,
    dag_id="dbt_datapulse_dag"
)

Make sure that the dbt_executable_path points to the correct location of the dbt executable within the virtual environment you created in your Dockerfile. The path should evaluate to something like /usr/local/airflow/dbt_venv/bin/dbt, which is what f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt" resolves to on the Astro Runtime image.

Additionally, note that Cosmos validates the ExecutionConfig: if the dbt_executable_path does not point to an existing executable, you will encounter a CosmosConfigException indicating that the dbt executable could not be found. This validation is crucial for the proper execution of dbt tasks in Airflow.
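
If you want a more explicit error before Cosmos raises its own exception, here is a minimal sketch (assuming AIRFLOW_HOME is set in the container, as it is on the Astro Runtime image) that checks for the binary from inside the DAG file:

import os

# Fail fast with an explicit message if the dbt binary from the Dockerfile
# virtual environment is missing. The path mirrors the ExecutionConfig above.
dbt_executable = os.path.join(os.environ["AIRFLOW_HOME"], "dbt_venv", "bin", "dbt")
if not os.path.exists(dbt_executable):
    raise FileNotFoundError(
        f"dbt executable not found at {dbt_executable}; "
        "check that dbt_venv is created in your Dockerfile"
    )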

To set up the virtual environment in your Dockerfile, you can use the following example:

FROM my-image:latest

# install dbt into a virtual environment
RUN python -m venv dbt_venv && source dbt_venv/bin/activate && \
    pip install --no-cache-dir <your-dbt-adapter> && deactivate

Replace <your-dbt-adapter> with the actual adapter you need (e.g., dbt-postgres). This setup ensures that dbt and Airflow can coexist without dependency conflicts [1][2][3].

To continue talking to Dosu, mention @dosu.

gladwincogoport commented 1 month ago

@Dosu would I need to create separate dbt projects for different schedules?

Let's say I need to run the models for the marketing vertical every hour, but the models for the finance vertical every 6 hours. How can I do this?

dosubot[bot] commented 1 month ago

To handle different schedules for separate dbt projects, you can create separate Airflow DAGs for each schedule. Here's how you can set up the schedules for your marketing and finance models:

  1. Marketing Models (Every Hour):
from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping
from datetime import datetime
import os

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

marketing_models = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/marketing_project",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="0 * * * *",  # This cron expression means every hour
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="marketing_models_dag",
    default_args={"retries": 2},
)
  2. Finance Models (Every Six Hours):
from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping
from datetime import datetime
import os

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

finance_models = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/finance_project",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="0 */6 * * *",  # This cron expression means every six hours
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="finance_models_dag",
    default_args={"retries": 2},
)

By defining these DAGs, you can ensure that your marketing models run every hour and your finance models run every six hours. Make sure to place your dbt projects in the appropriate directories and configure the dbt_executable_path correctly as per the Cosmos documentation [1][2].
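
If both verticals live in a single dbt project rather than two separate ones, another option is to point both DAGs at the same project and filter the models with Cosmos's RenderConfig. The sketch below is only illustrative: it reuses the connection settings from the examples above and assumes the models are organized under models/marketing (with a models/finance folder for the other DAG):

import os
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ProfileConfig, ProjectConfig, RenderConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

# One shared dbt project; RenderConfig.select limits which models this DAG
# renders and runs. The path-based selector assumes a models/marketing folder.
marketing_models = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/datapulse_transform",
    ),
    render_config=RenderConfig(select=["path:models/marketing"]),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="0 * * * *",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="marketing_models_dag",
)

A finance DAG would be identical apart from select=["path:models/finance"], its own dag_id, and a schedule_interval of "0 */6 * * *".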

To continue talking to Dosu, mention @dosu.