astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

[Bug] RuntimeError: Detected recursive loop for /usr/local/airflow/dags/dbt/dbt_venv/lib #1076

Open oliverrmaa opened 1 week ago

oliverrmaa commented 1 week ago

Astronomer Cosmos Version

Other Astronomer Cosmos version (please specify below)

If "Other Astronomer Cosmos version" selected, which one?

1.4.3

dbt-core version

1.7.17

Versions of dbt adapters

dbt-bigquery==1.7.4 dbt-core==1.7.17 dbt-extractor==0.5.1 dbt-semantic-interfaces==0.4.4

LoadMode

DBT_LS

ExecutionMode

LOCAL

InvocationMode

SUBPROCESS

airflow version

apache-airflow==2.9.2+astro.1

Operating System

Debian GNU/Linux 11 (bullseye)

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Astronomer

Deployment details

Our main production deployment runs in Astro Cloud. We also do local development via astro dev start. Continuous deployment is set up through CircleCI, which deploys PRs merged to our master branch to the production deployment via astro deploy --dags. For authentication to our data warehouse (Google BigQuery) we use GoogleCloudServiceAccountDictProfileMapping in production; locally we use ProfileConfig, with a dbt profiles.yml that hardcodes a path to a service account JSON file located at the same path for each developer.
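For reference, a minimal sketch of how the two profile strategies described above could be selected (the environment check, connection id, dataset, and profiles.yml path are placeholders, not our exact configuration):

import os

from cosmos import ProfileConfig
from cosmos.profiles import GoogleCloudServiceAccountDictProfileMapping

# Placeholder: pick the profile strategy based on where the DAGs are running.
if os.getenv("ENVIRONMENT", "local") == "production":
    # Production (Astro Cloud): map an Airflow GCP connection into a dbt profile.
    profile_config = ProfileConfig(
        profile_name="my_dbt_project",
        target_name="prod",
        profile_mapping=GoogleCloudServiceAccountDictProfileMapping(
            conn_id="google_cloud_default",         # placeholder connection id
            profile_args={"dataset": "analytics"},  # placeholder dataset
        ),
    )
else:
    # Local development (astro dev start): use a profiles.yml that hardcodes
    # the path to the service account JSON file.
    profile_config = ProfileConfig(
        profile_name="my_dbt_project",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dags/dbt/profiles.yml",  # placeholder path
    )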

What happened?

We noticed that we are constantly hitting a RuntimeError (I believe every second, according to our Astro production deployment logs): Detected recursive loop when walking DAG directory /usr/local/airflow/dags: /usr/local/airflow/dags/dbt/dbt_venv/lib has appeared more than once.

We aren't sure whether this error will cause parts of our setup to work sub-optimally, whether it is connected to other issues we are seeing in our production environment (e.g. https://github.com/astronomer/astronomer-cosmos/issues/1075), or whether it will cause further issues in the future.

Relevant log output

06/25/24 15:19:42 PM    [2024-06-25T22:19:42.726+0000] {manager.py:737} INFO - Searching for files in /usr/local/airflow/dags
Process ForkProcess-22795:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/site-packages/airflow/dag_processing/manager.py", line 241, in _run_processor_manager
    processor_manager.start()
  File "/usr/local/lib/python3.11/site-packages/airflow/dag_processing/manager.py", line 476, in start
    return self._run_parsing_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/dag_processing/manager.py", line 549, in _run_parsing_loop
    self._refresh_dag_dir()
  File "/usr/local/lib/python3.11/site-packages/airflow/dag_processing/manager.py", line 738, in _refresh_dag_dir
    self._file_paths = list_py_file_paths(self._dag_directory)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/utils/file.py", line 298, in list_py_file_paths
    file_paths.extend(find_dag_file_paths(directory, safe_mode))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/utils/file.py", line 311, in find_dag_file_paths
    for file_path in find_path_from_directory(directory, ".airflowignore"):
  File "/usr/local/lib/python3.11/site-packages/airflow/utils/file.py", line 241, in _find_path_from_directory
    raise RuntimeError(
RuntimeError: Detected recursive loop when walking DAG directory /usr/local/airflow/dags: /usr/local/airflow/dags/dbt/dbt_venv/lib has appeared more than once.
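For context, Airflow's DAG file discovery walks the dags folder following symlinks and raises this error when a resolved directory is encountered more than once. The snippet below is a simplified sketch of that check (not Airflow's actual implementation; the function name is made up) and can be run locally to see which path under dbt_venv/lib is being revisited:

import os

def find_walk_loops(root: str) -> None:
    # Roughly mimic how the DAG-directory walker detects a loop: walk the tree
    # following symlinks and flag any directory whose resolved path repeats.
    seen = set()
    for dirpath, _dirnames, _filenames in os.walk(root, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            raise RuntimeError(
                f"Detected recursive loop when walking DAG directory {root}: "
                f"{dirpath} has appeared more than once."
            )
        seen.add(real)

# Example (hypothetical local check):
# find_walk_loops("/usr/local/airflow/dags")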

How to reproduce

I believe the same setup should reproduce this bug. Note, however, that we have 1000+ dbt models, so a project of that size may be needed to reproduce it.

Here are some relevant snippets showing where our dbt_venv lives:

(1) Here are the relevant parts of our Dockerfile:

FROM quay.io/astronomer/astro-runtime:11.5.0

# install dbt into a virtual environment
ARG DBT_BIGQUERY_VERSION=1.7.4
RUN python -m venv dbt_venv && source dbt_venv/bin/activate && \
    pip install --no-cache-dir dbt-bigquery==$DBT_BIGQUERY_VERSION && deactivate

(2) Here is our ExecutionConfig:

ExecutionConfig(
    execution_mode=ExecutionMode.LOCAL,
    dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
)

Anything else :)?

Here is a screenshot of our production deployment logs in the Astro UI where we found the bug; as you can see, the issue is happening continuously:

[Screenshot: Astro UI production deployment logs showing the RuntimeError repeating]

Are you willing to submit a PR?

Contact Details

oma@forhims.com

dosubot[bot] commented 1 week ago

To resolve the RuntimeError indicating a recursive loop due to the dbt_venv/lib appearing more than once, you should ensure that the dbt_venv directory is excluded from the DAGs directory. Here’s how you can do it:

  1. Move the dbt_venv Directory: Move the dbt_venv directory outside of the DAGs directory. For example, place it in a different location such as /usr/local/airflow/dbt_venv (see the Dockerfile sketch after this list).

  2. Update the dbt_executable_path: Update the dbt_executable_path in your DAG configuration to point to the new location of the dbt_venv directory.
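For step 1, if the virtual environment is currently being created (or deployed) inside the dags/ folder, a minimal Dockerfile sketch along the lines of the suggestion above could look like this; the absolute path is an assumption, not a verified fix:

FROM quay.io/astronomer/astro-runtime:11.5.0

# Hypothetical variant: create the dbt virtual environment at an explicit
# absolute path outside the dags/ folder so the DAG processor never walks it
ARG DBT_BIGQUERY_VERSION=1.7.4
RUN python -m venv /usr/local/airflow/dbt_venv && \
    /usr/local/airflow/dbt_venv/bin/pip install --no-cache-dir dbt-bigquery==$DBT_BIGQUERY_VERSION

Calling the venv's pip directly also avoids depending on shell activation (source) inside the RUN instruction.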

Here’s an example of how you can update your ExecutionConfig in the DAG configuration:

import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig, ExecutionMode
from cosmos.profiles import GoogleCloudServiceAccountDictProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=GoogleCloudServiceAccountDictProfileMapping(
        conn_id="google_cloud_default",
        profile_args={"schema": "public"},
    ),
)

execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.LOCAL,
    dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
)

my_cosmos_dag = DbtDag(
    project_config=ProjectConfig(
        "/usr/local/airflow/dags/my_dbt_project",
    ),
    profile_config=profile_config,
    execution_config=execution_config,
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="my_cosmos_dag",
    default_args={"retries": 2},
)

By ensuring that the dbt_venv directory is outside of the DAGs directory, you can avoid the recursive loop issue. Additionally, you can use MWAA's startup script to install dbt into a virtual environment, as outlined in the MWAA documentation. This ensures that the virtual environment is properly set up before the scheduler and webserver start [1][2].

To continue talking to Dosu, mention @dosu.