astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

[Bug] /tmp/ File Not Found Error Causing Task Failure for dbt Cosmos Tasks #1075

Open oliverrmaa opened 1 week ago

oliverrmaa commented 1 week ago

Astronomer Cosmos Version

Other Astronomer Cosmos version (please specify below)

If "Other Astronomer Cosmos version" selected, which one?

1.4.3

dbt-core version

1.7.17

Versions of dbt adapters

dbt-bigquery==1.7.4 dbt-core==1.7.17 dbt-extractor==0.5.1 dbt-semantic-interfaces==0.4.4

LoadMode

DBT_LS

ExecutionMode

LOCAL

InvocationMode

SUBPROCESS

Airflow version

apache-airflow==2.9.2+astro.1

Operating System

Debian GNU/Linux 11 (bullseye)

If you think it's a UI issue, which browsers are you seeing the problem on?

No response

Deployment

Astronomer

Deployment details

We have a main production deployment in Astro Cloud, and we do local development via astro dev start. Continuous deployment is set up through CircleCI, which deploys PRs merged into our master branch to the production deployment via astro deploy --dags. For authentication to our data warehouse (Google BigQuery), production uses GoogleCloudServiceAccountDictProfileMapping; local development uses ProfileConfig, where our dbt profiles.yml has a hardcoded path to a service account JSON file that sits at the same path on each developer's machine.
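For reference, a minimal sketch of what such a hardcoded local profiles.yml could look like (the profile name, GCP project, dataset, and keyfile path below are placeholders, not our actual values):

```yaml
# Hypothetical local profiles.yml; all names and paths are placeholders.
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      # Hardcoded path shared across every developer's machine
      keyfile: /usr/local/airflow/keys/service_account.json
      project: my-gcp-project
      dataset: dev_dataset
      threads: 4
```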

What happened?

We are still intermittently seeing FileNotFoundError: [Errno 2] No such file or directory for /tmp files, roughly every few hours, across multiple DAGs, ever since the inception of our Astronomer/Cosmos setup. The error appears on Cosmos-created dbt model run tasks. It affects our on-call personnel, who have to manually clear and re-run the failed tasks so the model can run successfully (the re-run usually succeeds). Some model runs must be manually re-run for the task to succeed; others recover on their own.
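As a stopgap while the root cause is open, automatic retries on the generated tasks could reduce the manual clearing, since the re-run usually succeeds. This is a sketch using plain Airflow default_args (standard Airflow behavior, not a Cosmos-specific fix; the values are illustrative):

```python
# Sketch: retry settings that could be passed as default_args to the
# DbtDag / DAG so transient /tmp FileNotFoundError failures retry on
# their own instead of paging on-call. Values are illustrative.
from datetime import timedelta

default_args = {
    "retries": 2,                         # re-run a failed task twice
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}
```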

Relevant log output

Here are four examples of errors in log output for different missing /tmp/ files: 

(1)
[2024-06-24, 18:32:45 UTC] {subprocess.py:94} INFO - FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4yx6m8en/package-lock.yml'

(2) This is a typical example for one of our models: 
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp2pcqnmxp/models/frontroom/business_models/provider_pay/datamarts/.provider_performance.sql.JjagaL'

(3) This is a typical example for one of our models:
2024-06-21, 10:17:35 UTC] {log.py:232} WARNING - [2024-06-21T10:17:35.702+0000] {subprocess.py:94} INFO - (astronomer-cosmos) - FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmph56xe15q/models/frontroom/business_models/datamarts/honest_heatlh/.honest_health_monthly_subscription_snapshots.sql.KDCDEl'
[2024-06-21, 10:17:35 UTC] {subprocess.py:94} INFO - FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmph56xe15q/models/frontroom/business_models/datamarts/honest_heatlh/.honest_health_monthly_subscription_snapshots.sql.KDCDEl'

(4) This example is for external models we use from the dbt qualtrics package: 
[Errno 2] No such file or directory: '/tmp/tmpf6h80niz/models/intermediate/shipment/.int_easy_post_tracking.sql.GgjJgL'
06:02:42  Traceback (most recent call last):
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 91, in wrapper
    result, success = func(*args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 76, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 169, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 198, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 245, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 271, in wrapper
    ctx.obj["manifest"] = parse_manifest(
                          ^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/manifest.py", line 1790, in parse_manifest
    manifest = ManifestLoader.get_full_manifest(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/manifest.py", line 318, in get_full_manifest
    manifest = loader.load()
               ^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/manifest.py", line 365, in load
    file_reader.read_files()
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/read_files.py", line 203, in read_files
    self.read_files_for_project(project, file_types)
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/read_files.py", line 210, in read_files_for_project
    project_files[file_type_info["parser"]] = read_files_for_parser(
                                              ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/read_files.py", line 158, in read_files_for_parser
    source_files = get_source_files(
                   ^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/read_files.py", line 134, in get_source_files
    fp_list = filesystem_search(project, paths, extension, ignore_spec)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/search.py", line 74, in filesystem_search
    for result in find_matching(root, relative_dirs, ext, ignore_spec):
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/clients/system.py", line 79, in find_matching
    modification_time = os.path.getmtime(absolute_path)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 55, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpf6h80niz/models/intermediate/shipment/.int_easy_post_tracking.sql.GgjJgL'

How to reproduce

I believe reproducing our setup should trigger it, since the issue happens quite often (once every few hours). We also have 1000+ dbt models, so an example project of that size may be needed for reproducibility. If information about the underlying deployment infrastructure is required, here it is:

(screenshots of deployment infrastructure attached to the issue)

Anything else :)?

No response

Are you willing to submit PR?

Contact Details

oma@forhims.com

dosubot[bot] commented 1 week ago

To resolve the intermittent FileNotFoundError: [Errno 2] No such file or directory for /tmp files that is causing task failures for dbt Cosmos tasks in your Astronomer deployment, you should ensure that the temporary directories and files are properly managed and exist when needed.

Here are some steps to address this issue:

  1. Manage Temporary Directories Properly: Ensure that the temporary directories are not cleaned up prematurely. Here is a revised version of the function to ensure the temporary directory is properly managed:

    import os
    import shutil
    import tempfile
    from pathlib import Path

    def failing_test_dbt_project(tmp_path):
        with tempfile.TemporaryDirectory() as tmp_dir:
            tmp_dir_path = Path(tmp_dir) / "mini"
            shutil.copytree(MINI_DBT_PROJ_DIR, tmp_dir_path)
            target_schema = tmp_dir_path / "models/schema.yml"
            if target_schema.exists():
                os.remove(target_schema)
            shutil.copy(MINI_DBT_PROJ_DIR_FAILING_SCHEMA, target_schema)
            yield tmp_dir_path
  2. Set the TMPDIR Environment Variable: Ensure that the TMPDIR environment variable is set correctly, especially on MacOS, to avoid the value of the temp directory changing unexpectedly. You can configure this in your cosmos/settings.py:

    import os
    import tempfile
    from pathlib import Path
    
    import airflow
    from airflow.configuration import conf
    
    from cosmos.constants import DEFAULT_COSMOS_CACHE_DIR_NAME, DEFAULT_OPENLINEAGE_NAMESPACE
    
    # In MacOS users may want to set the envvar `TMPDIR` if they do not want the value of the temp directory to change
    DEFAULT_CACHE_DIR = Path(tempfile.gettempdir(), DEFAULT_COSMOS_CACHE_DIR_NAME)
    cache_dir = Path(conf.get("cosmos", "cache_dir", fallback=DEFAULT_CACHE_DIR) or DEFAULT_CACHE_DIR)
    enable_cache = conf.getboolean("cosmos", "enable_cache", fallback=True)
    enable_cache_partial_parse = conf.getboolean("cosmos", "enable_cache_partial_parse", fallback=True)
    enable_cache_dbt_ls = conf.getboolean("cosmos", "enable_cache_dbt_ls", fallback=True)
    propagate_logs = conf.getboolean("cosmos", "propagate_logs", fallback=True)
    dbt_docs_dir = conf.get("cosmos", "dbt_docs_dir", fallback=None)
    dbt_docs_conn_id = conf.get("cosmos", "dbt_docs_conn_id", fallback=None)
    dbt_docs_index_file_name = conf.get("cosmos", "dbt_docs_index_file_name", fallback="index.html")
    enable_cache_profile = conf.getboolean("cosmos", "enable_cache_profile", fallback=True)
    dbt_profile_cache_dir_name = conf.get("cosmos", "profile_cache_dir_name", fallback="profile")
    
    try:
        LINEAGE_NAMESPACE = conf.get("openlineage", "namespace")
    except airflow.exceptions.AirflowConfigException:
        LINEAGE_NAMESPACE = os.getenv("OPENLINEAGE_NAMESPACE", DEFAULT_OPENLINEAGE_NAMESPACE)
  3. Handle Temporary Directory in run_command Function: Ensure that the run_command function in cosmos/dbt/graph.py is correctly handling the temporary directory and environment variables:

    def run_command(command: list[str], tmp_dir: Path, env_vars: dict[str, str]) -> str:
        """Run a command in a subprocess, returning the stdout."""
        logger.info("Running command: `%s`", " ".join(command))
        logger.debug("Environment variable keys: %s", env_vars.keys())
        process = Popen(
            command,
            stdout=PIPE,
            stderr=PIPE,
            cwd=tmp_dir,
            universal_newlines=True,
            env=env_vars,
        )
        stdout, stderr = process.communicate()
        returncode = process.returncode

        if 'Run "dbt deps" to install package dependencies' in stdout and command[1] == "ls":
            raise CosmosLoadDbtException(
                "Unable to run dbt ls command due to missing dbt_packages. Set RenderConfig.dbt_deps=True."
            )

        if returncode or "Error" in stdout.replace("WarnErrorOptions", ""):
            details = stderr or stdout
            raise CosmosLoadDbtException(f"Unable to run {command} due to the error:\n{details}")

        return stdout

By ensuring the temporary directory is properly managed, configuring the TMPDIR environment variable, and correctly handling the temporary directory in the run_command function, you can mitigate the FileNotFoundError issue [1][2][3].
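The traceback pattern in the report (file present during directory listing, gone by the time os.path.getmtime runs) can be reproduced in isolation. This is a minimal sketch of the suspected race, not Cosmos code; it forces the deletion by hand that premature temp-dir cleanup would cause:

```python
# Minimal sketch of the suspected race: a file is deleted between the
# directory listing and the os.path.getmtime() call, producing the same
# FileNotFoundError seen in dbt's find_matching().
import os
import tempfile

reproduced = False
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.sql")
    with open(path, "w") as f:
        f.write("select 1")
    entries = os.listdir(tmp_dir)  # the scanner has already seen the file...
    os.remove(path)                # ...but cleanup removes it first
    try:
        os.path.getmtime(os.path.join(tmp_dir, entries[0]))
    except FileNotFoundError:
        reproduced = True
```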

To continue talking to Dosu, mention @dosu.