astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

Compiled SQL not being written to the target path of the dbt project directory #851

Open EugenioG2021 opened 5 months ago

EugenioG2021 commented 5 months ago

I have run a model in Airflow, and the logs say the compiled SQL can be found at some target/{some_subdirectory} path. However, I am not seeing any "target" directory created in any of these places:

1. In the project directory (the dbt_project_path argument of ProjectConfig in my DAG Python file)
2. In Airflow's home directory, or inside the dags directory
3. In the directory I specified in the ExecutionConfig via dbt_executable_path

This is the DbtTaskGroup I use in the Airflow DAG:

# Imports assumed by this snippet (cosmos 1.x layout):
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig, ExecutionConfig, RenderConfig
from cosmos.constants import LoadMode, TestBehavior
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

SNOWFLAKE_CONN_ID = 'some_connection_to_snowflake'
DBT_SNOWFLAKE_SCHEMA = 'some_schema'

profile = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id=SNOWFLAKE_CONN_ID, profile_args={"schema": DBT_SNOWFLAKE_SCHEMA}
    ),
)

dbt_tg = DbtTaskGroup(
    group_id='whatever',
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/data_eng_dbt",
        seeds_relative_path="seeds/",
    ),
    execution_config=ExecutionConfig(
        dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt",
    ),
    render_config=RenderConfig(
        load_method=LoadMode.DBT_LS,
        select=[],
        exclude=exclude_list,  # defined elsewhere
        test_behavior=TestBehavior.NONE,
        emit_datasets=False,
        dbt_deps=False,
    ),
    profile_config=profile,
)

On the other hand, my dbt_project.yml is as follows:

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'data_eng_dbt'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target" 

My project scaffolding, starting at Airflow's home directory, is:

dags/
└── dbt/
    └── data_eng_dbt/
        ├── dbt_project.yml
        ├── macros/
        ├── seeds/
        ├── dbt_packages/
        ├── profiles.yml
        ├── packages.yml
        ├── tests/
        └── models/
            └── intermediate/
                └── int_unite_factevents_pstage.sql

When the Airflow DAG ran and a model called intermediate.int_unite_factevents_pstage had an error in its SQL, the logs said the compiled SQL could be found at target/run/data_eng_dbt/models/intermediate/placement_id_match/int_unite_factevents_pstage.sql.

However, I cannot find that target/run directory anywhere. Where should it be? And does dbt_project.yml come into play here? I did specify target-path there.

I also have no system environment variables set for the dbt target as mentioned here, and I run that DbtTaskGroup instance by just putting it inside a standard Airflow DAG:

@dag(
    start_date=datetime(2022, 2, 10),
    schedule_interval=None,
    catchup=False,
    default_view="graph",
    doc_md=__doc__,
    dag_id='dag_test',
    default_args=default_args,
    render_template_as_native_obj=True,
)
def dag_test():
    ...  # code shown above that creates the dbt_tg DbtTaskGroup instance

dag = dag_test()
dosubot[bot] commented 5 months ago

🤖

Based on the information you've provided and the context from the Cosmos repository, the target/run directory, which contains the compiled SQL, should typically be located within your dbt project directory. In your case, it should be within the data_eng_dbt directory, as specified in your ProjectConfig.

However, if the target/run directory is not found, it could be due to a few reasons:

  1. The dbt project has not been run yet. The target/run directory is created when a dbt project is run. If the project has not been run, the directory will not exist.

  2. The dbt project directory is not correctly specified. In your case, the dbt project directory is specified as /usr/local/airflow/dags/dbt/data_eng_dbt. Make sure that this path correctly points to your dbt project directory.

  3. The dbt project directory does not have the correct permissions. Make sure that the user running the Airflow DAG has read and write permissions to the dbt project directory.

If none of these solutions work, you may need to manually create the target/run directory within your dbt project directory.
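(Editor's note: one quick way to check suggestions 2 and 3 above from inside the running environment is a throwaway task that lists the project directory. This is an illustrative sketch, not part of the original thread; the path is the one from the DAG above.)

from airflow.operators.bash import BashOperator

# Hypothetical debugging task: list the dbt project directory from inside the
# Airflow worker, to confirm whether a target/ directory was ever written there
# and whether the task's user can read and write it.
inspect_dbt_project = BashOperator(
    task_id="inspect_dbt_project",
    bash_command="ls -la /usr/local/airflow/dags/dbt/data_eng_dbt",
)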

Regarding your question about how Cosmos handles the target-path configuration from the dbt_project.yml file, I wasn't able to find a specific answer within the repository. It's possible that this information is available elsewhere or I may have missed it.

If you have any additional information or context that could help me better understand your issue, please provide it. I'm here to help!

tatiana commented 2 months ago

@EugenioG2021 Cosmos currently creates temporary directories and writes the dbt target directory there. This was done intentionally, since other users faced issues with read-only file systems. That said, I agree that we should allow users to specify the target directory if they want to control this.

Would you be interested in contributing this feature?
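(Editor's note: for readers wondering why the logged path never materializes, here is a minimal sketch of the pattern described above, using only the standard library. Cosmos's actual implementation differs in detail; the function name is hypothetical.)

import shutil
import subprocess
import tempfile

def run_dbt_in_tmp(project_dir: str, dbt_bin: str = "dbt") -> None:
    # Copy the project into a throwaway directory so nothing is written back
    # to the (possibly read-only) source tree.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_project = shutil.copytree(project_dir, f"{tmp_dir}/project")
        # dbt writes compiled SQL under <tmp_project>/target/; the whole tree
        # is deleted when the context manager exits, which is why the path
        # printed in the task logs cannot be found after the run finishes.
        subprocess.run([dbt_bin, "run", "--project-dir", tmp_project], check=True)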