astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

performance degradation #932

Open liranc1 opened 1 month ago

liranc1 commented 1 month ago

Before switching to Cosmos, the Airflow DAG ran a certain dbt command in about 15 minutes. After the switch, the same dbt command is much more volatile and often takes 20-30 minutes. The Airflow resources for the DAG stayed the same, and the dbt connection details did not change.

I also saw some tasks fail at random with a Snowflake connection error and then succeed on the next run. This did not happen without Cosmos.

cosmos configurations used:

ExecutionConfig(dbt_executable_path=DBT_EXECUTABLE_PATH)

RenderConfig( select=["models"], test_behavior=TestBehavior.NONE, load_method=LoadMode.DBT_LS, dbt_deps=False )

ProjectConfig(os.environ["DBT_PROJECT_PATH"], dbt_vars=dbt_vars)

dbt version: Core:

dosubot[bot] commented 1 month ago

To address the performance degradation and intermittent connection errors with your Airflow DAGs using cosmos for dbt tasks, consider the following targeted steps:

  1. Optimize Dependency Installation: If dependencies are installed with each run, this can slow down execution. After the initial run, set install_deps to False to reuse the environment, as shown in the cosmos_manifest_example.py.

  2. Parallel Model Execution: Ensure your Airflow configuration supports parallel task execution to fully utilize dbt's parallel model execution capabilities.

  3. Resource Allocation: Check that your Airflow and dbt tasks have sufficient memory and CPU resources.

  4. Data Volume and Complexity: For large or complex data transformations, optimize your dbt models or break down tasks into smaller segments.

  5. Network Latency: If Airflow and the database are not in the same network, reduce network latency by optimizing your network configuration.

For the Snowflake connection issues:

These steps are derived from analyzing the provided configuration and issues. Implementing these recommendations should help mitigate the performance and connectivity issues you're experiencing.
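For the intermittent connection errors that succeed on the next run, one common mitigation is to let Airflow retry the failed task automatically. The following is a minimal sketch, not a confirmed fix for this issue: `retries` and `retry_delay` are standard Airflow task arguments, the values are illustrative placeholders, and passing the dict through `default_args` on the Cosmos-rendered DAG is an assumption about how the DAG is defined.

```python
from datetime import timedelta

# Standard Airflow task-level arguments (both are BaseOperator parameters).
# The values below are illustrative placeholders, not tuned recommendations.
default_args = {
    "retries": 2,                         # re-run a failed task up to two more times
    "retry_delay": timedelta(minutes=1),  # wait one minute between attempts
}

# Assumption: the dict is handed to the DAG that Cosmos renders, for example
#   DbtDag(..., default_args=default_args)
# so that every generated dbt task inherits the retry policy.
```

Retries only hide transient failures; if the error rate stays high, the underlying connection problem still needs investigating.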

**Sources**
- [How can I Improve on performance](https://github.com/astronomer/astronomer-cosmos/issues/840)
- [cosmos/profiles/snowflake/user_pass.py](https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/profiles/snowflake/user_pass.py)
- [cosmos/profiles/snowflake/__init__.py](https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/profiles/snowflake/__init__.py)
- [cosmos/profiles/snowflake/user_encrypted_privatekey_env_variable.py](https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/profiles/snowflake/user_encrypted_privatekey_env_variable.py)
- [dev/dags/performance_dag.py](https://github.com/astronomer/astronomer-cosmos/blob/main/dev/dags/performance_dag.py)
- [cosmos/dbt/graph.py](https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/dbt/graph.py)
- [cosmos/config.py](https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/config.py)
- [cosmos/operators/local.py](https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/operators/local.py)
- [cosmos/operators/kubernetes.py](https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/operators/kubernetes.py)

tatiana commented 1 month ago

Hi @liranc1, thanks for sharing this information. Could you please confirm which version of Cosmos you were using before and after you observed this performance degradation?

liranc1 commented 1 month ago

Cosmos was not used before; I was using the Bash operator. The performance degradation started once I moved to Cosmos. The Cosmos version I used is 1.3.2.

tatiana commented 2 weeks ago

@liranc1 could you try out Cosmos 1.4 and let us know if there are any performance improvements?
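When comparing versions, it may help to report the spread of run durations as well as the average, since the complaint here is volatility. Below is a rough sketch of one way to summarize a set of DAG run durations. The `summarize` helper is hypothetical (it is not part of Cosmos or Airflow), and the numbers in the usage line are synthetic, not measurements from this issue.

```python
from statistics import mean, stdev


def summarize(durations_min):
    """Summarize run durations (in minutes): count, mean, sample stdev, min, max."""
    return {
        "runs": len(durations_min),
        "mean": round(mean(durations_min), 1),
        "stdev": round(stdev(durations_min), 1),  # needs at least two runs
        "min": min(durations_min),
        "max": max(durations_min),
    }


# Synthetic input purely to show the call shape; replace with real run durations.
example = summarize([10, 20, 30])
```

Running the same summary over the runs from each setup (Bash operator, Cosmos 1.3.2, Cosmos 1.4) would give directly comparable numbers.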