Open liranc1 opened 1 month ago
To address the performance degradation and intermittent connection errors with your Airflow DAGs using cosmos for dbt tasks, consider the following targeted steps:
Optimize Dependency Installation: If dependencies are installed with each run, this can slow down execution. After the initial run, set install_deps to False to reuse the environment, as shown in cosmos_manifest_example.py.
Parallel Model Execution: Ensure your Airflow configuration supports parallel task execution to fully utilize dbt's parallel model execution capabilities.
Resource Allocation: Check that your Airflow and dbt tasks have sufficient memory and CPU resources.
Data Volume and Complexity: For large or complex data transformations, optimize your dbt models or break down tasks into smaller segments.
Network Latency: If Airflow and the database are not in the same network, reduce network latency by optimizing your network configuration.
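The first two suggestions above can be sketched as plain keyword arguments. This is only an illustration: build_cosmos_dag_kwargs is a hypothetical helper, the value 8 is a placeholder, and it assumes the resulting dict is unpacked into a Cosmos DbtDag (which accepts operator_args plus regular Airflow DAG arguments) on Airflow 2.2 or later, where max_active_tasks is the per-DAG concurrency argument.

```python
from typing import Any, Dict


def build_cosmos_dag_kwargs(deps_already_installed: bool, max_active_tasks: int) -> Dict[str, Any]:
    """Hypothetical helper: collect kwargs intended for a Cosmos DbtDag."""
    return {
        # operator_args is forwarded to each dbt operator; install_deps
        # decides whether dependencies are installed before the dbt command.
        "operator_args": {"install_deps": not deps_already_installed},
        # Regular Airflow DAG argument: upper bound on tasks of this DAG
        # that may run at the same time (placeholder value chosen by caller).
        "max_active_tasks": max_active_tasks,
    }


# After the initial run the environment already holds the dependencies.
dag_kwargs = build_cosmos_dag_kwargs(deps_already_installed=True, max_active_tasks=8)
print(dag_kwargs)
```

The per-DAG limit only helps if the global Airflow parallelism setting and the executor have room for that many tasks, so those would need checking as well.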
For the Snowflake connection issues: private_key and private_key_passphrase.
These steps are derived from analyzing the provided configuration and issues. Implementing these recommendations should help mitigate the performance and connectivity issues you're experiencing.
Hi @liranc1, thanks for sharing this information. Could you please confirm which version of Cosmos you were using before and after this performance degradation was observed?
Cosmos was not used before; I was using the BashOperator. The performance degradation started once I moved to Cosmos. The Cosmos version I used is 1.3.2.
@liranc1 could you try out Cosmos 1.4 and let us know if there are any performance improvements?
Before using Cosmos, the Airflow DAG ran for about 15 minutes for a certain dbt command. After the change to Cosmos, the run time of the same dbt command is much more volatile, often 20-30 minutes. All of Airflow's resources stayed the same for the DAG, and there was no change in the dbt connection details.
I also saw some tasks randomly fail with a Snowflake connection error and then succeed on the next run. This issue did not occur without Cosmos.
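Since the failing tasks succeed on the next run, one possible stopgap (not a fix for the underlying connection problem) is Airflow's built-in task retries. A minimal sketch, assuming these default_args are passed to the DAG that Cosmos renders; the retry count and delay are placeholder values, not taken from this issue.

```python
from datetime import timedelta

# Standard Airflow task arguments; the values are assumptions to tune.
default_args = {
    "retries": 2,                         # re-run a failed task up to 2 times
    "retry_delay": timedelta(minutes=2),  # wait between attempts
}

# Worst-case extra waiting time that retries add to a single task.
max_extra_wait = default_args["retries"] * default_args["retry_delay"]
print(max_extra_wait)
```

Retries hide the symptom and add to total run time, so they would not explain or remove the volatility described above.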
Cosmos configurations used:
ExecutionConfig(dbt_executable_path=DBT_EXECUTABLE_PATH)
RenderConfig(select=["models"], test_behavior=TestBehavior.NONE, load_method=LoadMode.DBT_LS, dbt_deps=False)
ProjectConfig(os.environ["DBT_PROJECT_PATH"], dbt_vars=dbt_vars)
dbt version: Core: