Open michal-mrazek opened 3 months ago
Hello, @michal-mrazek! I'm here to help you with any bugs, questions, or contributions you have. I'll assist you while we wait for a human maintainer to review your issue.
High CPU and memory usage when running a Cosmos DAG with dbt models on a small MWAA cluster can be expected behavior, especially if you are dealing with a large dbt project. The documentation notes that larger dbt projects might cause a "DagBag import timeout" error, which suggests that resource-intensive operations are anticipated. Increasing the core.dagbag_import_timeout value in the Airflow configuration is recommended to handle such cases [1].
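For example, assuming a value of 120 seconds (purely illustrative; the Airflow default is 30), this can be set in airflow.cfg:
[core]
dagbag_import_timeout = 120
Or using an environment variable (on MWAA this is typically applied through the Airflow configuration options in the console):
AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT="120"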
Additionally, Cosmos task groups are inherently more resource-intensive than the BashOperator due to the complexity and additional functionality provided by Cosmos, which includes managing environment variables, handling multiple dbt-specific flags and options, and executing dbt commands with various configurations and caching mechanisms [2][3][4][5].
To optimize performance and reduce resource usage, consider the following:
Partial Parsing: Ensure that you are leveraging dbt's partial parsing to speed up task execution and DAG parsing. This requires consistent use of the same --vars, --target, and --profile values, and the same profiles.yml environment variables [6] (see the sketch after this list).
Profile Configuration: If you are using Cosmos profile mapping, disable mocked profile mappings by setting render_config=RenderConfig(enable_mock_profile=False), and declare your own profiles.yml file via ProfileConfig(profiles_yml_filepath=...) [6] (also illustrated in the sketch after this list).
Caching: After running a dbt command, Cosmos caches the most up-to-date partial_parse.msgpack file in the system temporary directory. This reduces the need for full dbt project parsing on each run. Caching is enabled by default but can be configured or disabled using the following settings [6]:
[cosmos]
cache_dir = path/to/docs/here # to override default caching directory (by default, uses the system temporary directory)
enable_cache = False # to disable caching (enabled by default)
Or using environment variables:
AIRFLOW__COSMOS__CACHE_DIR="path/to/docs/here" # to override default caching directory (by default, uses the system temporary directory)
AIRFLOW__COSMOS__ENABLE_CACHE="False" # to disable caching (enabled by default)
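The sketch referenced in the partial parsing and profile configuration items above is a minimal illustration of how those two recommendations might fit together. The project path, profile name, target, and vars below are placeholders, not values taken from this issue:

from pendulum import datetime

from airflow import DAG
from cosmos import DbtTaskGroup, ProfileConfig, ProjectConfig, RenderConfig

DBT_PROJECT_PATH = "/usr/local/airflow/dags/dbt/my_project"  # placeholder path
DBT_VARS = {"env": "prod"}  # keep identical across DAGs/runs so the cached partial_parse.msgpack stays valid

profile_config = ProfileConfig(
    profile_name="my_profile",  # placeholder profile name
    target_name="prod",  # keep the same target everywhere
    profiles_yml_filepath=f"{DBT_PROJECT_PATH}/profiles.yml",  # your own profiles.yml file
)

with DAG("dbt_example", start_date=datetime(2024, 1, 1), schedule=None):
    DbtTaskGroup(
        group_id="dbt_models",
        project_config=ProjectConfig(DBT_PROJECT_PATH),
        profile_config=profile_config,
        # Only relevant if you render via a Cosmos profile mapping; harmless otherwise.
        render_config=RenderConfig(enable_mock_profile=False),
        operator_args={"vars": DBT_VARS},  # the same --vars on every dbt task
    )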
Monitoring specific logs and metrics can also help diagnose high resource usage. The cosmos/converter.py file includes logging statements that provide performance metrics for parsing the dbt project and building the Airflow DAG [3].
By following these optimizations and configurations, you can improve the performance and reduce the CPU and memory usage when using Cosmos with dbt models.
To continue talking to Dosu, mention @dosu.
We made some performance improvements in 1.5; would it be possible to upgrade and test?
@tatiana has discussed some best-practice strategies in a webinar you might be interested in checking: https://www.astronomer.io/events/webinars/best-practices-effectively-running-dbt-airflow-video/
We tested upgrading to 1.5.1, but unfortunately resource utilization did not decrease much.
This issue is stale because it has been open for 30 days with no activity.
Astronomer Cosmos Version
Other Astronomer Cosmos version (please specify below)
If "Other Astronomer Cosmos version" selected, which one?
1.4.0
dbt-core version
1.8.2
Versions of dbt adapters
dbt-snowflake==1.8.2
LoadMode
CUSTOM
ExecutionMode
VIRTUALENV
InvocationMode
DBT_RUNNER
Airflow version
2.8.1
Operating System
Amazon Linux AMI
If you think it's a UI issue, which browsers are you seeing the problem on?
No response
Deployment
Amazon (AWS) MWAA
Deployment details
No response
What happened?
Hello! We are running Cosmos in an AWS MWAA instance. We have several dbt projects, and we are observing that CPU and memory usage is high when a Cosmos DAG starts.
We tested a dbt project with about 40 models to compare resource utilization between the BashOperator and Cosmos task groups (with max_active_tasks=10). On a small MWAA cluster, the BashOperator was performing just fine. However, Cosmos was struggling, with CPU and memory peaking at 100% and random failures as a result.
So I wanted to ask: is there anything we might be doing wrong, or is this expected behavior for Cosmos? In my mind, the operations on the Airflow side should not be that heavy. We also tested switching to local execution mode, but we did not observe a significant difference.
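For illustration only, roughly the kind of setup we were comparing (a sketch, not our exact code; the project path, profile details, schedule, and requirements are placeholders):

from pendulum import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from cosmos import DbtTaskGroup, ExecutionConfig, ProfileConfig, ProjectConfig, RenderConfig
from cosmos.constants import ExecutionMode, LoadMode

DBT_PROJECT_PATH = "/usr/local/airflow/dags/dbt/my_project"  # placeholder path

profile_config = ProfileConfig(
    profile_name="my_profile",  # placeholder
    target_name="prod",
    profiles_yml_filepath=f"{DBT_PROJECT_PATH}/profiles.yml",
)

# Baseline: the whole project runs as a single BashOperator task.
with DAG("dbt_bash_baseline", start_date=datetime(2024, 1, 1), schedule=None):
    BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --project-dir {DBT_PROJECT_PATH}",
    )

# Comparison: one Cosmos task per model, capped at 10 concurrent tasks.
with DAG("dbt_cosmos_task_group", start_date=datetime(2024, 1, 1), schedule=None, max_active_tasks=10):
    DbtTaskGroup(
        group_id="dbt_models",
        project_config=ProjectConfig(DBT_PROJECT_PATH),
        profile_config=profile_config,
        render_config=RenderConfig(load_method=LoadMode.CUSTOM),  # LoadMode reported below
        execution_config=ExecutionConfig(execution_mode=ExecutionMode.VIRTUALENV),
        operator_args={"py_requirements": ["dbt-snowflake==1.8.2"]},  # adapter version reported below
    )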
Relevant log output
No response
How to reproduce
Run a dbt project with 40 models in a small MWAA cluster.
Anything else :)?
No response
Are you willing to submit a PR?
Contact Details
mmrazek@paylocity.com