jeremyyeo closed this issue 11 months ago
@susodapop any insight here?
I suspect this is an issue with dbt-core or an upstream change in dbt-spark creating the memory spike. In April 2023 we investigated what appeared to be a memory leak when running Databricks jobs in dbt Cloud that exhibited a similar spike just as the dbt command concluded.
I don't believe the leak comes from databricks-sql-connector: we manually injected mprof statements into the connector's source code to see whether the memory spiked within the connector or within dbt, and the spike was within dbt.
This research resulted in a PR to dbt-core, which fixed the OOM at the time: https://github.com/dbt-labs/dbt-core/pull/7371
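For anyone who wants to repeat that kind of bisection, here is a minimal sketch of the idea. It uses the standard library's tracemalloc rather than mprof, so it is not what was actually injected into the connector, and it only sees allocations made through Python's allocators (memory allocated by native libraries will not show up). The functions connector_step and caller_step are hypothetical stand-ins for a call inside the connector and the calling code in dbt.

```python
import tracemalloc


def measure_peak(label, func, *args, **kwargs):
    """Call func and report the peak Python-level allocation during the call."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    print(f"{label}: peak {peak / (1024 * 1024):.1f} MiB")
    return result, peak


# Hypothetical stand-ins: one for a call inside the connector, one for the
# code in the caller that runs around it. Replace with the real call sites.
def connector_step():
    return bytearray(1024)  # about 1 KiB


def caller_step():
    return bytearray(20 * 1024 * 1024)  # about 20 MiB


_, connector_peak = measure_peak("connector", connector_step)
_, caller_peak = measure_peak("caller", caller_step)
```

Wrapping each suspect call separately like this shows which side of the boundary the allocation happens on, which is the question the mprof statements were meant to answer.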
For this issue, we need to see if we can
@jeremyyeo if this issue still reproduces for you, can you open the bug on the sql connector if you haven't already?
@benc-db There is already an open issue on databricks-sql-connector for this here: https://github.com/databricks/databricks-sql-python/issues/179
But we can't reproduce it. Like I mentioned in my comment above, this appears to be an issue in either dbt-core or dbt-spark. It could be in our connector but without reproduction steps we can't do RCA or validate a fix.
Describe the bug
When the token is incorrect, there is a memory spike. The cause looks to be the databricks-sql-python lib (https://github.com/databricks/databricks-sql-python/issues/179).
Steps To Reproduce
Using a token that's valid, run:
Now, edit the token so that it is invalid (for example, by removing some characters from it), and run:
Expected behavior
Memory should not spike to 2 GB just because the token is invalid.
Screenshots and log output
Debug logs on errored run - doesn't seem useful:
System information
The output of dbt --version:
The operating system you're using:
The output of python --version:
Additional context
A couple of other version tests using a bad token:
dbt-databricks 1.4
dbt-spark 1.5 (ODBC)
Using dbt-spark, the memory consumption seems more reasonable and doesn't spike:
dbt-spark adapter tested using the following profile:
I did some memory usage testing previously: https://github.com/jeremyyeo/dbt-performance/blob/main/202304_dbt-spark-databricks-memory-usage/stats.ipynb. The dbt-databricks adapter typically consumes more memory than dbt-spark, but this scenario (token error) seems to be much worse.
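In case it helps with reproduction, below is a rough sketch of one way to record the peak memory of a single command so that the valid-token and invalid-token runs can be compared. It is not the method used in the notebook above. It assumes a Unix-like system (the resource module is not available on Windows), and the helper name peak_rss_of is made up for illustration. Note that ru_maxrss is reported in kilobytes on Linux and in bytes on macOS, and that RUSAGE_CHILDREN reports the largest child seen so far, so each command should be measured in a fresh interpreter.

```python
import resource
import subprocess
import sys


def peak_rss_of(cmd):
    """Run cmd to completion and return (exit_code, peak RSS of child processes).

    The RSS value is the largest resident set size of any child this
    interpreter has waited for, so measure one command per interpreter.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Pass the command to measure on the command line; with no arguments,
    # fall back to a trivial Python child so the script still runs.
    cmd = sys.argv[1:] or [sys.executable, "-c", "print('hello')"]
    code, rss = peak_rss_of(cmd)
    print(f"exit code {code}, peak RSS {rss} (KiB on Linux, bytes on macOS)")
```

Saved as a script, this would be invoked once with the dbt command from the steps above while the token is valid, and once more after the token has been edited to be invalid.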