databricks / dbt-databricks

A dbt adapter for Databricks.

High memory use when token is invalid #388

Closed: jeremyyeo closed this issue 11 months ago

jeremyyeo commented 1 year ago

Describe the bug

When the token is incorrect, there is a memory spike. Cause looks to be the databricks-sql-python lib (https://github.com/databricks/databricks-sql-python/issues/179).

Steps To Reproduce

  1. Set up the environment:
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install memory_profiler matplotlib dbt-databricks~=1.5.0
  2. Set up the dbt project:
# ~/.dbt/profiles.yml
databricks:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: dbt_jyeo
      host: <yourhost>.cloud.databricks.com
      token: <yourtoken>
      http_path: /sql/1.0/endpoints/<yourendpoint>

# root/dbt_project.yml
name: "my_dbt_project"
version: "1.0.0"
config-version: 2
profile: "databricks"
models:
  my_dbt_project:
    +materialized: table
    +file_format: delta
-- root/models/foo.sql
select 1 id

Using a valid token, run:

$ mprof run --output normal.dat dbt run
mprof: Sampling memory every 0.1s
running new process
08:53:12  Running with dbt=1.5.2
08:53:14  Registered adapter: databricks=1.5.5
08:53:16  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 415 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
08:53:16  
08:53:17  databricks-sql-connector adapter: Successfully opened session 01ee247f-5beb-1fbc-b01f-177f69f758a7
08:53:18  databricks-sql-connector adapter: Closing session 01ee247f-5beb-1fbc-b01f-177f69f758a7
08:53:19  databricks-sql-connector adapter: Successfully opened session 01ee247f-5d31-14ca-ae88-3848529a54ce
08:53:20  databricks-sql-connector adapter: Closing session 01ee247f-5d31-14ca-ae88-3848529a54ce
08:53:21  Concurrency: 1 threads (target='dev')
08:53:21  
08:53:21  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
08:53:22  databricks-sql-connector adapter: Successfully opened session 01ee247f-5ecc-1f87-86e0-add48116142d
08:53:27  databricks-sql-connector adapter: Closing session 01ee247f-5ecc-1f87-86e0-add48116142d
08:53:27  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [OK in 6.81s]
08:53:29  databricks-sql-connector adapter: Successfully opened session 01ee247f-62e4-17ff-aff2-f79efa639e71
08:53:29  databricks-sql-connector adapter: Closing session 01ee247f-62e4-17ff-aff2-f79efa639e71
08:53:29  
08:53:29  Finished running 1 table model in 0 hours 0 minutes and 13.69 seconds (13.69s).
08:53:29  
08:53:29  Completed successfully
08:53:29  
08:53:29  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
$ mprof plot normal.dat

[mprof plot: memory usage during the successful run]

Now edit the token so that it is invalid (e.g. remove some characters from it), then run:

$ mprof run --output error.dat dbt run
mprof: Sampling memory every 0.1s
running new process
08:56:02  Running with dbt=1.5.2
08:56:03  Registered adapter: databricks=1.5.5
08:56:04  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 415 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
08:56:04  
08:56:07  databricks-sql-connector adapter: Error during request to server: : Invalid access token.: {"method": "OpenSession", "session-id": null, "query-id": null, "http-code": 403, "error-message": ": Invalid access token.", "original-exception": "", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "2.977200984954834/900.0"}
08:56:07  
08:56:07  Finished running  in 0 hours 0 minutes and 3.03 seconds (3.03s).
08:56:07  Encountered an error:
Runtime Error
  Runtime Error
    Database Error
      Error during request to server: : Invalid access token.
$ mprof plot error.dat

[mprof plot: memory usage during the errored run, spiking to roughly 2 GB]

Expected behavior

Memory should not spike to roughly 2 GB just because the token is invalid.

Screenshots and log output

Debug logs from the errored run (they don't seem particularly useful):

============================== 20:56:02.087081 | 6799eb4c-5179-42b0-9256-e94870531cb1 ==============================
20:56:02.087081 [info ] [MainThread]: Running with dbt=1.5.2
20:56:02.088240 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'False', 'cache_selected_only': 'False', 'profiles_dir': '/Users/jeremy/.dbt', 'version_check': 'True', 'fail_fast': 'False', 'log_path': '/Users/jeremy/src/dbt-sandcastles/dbt/databricks/logs', 'debug': 'False', 'warn_error': 'None', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'introspect': 'True', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'target_path': 'None', 'send_anonymous_usage_stats': 'True'}
20:56:03.640508 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '6799eb4c-5179-42b0-9256-e94870531cb1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109687850>]}
20:56:03.667248 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '6799eb4c-5179-42b0-9256-e94870531cb1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11b065b50>]}
20:56:03.668028 [info ] [MainThread]: Registered adapter: databricks=1.5.5
20:56:03.710387 [debug] [MainThread]: checksum: 2435fa7be9fcfa6ffe15da8f202ddc09867bf7d6308330064299ccb85c8646b5, vars: {}, profile: , target: , version: 1.5.2
20:56:03.712012 [debug] [MainThread]: Partial parsing not enabled
20:56:04.861308 [debug] [MainThread]: 1699: static parser successfully parsed foo.sql
20:56:04.925382 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '6799eb4c-5179-42b0-9256-e94870531cb1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11b2620d0>]}
20:56:04.935211 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '6799eb4c-5179-42b0-9256-e94870531cb1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11b27da30>]}
20:56:04.936045 [info ] [MainThread]: Found 1 model, 0 tests, 0 snapshots, 0 analyses, 415 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
20:56:04.936798 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '6799eb4c-5179-42b0-9256-e94870531cb1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11b065dc0>]}
20:56:04.939568 [info ] [MainThread]: 
20:56:04.941355 [debug] [MainThread]: Acquiring new databricks connection 'master'
20:56:04.943119 [debug] [ThreadPool]: Acquiring new databricks connection 'list_schemas'
20:56:04.956693 [debug] [ThreadPool]: Using databricks connection "list_schemas"
20:56:04.957372 [debug] [ThreadPool]: On list_schemas: /* {"app": "dbt", "dbt_version": "1.5.2", "dbt_databricks_version": "1.5.5", "databricks_sql_connector_version": "2.7.0", "profile_name": "databricks", "target_name": "dev", "connection_name": "list_schemas"} */

    show databases

20:56:04.957811 [debug] [ThreadPool]: Opening a new connection, currently in state init
20:56:07.954693 [info ] [ThreadPool]: databricks-sql-connector adapter: Error during request to server: : Invalid access token.: {"method": "OpenSession", "session-id": null, "query-id": null, "http-code": 403, "error-message": ": Invalid access token.", "original-exception": "", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "2.977200984954834/900.0"}
20:56:07.956484 [debug] [ThreadPool]: Databricks adapter: <class 'databricks.sql.exc.RequestError'>: Error during request to server: : Invalid access token.
20:56:07.957100 [debug] [ThreadPool]: Databricks adapter: attempt: 1/30
20:56:07.957594 [debug] [ThreadPool]: Databricks adapter: bounded-retry-delay: None
20:56:07.957989 [debug] [ThreadPool]: Databricks adapter: elapsed-seconds: 2.977200984954834/900.0
20:56:07.958367 [debug] [ThreadPool]: Databricks adapter: error-message: : Invalid access token.
20:56:07.958745 [debug] [ThreadPool]: Databricks adapter: http-code: 403
20:56:07.959208 [debug] [ThreadPool]: Databricks adapter: method: OpenSession
20:56:07.959709 [debug] [ThreadPool]: Databricks adapter: no-retry-reason: non-retryable error
20:56:07.960121 [debug] [ThreadPool]: Databricks adapter: original-exception: 
20:56:07.960470 [debug] [ThreadPool]: Databricks adapter: query-id: None
20:56:07.960802 [debug] [ThreadPool]: Databricks adapter: session-id: None
20:56:07.961284 [debug] [ThreadPool]: Databricks adapter: Error while running:
/* {"app": "dbt", "dbt_version": "1.5.2", "dbt_databricks_version": "1.5.5", "databricks_sql_connector_version": "2.7.0", "profile_name": "databricks", "target_name": "dev", "connection_name": "list_schemas"} */

    show databases

20:56:07.961992 [debug] [ThreadPool]: Databricks adapter: Database Error
  Error during request to server: : Invalid access token.
20:56:07.963055 [debug] [ThreadPool]: Databricks adapter: Error while running:
macro list_schemas
20:56:07.963919 [debug] [ThreadPool]: Databricks adapter: Runtime Error
  Database Error
    Error during request to server: : Invalid access token.
20:56:07.964797 [debug] [ThreadPool]: On list_schemas: No close available on handle
20:56:07.966655 [debug] [MainThread]: Connection 'master' was properly closed.
20:56:07.967149 [debug] [MainThread]: Connection 'list_schemas' was properly closed.
20:56:07.967594 [info ] [MainThread]: 
20:56:07.968599 [info ] [MainThread]: Finished running  in 0 hours 0 minutes and 3.03 seconds (3.03s).
20:56:07.970006 [error] [MainThread]: Encountered an error:
Runtime Error
  Runtime Error
    Database Error
      Error during request to server: : Invalid access token.
20:56:07.971475 [debug] [MainThread]: Command `dbt run` failed at 20:56:07.971265 after 5.93 seconds
20:56:07.972068 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10856a220>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11b065dc0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11b262d30>]}
20:56:07.972573 [debug] [MainThread]: Flushing usage events

System information

The output of dbt --version:

Core:
  - installed: 1.5.2
  - latest:    1.5.2 - Up to date!

Plugins:
  - databricks: 1.5.5 - Up to date!
  - spark:      1.5.0 - Up to date!

The operating system you're using:

macOS
2 GHz Quad-Core Intel Core i5
16 GB 3733 MHz LPDDR4X

The output of python --version:

Python 3.9.13

Additional context

A couple of other version tests using a bad token:

dbt-databricks 1.4

[mprof plot: dbt-databricks 1.4 with a bad token]

dbt-spark 1.5 (ODBC)

[mprof plot: dbt-spark 1.5 (ODBC) with a bad token]

Using dbt-spark, the memory consumption seems more reasonable and doesn't spike:

dbt-spark adapter tested using the following profile:

# ~/.dbt/profiles.yml
spark:
  target: dev
  outputs:
    dev:
      type: spark
      schema: dbt_jyeo
      method: odbc
      driver: /Library/simba/spark/lib/libsparkodbc_sbu.dylib
      host: <yourhost>.cloud.databricks.com
      endpoint: <yourendpoint>
      token: <yourtoken> # make this invalid
      port: 443
      connect_timeout: 10

I did some memory usage testing previously: https://github.com/jeremyyeo/dbt-performance/blob/main/202304_dbt-spark-databricks-memory-usage/stats.ipynb. The dbt-databricks adapter typically does consume more memory than dbt-spark, but this scenario (token error) seems to be much worse.
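For capturing a single peak-memory number instead of a plot, a minimal sketch of measuring a dbt invocation programmatically might look like the following, assuming dbt-core 1.5's dbtRunner API and memory_profiler are available (this script is purely illustrative, not how the numbers above were produced):

# measure_dbt_memory.py - hypothetical helper, run from the dbt project directory.
# Assumes dbt-core >= 1.5 (programmatic dbtRunner API) and memory_profiler are installed.
from memory_profiler import memory_usage
from dbt.cli.main import dbtRunner

def run_dbt():
    # invoke() returns a result object rather than raising, even when the token is bad
    dbtRunner().invoke(["run"])

if __name__ == "__main__":
    # Sample memory of this process (and child processes) every 0.1s while dbt runs
    samples = memory_usage((run_dbt, (), {}), interval=0.1, include_children=True)
    print(f"peak memory: {max(samples):.1f} MiB")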

benc-db commented 1 year ago

@susodapop any insight here?

susodapop commented 1 year ago

I suspect this is an issue with dbt-core or an upstream change in dbt-spark creating the memory spike. In April 2023 we investigated what appeared to be a memory leak when running Databricks jobs in dbt Cloud that exhibited a similar spike just as the dbt command concluded.

[mprof plot: memory spike at the end of the dbt command, from the April 2023 investigation]

I don't believe the leak comes from databricks-sql-connector: we manually injected mprof statements into the connector's source code to see whether the memory spiked within the connector or within dbt, and the result pointed to dbt.
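(For reference, a minimal, self-contained sketch of that kind of instrumentation with memory_profiler's @profile decorator; the decorated function below is purely illustrative, not the connector's actual code.)

# profile_example.py - hypothetical example of line-by-line memory instrumentation.
from memory_profiler import profile

@profile
def allocate():
    # memory_profiler prints each line's memory increment when the function returns
    data = [b"x" * 1024 for _ in range(100_000)]  # roughly 100 MiB of small allocations
    return len(data)

if __name__ == "__main__":
    allocate()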

That research resulted in a PR to dbt-core that fixed the OOM at the time: https://github.com/dbt-labs/dbt-core/pull/7371

For this issue, we need to see if we can:

  1. Reproduce this issue with dbt-databricks
  2. Reproduce it with databricks-sql-connector in isolation (a minimal sketch follows below)
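For (2), a minimal sketch of an isolated reproduction, assuming databricks-sql-connector's sql.connect API and the same mprof workflow as above (the hostname, http_path and token are placeholders to fill in):

# repro_connector.py - hypothetical isolated repro: open a connection with a bad
# token and watch memory, taking dbt out of the picture.
# Run with: mprof run --output connector.dat python repro_connector.py
from databricks import sql

try:
    conn = sql.connect(
        server_hostname="<yourhost>.cloud.databricks.com",
        http_path="/sql/1.0/endpoints/<yourendpoint>",
        access_token="invalid-token",  # deliberately invalid to trigger the 403
    )
    conn.close()
except Exception as exc:
    # Expect "Invalid access token." here; the question is whether memory spikes before this is raised
    print(f"connection failed as expected: {exc}")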
benc-db commented 11 months ago

@jeremyyeo if this issue still reproduces for you, can you open the bug on the sql connector if you haven't already?

susodapop commented 11 months ago

@benc-db There is already an open issue on databricks-sql-connector for this here: https://github.com/databricks/databricks-sql-python/issues/179

But we can't reproduce it. As I mentioned in my comment above, this appears to be an issue in either dbt-core or dbt-spark. It could be in our connector, but without reproduction steps we can't do root-cause analysis or validate a fix.