databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Idempotency is not ensured in the streaming table process #710

Closed case-k-git closed 4 months ago

case-k-git commented 5 months ago

Describe the bug

Running the following model more than once causes dbt run to fail. On the second run the refresh process is supposed to be called, but the run fails before reaching it.

SQL

{{
   config(
     materialized='streaming_table'
   )
}}

select
  *,
  _metadata.file_path as file_path
from stream read_files(
  's3://path/to/your/data/',
  format => 'parquet',
  header => true
)

ERROR

% dbt run
01:49:41  Running with dbt=1.8.1
01:49:41  Registered adapter: databricks=1.8.1
01:49:42  Found 1 model, 1 source, 586 macros
01:49:42  
01:49:43  Concurrency: 1 threads (target='dev')
01:49:43  
01:49:43  1 of 1 START sql streaming_table model schema_demo.daily_model_load_st_2 ....... [RUN]
01:49:45  Unhandled error while executing 
HTTPSConnectionPool(host='hoge.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/pipelines/8cd5b015-2315-48f8-991c-537215d5c989 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)')))
01:49:45  1 of 1 ERROR creating sql streaming_table model schema_demo.daily_model_load_st_2  [ERROR in 1.61s]
01:49:46  
01:49:46  Finished running 1 streaming table model in 0 hours 0 minutes and 4.35 seconds (4.35s).
01:49:46  
01:49:46  Completed with 1 error and 0 warnings:
01:49:46  
01:49:46    HTTPSConnectionPool(host='hoge.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/pipelines/8cd5b015-2315-48f8-991c-537215d5c989 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)')))
01:49:46  
01:49:46  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Steps To Reproduce

Run the streaming table model two or more times:

Step 1: dbt run
Step 2: dbt run

Expected behavior

The create or refresh function is called and the run succeeds.

Screenshots and log output

The log output is the same as the ERROR block under "Describe the bug" above.

System information

The output of dbt --version:

1.8.1

The operating system you're using:

The output of python --version:

Additional context

I am already working on a fix for this issue and have probably solved it, so I will send a PR for it.

benc-db commented 5 months ago

See my comment on your PR. Please file a bug with your company's Databricks contact about the SSLError when calling the pipeline API.
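For anyone hitting the same trace, the SSLError can be checked outside dbt. The following is a minimal sketch, not part of dbt-databricks: check_tls is a hypothetical helper name, and it only assumes that the failure comes from Python's trust store not containing the CA that signs the certificate chain presented to the client (for example a corporate TLS-inspecting proxy). It performs a TLS handshake against the workspace host using only the standard library and reports whether verification succeeds.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_tls(host: str, port: int = 443, cafile: Optional[str] = None,
              timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a verified TLS handshake and describe the outcome.

    check_tls is a hypothetical helper for diagnosis only.
    If cafile is given, only that PEM bundle is trusted; otherwise
    Python's default trust store is used.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname enables SNI and hostname verification.
            with ctx.wrap_socket(sock, server_hostname=host):
                return True, "certificate chain verified"
    except ssl.SSLCertVerificationError as exc:
        # Same error class as in the dbt log above.
        return False, f"verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failures, refused connections, timeouts, other TLS errors.
        return False, f"connection failed: {exc}"
```

Calling check_tls with the workspace host from the machine that runs dbt would be expected to report a verification failure if the trust store is the problem. Passing the path of the organization's CA bundle as cafile (a path you would have to obtain yourself) and getting a successful result would point at a missing CA rather than at the streaming table materialization.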

case-k-git commented 4 months ago

Yes, thank you. We can close this issue. Thank you for your review.

case-k-git commented 4 months ago

This solved the issue in my case, thank you!

pip install pip-system-certs

https://github.com/dbt-labs/dbt-core/issues/8554
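As far as I understand it, pip-system-certs works by making pip and requests use the operating system's certificate store instead of the bundled CA list, which is why it helps when a company CA is installed system-wide but unknown to Python. A rough sketch for inspecting which CA sources are in play on a given machine follows. The helper names ca_overrides and default_ca_paths are hypothetical; the environment variables listed are ones that OpenSSL (SSL_CERT_FILE, SSL_CERT_DIR) and the requests library (REQUESTS_CA_BUNDLE, CURL_CA_BUNDLE) are known to read.

```python
import os
import ssl
from typing import Dict, Mapping, Optional

# Environment variables that commonly redirect CA lookup.
CA_ENV_VARS = ("SSL_CERT_FILE", "SSL_CERT_DIR",
               "REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE")


def ca_overrides(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the CA-related variables that are set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: env[name] for name in CA_ENV_VARS if env.get(name)}


def default_ca_paths() -> Dict[str, Optional[str]]:
    """Return where Python's ssl module looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    return {"cafile": paths.cafile, "capath": paths.capath}


if __name__ == "__main__":
    print("default CA paths:", default_ca_paths())
    print("overrides:", ca_overrides() or "none set")
```

If no override is set and the default paths do not include the company CA, either installing pip-system-certs as above or pointing REQUESTS_CA_BUNDLE at a bundle that contains that CA are the usual options; which one applies depends on the environment.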