Open nickozilla opened 8 months ago
We have started seeing this too ever since we got Python models working again with the latest regression fixes (batch ID and nested data structures). However, our problem is not just limited to incremental models. Like @nickozilla mentioned, dbt reaches the Python model in the execution stream and then just hangs without ever submitting a Dataproc job. We have tried to debug this further by turning on the debug flag, but this does not display anything useful. It will just reach the Python models, output the Python code in log output, and then the process hangs indefinitely until killed.
We have tried setting the various timeout configuration options so that these specific jobs would at least time out, but this has no impact either. The dbt process always hangs at this point, which suggests that there is no mechanism in place to gracefully handle Dataproc failures, and/or that the Dataproc code itself blocks the main dbt process from continuing when it submits Dataproc jobs.
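To illustrate what we mean by a mechanism that handles this gracefully, here is a minimal sketch of a wait loop bounded by a deadline. It is not the dbt-bigquery implementation: `wait_for_job`, the `poll` callable, and the terminal state names are all hypothetical and only show the general pattern of giving up after a timeout instead of blocking forever.

```python
import time
from typing import Callable

# Hypothetical terminal states; a real job API may use different names.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def wait_for_job(poll: Callable[[], str], timeout_s: float, interval_s: float = 1.0) -> str:
    """Call poll() until it reports a terminal state or the deadline passes.

    Returns the terminal state, or raises TimeoutError so the caller can
    mark the model as failed and move on.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = poll()
        if state in TERMINAL_STATES:
            return state
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"job still in state {state!r} after {timeout_s}s")
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

With a loop like this, a job that errors out or never starts would surface as a `FAILED` state or a `TimeoutError` rather than an indefinite hang.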
Additional info:
For us, the job submits but if there is an error in the job (e.g. in my case there was a JSON column and apparently that's not supported for writing to BQ), then dbt hangs and does not continue afterwards.
Not great for production, where we want it to behave like a normal failure and continue so that the post-run alerting will run.
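As a stopgap on the pipeline side, something like the following could put a hard deadline around the whole invocation so that the alerting step still runs. This is only a sketch under assumptions: `run_with_deadline` and `TIMEOUT_EXIT_CODE` are names made up for this example, and it relies on nothing beyond the standard library's `subprocess.run(..., timeout=...)`, which kills the child process and raises `subprocess.TimeoutExpired` when the deadline passes.

```python
import subprocess
from typing import Sequence

# Assumption: reuse the exit code that GNU `timeout` reports on expiry.
TIMEOUT_EXIT_CODE = 124


def run_with_deadline(cmd: Sequence[str], timeout_s: float) -> int:
    """Run cmd and return its exit code, or TIMEOUT_EXIT_CODE if it hangs."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return TIMEOUT_EXIT_CODE
    return completed.returncode
```

For example, `run_with_deadline(["dbt", "run", "--target=unit-test", "--exclude", "tag:unit-test-new"], timeout_s=1800)` would return a non-zero code after 30 minutes (an arbitrary value picked for illustration) instead of hanging, and the pipeline can treat that like any other failure. It does not fix the underlying hang, and it only kills the direct child process.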
Hi folks! When did you start to notice this? I'm curious whether there was actually a change on the Dataproc side rather than in dbt.
Is this a new bug in dbt-bigquery?
Current Behavior
We've noticed lately that when a Python incremental model is run in our CI/CD pipeline, sometimes it will hang indefinitely & never submit a job to Dataproc. We haven't been able to identify why this happens in some invocations but not others, & this seems to be unique to incremental Python models.
Expected Behavior
BigQuery adapter: Submitting batch job with id: ...
created python incremental model ...
Steps To Reproduce
When running
dbt run --target=unit-test --exclude tag:unit-test-new
with a Python incremental model included in the selection, the job is never submitted & the invocation instead hangs indefinitely.
Relevant log output
No response
Environment
Additional Context
The failing model also has these config properties