azdoherty closed this 7 months ago
Should I close this due to the discussion here? https://github.com/dbt-labs/dbt-adapters/discussions/92
Hey @azdoherty, definitely move that discussion over there. FWIW, this is probably a dbt-core library issue: it's not possible to run SQL statements in parallel today, with the dbt-external-tables package or otherwise. I've suggested the same workaround you used (hooks, since models can run in parallel), plus some other funky patterns using custom materializations: https://gist.github.com/jeremyyeo/b61655a3e5a52eb27640363650c79a1e. The idea is the same in each case: models run in parallel (up to the threads config), so use that mechanism to run operations in parallel instead.
However, this is primarily a dbt-core / dbt-adapters library issue, imho.
Additionally, this is likely a dupe of #109.
Describe the feature
Staging external sources (stage_external_sources) would run a lot faster if it used multiple threads to stage multiple tables at once.
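To make the request concrete, here is a minimal Python sketch of the difference between staging tables one at a time and staging them from a thread pool. It is not how dbt or the package works internally: `stage_table`, the table names, and the simulated 50 ms latency are all hypothetical stand-ins for issuing one DDL statement per external table.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical table names; a real project would take these from its sources.
EXTERNAL_TABLES = [f"ext_table_{i}" for i in range(10)]


def stage_table(name: str, seconds: float = 0.05) -> str:
    """Stand-in for running one staging statement against the warehouse."""
    time.sleep(seconds)  # simulated round-trip latency (assumption)
    return name


def stage_serial(tables):
    """Stage tables one after another, as happens today."""
    return [stage_table(t) for t in tables]


def stage_parallel(tables, threads=4):
    """Stage tables concurrently, at most `threads` at a time."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in input order.
        return list(pool.map(stage_table, tables))


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, serial_s = timed(stage_serial, EXTERNAL_TABLES)
    _, parallel_s = timed(stage_parallel, EXTERNAL_TABLES, threads=4)
    print(f"serial: {serial_s:.2f}s, parallel: {parallel_s:.2f}s")
```

Because each statement spends most of its time waiting on the warehouse, threads are enough to overlap the waits. The thread count here plays the same role as the threads config does for models.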
Describe alternatives you've considered
I previously used a pre-hook on each model that referenced an external table; because the hooks were part of the models, they ran in parallel. This implementation was a bit messy, though: the external table did not appear in the DAG, and you had to include a
CREATE OR REPLACE EXTERNAL TABLE ...
in your model.
Additional context
I have only used this in BigQuery.
Who will this benefit?
Anyone with a lot of external tables they need to stage before each build. I have 10 and staging takes over a minute, and that time will scale linearly with the number of external tables.
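As a rough back-of-the-envelope check on the scaling, the sketch below estimates wall-clock time when tables are staged in waves. It assumes every table takes about the same time (roughly 6 seconds, inferred from 10 tables taking about a minute) and that the warehouse does not slow down under concurrent statements; `estimated_seconds` is a hypothetical helper, not part of any package.

```python
import math


def estimated_seconds(n_tables: int, per_table_s: float, threads: int = 1) -> float:
    """Estimate wall-clock time when tables are staged in waves of `threads`."""
    waves = math.ceil(n_tables / threads)
    return waves * per_table_s


PER_TABLE_S = 6.0  # assumption: 10 tables in roughly 60 seconds

for threads in (1, 4, 10):
    print(threads, estimated_seconds(10, PER_TABLE_S, threads))
```

Under those assumptions, one thread gives about 60 seconds, four threads about 18 seconds, and ten threads about 6 seconds, so the cost grows with the number of waves instead of the number of tables.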