dbt-labs / dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
https://getdbt.com
Apache License 2.0

[Feature] Optimize for Redshift Serverless #857

Open jaswanthikolla opened 4 months ago

jaswanthikolla commented 4 months ago

Is this your first time submitting a feature request?

Describe the feature

In Redshift Serverless, queries are billed for a minimum of 60 seconds, so it is better to batch them. For example, suppose you run a model with its full dependency chain. The system table queries against pg_namespace and information_schema.tables run at T0, their results are processed, and the model query then runs from T1 to T2. You are billed from T0 to T2 instead of just T1 to T2, and that window includes a lot of I/O time. The same thing happens for every model in the dependency chain.
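To make the T0/T1/T2 timeline concrete, here is a small arithmetic sketch. It assumes a simplified billing model (the whole active window is billed, with the 60-second minimum mentioned above), and the timestamps are made-up values for illustration, not measurements.

```python
# Simplified billing model (an assumption for illustration): the whole
# active window is billed, subject to a 60-second minimum.
MIN_BILLED_SECONDS = 60.0


def billed_seconds(start: float, end: float) -> float:
    """Return the billed duration for an active window [start, end]."""
    return max(end - start, MIN_BILLED_SECONDS)


# Hypothetical timestamps, in seconds since the run started.
t0 = 0.0    # system table queries start
t1 = 45.0   # model query starts, after metadata results are processed
t2 = 120.0  # model query finishes

with_metadata = billed_seconds(t0, t2)  # window includes the metadata phase
model_only = billed_seconds(t1, t2)     # window if metadata were already known

print(f"billed with metadata phase: {with_metadata:.0f}s")
print(f"billed for model only:      {model_only:.0f}s")
```

With these made-up numbers, the metadata phase accounts for 45 of the 120 billed seconds for a single model, and the overhead repeats for every model in the chain.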

The proposal is to run these system table queries once at startup, during dependency resolution, so that they are not issued again when the actual models run (which currently makes Redshift wait).
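A minimal sketch of the idea, not dbt's actual internals: `StartupMetadataCache`, `run_query`, and `fake_runner` are hypothetical names, and the runner is assumed to take a SQL string and return rows as tuples. The metadata query is issued once up front, and later existence checks are answered from memory.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical query runner: takes SQL text, returns rows as tuples.
QueryRunner = Callable[[str], Iterable[Tuple[str, str]]]

RELATIONS_SQL = "select table_schema, table_name from information_schema.tables"


class StartupMetadataCache:
    """Fetch relation metadata once and answer later lookups from memory."""

    def __init__(self, run_query: QueryRunner) -> None:
        self._run_query = run_query
        self._relations: Optional[Set[Tuple[str, str]]] = None
        self.queries_issued = 0  # counts round trips to the warehouse

    def prefetch(self) -> None:
        """Run the metadata query once, e.g. during dependency resolution."""
        rows = self._run_query(RELATIONS_SQL)
        self._relations = {(s.lower(), t.lower()) for s, t in rows}
        self.queries_issued += 1

    def relation_exists(self, schema: str, table: str) -> bool:
        """Check the cache; fall back to a single prefetch if it is empty."""
        if self._relations is None:
            self.prefetch()
        assert self._relations is not None
        return (schema.lower(), table.lower()) in self._relations


def fake_runner(sql: str) -> Iterable[Tuple[str, str]]:
    # Stand-in for a real warehouse connection (illustration only).
    return [("analytics", "orders"), ("analytics", "customers")]


cache = StartupMetadataCache(fake_runner)
cache.prefetch()  # one query at startup
checks = [
    cache.relation_exists("analytics", "orders"),
    cache.relation_exists("analytics", "missing"),
]
```

A real implementation would also need to update the cache whenever a model creates, renames, or drops a relation during the run, which this sketch leaves out.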

Describe alternatives you've considered

Multiple workspaces with different RPU settings, but that is outside the scope of dbt.

Who will this benefit?

All Redshift Serverless users. This could save millions of dollars across the industry.

Are you interested in contributing this feature?

I am 3 days into dbt, but yes, I can!

Anything else?

Maybe you can take this to the next level and use SQLite to cache the system table info locally.
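As a rough sketch of what a local SQLite cache could look like, using Python's standard `sqlite3` module: the table layout and the helper names (`open_cache`, `store_relations`, `relation_exists`) are assumptions for illustration and not part of dbt.

```python
import sqlite3
from typing import Iterable, Tuple


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        create table if not exists relations (
            schema_name text not null,
            table_name  text not null,
            primary key (schema_name, table_name)
        )
        """
    )
    return conn


def store_relations(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Save (schema, table) pairs fetched from the warehouse system tables."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "insert or replace into relations (schema_name, table_name) values (?, ?)",
            rows,
        )


def relation_exists(conn: sqlite3.Connection, schema: str, table: str) -> bool:
    """Answer an existence check locally, without querying the warehouse."""
    cur = conn.execute(
        "select 1 from relations where schema_name = ? and table_name = ?",
        (schema, table),
    )
    return cur.fetchone() is not None


conn = open_cache()
store_relations(conn, [("analytics", "orders")])
found = relation_exists(conn, "analytics", "orders")
missing = relation_exists(conn, "analytics", "customers")
```

A cache that persists between runs can go stale if relations change outside dbt, so some invalidation or refresh rule would be needed on top of this.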

amychen1776 commented 4 months ago

@jaswanthikolla Thank you so much for opening the three issues! And welcome to dbt :) In the future, feel free to group these similar requests together!