databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
216 stars, 118 forks

Support for partition overwrite with Delta #75

Closed (dejan closed 1 year ago)

dejan commented 2 years ago

Describe the feature

This was reported in https://github.com/dbt-labs/dbt-spark/issues/155, but I think you might be more interested in resolving the issue.

Currently, the insert_overwrite strategy throws an error if the file format is set to delta, because it doesn't support dynamic partition overwrite.

Delta already supports partition overwrite, but it seems that the dbt adapter implementation does not make use of it.
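To make the requested behaviour concrete, here is a toy Python model of the difference between a full overwrite and a dynamic partition overwrite. It is only a sketch on plain lists of dicts, not Spark, Delta, or dbt code; the function names and the sample "day"/"clicks" columns are invented for illustration.

```python
def static_overwrite(existing_rows, new_rows):
    """Full-table overwrite: every existing row is discarded."""
    return list(new_rows)


def dynamic_partition_overwrite(existing_rows, new_rows, partition_key):
    """Replace only the partitions that appear in new_rows."""
    # Partitions present in the incoming data are the ones to be replaced.
    touched = {row[partition_key] for row in new_rows}
    # Rows in all other partitions are left as they were.
    untouched_rows = [r for r in existing_rows if r[partition_key] not in touched]
    return untouched_rows + list(new_rows)


# Hypothetical table partitioned by "day".
existing = [
    {"day": "2022-01-01", "clicks": 10},
    {"day": "2022-01-02", "clicks": 20},
]
incoming = [{"day": "2022-01-02", "clicks": 25}]

after_static = static_overwrite(existing, incoming)
after_dynamic = dynamic_partition_overwrite(existing, incoming, "day")
```

With the dynamic variant, the 2022-01-01 partition survives and only 2022-01-02 is rewritten, which is what an incremental model that reprocesses a few recent partitions needs.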

Describe alternatives you've considered

I could not find a way to atomically overwrite a partition.

Who will this benefit?

Everyone using dbt and Delta.

bilalaslamseattle commented 2 years ago

@dejan a quick update on this. The Delta folks at Databricks are looking at supporting dynamic partition overwrite. It's prioritized in their roadmap, and I'll post back here once it's released.

dejan commented 2 years ago

Thanks @bilalaslamseattle !

creativedutchmen commented 2 years ago

Any updates on this?

bilalaslamseattle commented 2 years ago

@creativedutchmen this capability is in Delta Lake 2.0. We now have to implement it in dbt-databricks. It's on our radar. @superdupershant @ueshin and @allisonwang-db FYI.

lwbayes commented 2 years ago

+1 on this. Adding one data point: this will be a blocker for us for adopting dbt.

creativedutchmen commented 2 years ago

Great, thanks! This will greatly reduce the runtime of some of our heaviest models :)

flvndh commented 2 years ago

I submitted a PR to include it in dbt-spark if you want to look at it: https://github.com/dbt-labs/dbt-spark/pull/430

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

creativedutchmen commented 1 year ago

Still relevant for me

dejan commented 1 year ago

@bilalaslamseattle please share some updates on this.

andrefurlan-db commented 1 year ago

This issue has been addressed by #310.

dejan commented 8 months ago

I don't think closing this was a good decision, because #310 has a major flaw that was only documented, not resolved: https://github.com/databricks/dbt-databricks/issues/334

dejan commented 8 months ago

I haven't confirmed this, but judging by the documentation, insert_overwrite now apparently supports dynamic partition overwrite (it no longer errors for Delta). However, the documentation states that it only works on All-purpose clusters, which is also a major drawback because that's not cost-efficient.

Can someone please confirm the status quo, and the plans for properly supporting a use case as basic and common as partition overwrite?

bilalaslamseattle commented 8 months ago

"however it is stated that it only works for All-purpose clusters which is also a major drawback as that's not cost-efficient."

@andrefurlan-db can you confirm if this is correct? Seems odd.

benwhelankf commented 7 months ago

@bilalaslamseattle / @andrefurlan-db was there an update on this one or a thread somewhere else? When I run partition overwrite against SQL warehouses, I get:

Error running query: [_LEGACY_ERROR_TEMP_DBR_0222] org.apache.spark.sql.catalyst.ExtendedAnalysisException: Configuration spark.sql.sources.partitionOverwriteMode is not available.

Looks like it's attempting to set a custom Spark config, which I don't think is allowed on SQL warehouses.
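For anyone trying to reproduce this, the error is consistent with a session-level SET being issued before the overwrite. The Python sketch below only builds the SQL strings that such a sequence would contain. It is an assumption about the shape of the statements, not the adapter's actual code; the helper name and the table names are hypothetical, and only the config key comes from the error message above.

```python
def build_insert_overwrite_statements(target_table, source_query):
    """Return the statements a dynamic partition overwrite might issue.

    Assumed shape only: a session config change followed by INSERT OVERWRITE.
    """
    return [
        # Session-level config named in the error message; this is the
        # statement a SQL warehouse would apparently reject.
        "SET spark.sql.sources.partitionOverwriteMode = DYNAMIC",
        # With dynamic mode on, only partitions present in the query
        # result are replaced.
        f"INSERT OVERWRITE TABLE {target_table} {source_query}",
    ]


# Hypothetical table names, for illustration only.
statements = build_insert_overwrite_statements(
    "analytics.events",
    "SELECT * FROM analytics.events_staging",
)
for statement in statements:
    print(statement)
```

If the first statement is what fails, the overwrite itself never runs, which would match the error being raised before any data is written.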

SemyonSinchenko commented 4 months ago

Are there any plans to implement it? Or any updates?