dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[CT-1692] [CT-1690] [Feature] Apache Spark support parameter `location_root` for python models #559

Closed: leo-schick closed this issue 1 year ago

leo-schick commented 1 year ago

Is this your first time submitting a feature request?

Describe the feature

Currently, Python models are always saved to the default location. Even when I set location_root via

def model(dbt, session):
    dbt.config(
        location_root='/mnt/lakehouse/Finance'
    )
    # [...]
    return df

dbt does not write the Python model to that location but still uses the default <schema>.<table_name> logic. Log output:

[...]

else:
  msg = f"{type(df)} is not a supported type for dbt Python materialization"
  raise Exception(msg)

df.write.mode("overwrite").format("delta").option("overwriteSchema", "true").saveAsTable("dbt_dev.my_python_model")

14:41:10  Execution status: OK in 21.77 seconds
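
One way to confirm where the table actually ends up is to inspect the Delta table metadata; a minimal sketch, assuming an active SparkSession named spark:

# Sketch: check the storage location of the table created by the run above.
# DESCRIBE DETAIL is a Delta Lake command whose result includes a
# `location` column.
detail = spark.sql("DESCRIBE DETAIL dbt_dev.my_python_model")
print(detail.select("location").first()["location"])
# Today this prints a path under the default schema location, not a path
# under the configured location_root.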

It would be great if dbt passed the location_root parameter to the write command, for example like this:

[...]
df.write.mode("overwrite").format("delta").option("overwriteSchema", "true").option("path", f"{location_root}/my_python_model").saveAsTable("dbt_dev.my_python_model")
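
For illustration, here is a self-contained sketch of that pattern, i.e. writing the Delta table's data to an explicit path while still registering it under its schema. The SparkSession, schema, and path here are assumptions for the example, not what dbt-spark actually generates:

# Sketch only: register the table under its schema while storing its data
# under an explicit location_root, instead of the default warehouse path.
# Assumes an active SparkSession `spark` (e.g. on Databricks), an existing
# dbt_dev schema, and a writable /mnt/lakehouse/Finance mount.
location_root = "/mnt/lakehouse/Finance"
df = spark.range(5)  # placeholder for the model's DataFrame

(
    df.write.mode("overwrite")
    .format("delta")
    .option("overwriteSchema", "true")
    .option("path", f"{location_root}/my_python_model")
    .saveAsTable("dbt_dev.my_python_model")
)
# The table is still addressable as dbt_dev.my_python_model, but its files
# now live under /mnt/lakehouse/Finance/my_python_model.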

Describe alternatives you've considered

No response

Who will this benefit?

Users of the sources.

Are you interested in contributing this feature?

Unfortunately, I am not deep enough into dbt to develop this myself.

Anything else?

No response

lostmygithubaccount commented 1 year ago

@leo-schick thanks for opening! this is something we should add (as an aside for dbt Labs maintainers, would we consider parity issues with SQL as bugs or enhancements?)

we may want to transfer this over to https://github.com/dbt-labs/dbt-spark, and then create a duplicate issue in https://github.com/databricks/databricks

edit: I'm going to go ahead and transfer to the Spark repo

leo-schick commented 1 year ago

I'd say it is an enhancement, since it was never defined as a feature that Apache Spark supports location_root for Python models. Even though it looks straightforward that this should work, it obviously was not implemented in the first round, so I guess it was not part of the original specs.

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.