The macros in this package presume that you're using an adapter named spark, and so they're prefixed with spark__ accordingly. But what about users of the dbt-databricks adapter?
There are a few options here, which I'll discuss in more detail below. (cc @superdupershant)
Adapter inheritance
dbt will consider "parent adapter" macros as valid dispatch candidates, after searching for the current adapter name (docs).
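To make the search order concrete, here's a rough sketch of a dispatched macro in the style of dbt_utils.datediff (the exact signature in dbt_utils may differ; the comment describes the candidate order I'd expect on Databricks if dbt-spark is registered as the parent adapter):

```sql
{# Sketch only. When the adapter is named databricks and its parent is spark, dbt searches
   each package in the dispatch search order for databricks__datediff, then spark__datediff,
   then default__datediff -- so the spark__ implementations in this package remain valid candidates. #}
{% macro datediff(first_date, second_date, datepart) %}
    {{ return(adapter.dispatch('datediff', 'dbt_utils')(first_date, second_date, datepart)) }}
{% endmacro %}
```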
If possible, this is by far the most expedient solution: it enables the most code reuse and requires the least code duplication. Everything in this package (and other spark-compatible packages, including many of Fivetran's) can stay exactly the same, and will just work with dbt-databricks, too.

How would this look in practice? The dbt-databricks adapter could add an upstream dependency on the dbt-spark adapter, similar to how the dbt-redshift plugin declares a dependency on the dbt-postgres adapter, the dbt-synapse plugin declares a dependency on the dbt-sqlserver adapter, etc.
Note: Because dbt-spark keeps its tricky dependencies in extras, and otherwise requires only dbt-core and sqlparams, I believe that dbt-databricks could require dbt-spark (with no extras), and it would still not require installing PyHive/pyodbc/etc.
This is definitely the approach I'd recommend. It also lays the foundation for a more symbiotic relationship between dbt-spark and dbt-databricks going forward. The latter can import functionality from the former, the former can upstream cleaner approaches from the latter, and we can collectively minimize the footprint of duplicative code.
Wrapper macros
If adapter inheritance wouldn't work, then for every spark__some_macro(), we need a macro named databricks__some_macro(), too. It could be a full copy-paste of the same code, or (better) it could be a "wrapper" macro that redirects to the spark__ version:
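For instance, a wrapper for a made-up spark__some_macro() might look roughly like this (the macro name and arguments are placeholders, not anything that actually exists in this package):

```sql
{% macro databricks__some_macro(arg_one, arg_two) %}
    {# Pass straight through to the spark__ implementation, with the exact same arguments.
       The spark_utils. prefix can be dropped if this wrapper lives inside spark_utils itself. #}
    {{ return(spark_utils.spark__some_macro(arg_one, arg_two)) }}
{% endmacro %}
```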
This works a lot like macro dispatch; it just needs to be specified explicitly each time.
Where could these "wrapper" macros live?
1. Inside the spark_utils package: below each spark__ macro, we add a databricks__ macro that calls it with the exact same arguments. This is how tsql-utils supported dbt-synapse before "parent" dispatch was introduced in dbt-core v0.20 (https://github.com/dbt-msft/tsql-utils/pull/64).
2. Inside a separate package, databricks_utils, that shims spark_utils (and thereby, in turn, shims dbt_utils).
3. Inside the dbt-databricks adapter itself. Adapter plugins can define any macros they want! If dbt-databricks were to include databricks__datediff() within its own codebase, then I think users would just need to add this config:
# dbt_project.yml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['dbt', 'dbt_utils']
    # i.e. 'dbt' instead of 'spark_utils'
    # 'dbt' includes macros in the 'dbt-databricks' plugin, too
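With that config in place, an ordinary dbt_utils call in a model should then resolve to the plugin-defined databricks__ implementation. For example (the model and column names here are invented):

```sql
-- models/ticket_durations.sql
select
    ticket_id,
    {{ dbt_utils.datediff('opened_at', 'closed_at', 'day') }} as days_to_close
from {{ ref('tickets') }}
```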
Why not pursue the third option? Packages like spark_utils exist because the pace of package development differs from that of dbt-core development. (The macros in this package may be affected by changes in dbt_utils, but they're less likely to be affected by changes in dbt-core.) I also believe the maintenance expectations differ: the macros in this package are tested and useful, but not as thoroughly tested as the functionality in dbt-spark, and not as essential to basic dbt functionality as the macros in the adapter plugin (materializations, create table, etc).
See: Slack thread