This dbt package contains macros that help dbt projects run on Apache Spark / Databricks: compatibility "shims" for dbt_utils macros, plus Spark-specific table maintenance helpers.
Check dbt Hub for the latest installation instructions, or read the docs for more information on installing packages.
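A typical `packages.yml` entry looks like the following sketch (the package name matches the dbt Hub listing, but the version range is illustrative; check dbt Hub for the current release):

```yaml
packages:
  - package: dbt-labs/spark_utils
    version: [">=0.3.0", "<0.4.0"]  # illustrative range; see hub.getdbt.com
```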
This package provides "shims" for:

- `dbt_utils.get_relations_by_pattern`
- `dbt_utils.groupby`
- `dbt_utils.recency`
- `dbt_utils.any_value`
- `dbt_utils.listagg`
- `dbt_utils.pivot` (with apostrophe(s) in the values)
In order to use these "shims," you should set a `dispatch` config in your root project (on dbt v0.20.0 and newer). For example, with this project setting, dbt will first search for macro implementations inside the `spark_utils` package when resolving macros from the `dbt_utils` namespace:

```yaml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```
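Conceptually, `dispatch` walks the `search_order` and uses the first package that implements the requested macro. The following is a minimal Python sketch of that resolution rule (names and return values are illustrative; dbt's real dispatch also handles adapter-specific prefixes):

```python
def resolve_macro(name: str, search_order: list, packages: dict) -> str:
    """Return the first implementation of `name` found along the search order."""
    for package in search_order:
        impl = packages.get(package, {}).get(name)
        if impl is not None:
            return impl
    raise LookupError(f"no implementation of {name!r} in {search_order}")

# Hypothetical macro registry: spark_utils shims a subset of dbt_utils.
packages = {
    "spark_utils": {"get_relations_by_pattern": "spark-specific implementation"},
    "dbt_utils": {
        "get_relations_by_pattern": "default implementation",
        "pivot": "default implementation",
    },
}

search_order = ["spark_utils", "dbt_utils"]
resolve_macro("get_relations_by_pattern", search_order, packages)  # shim wins
resolve_macro("pivot", search_order, packages)  # falls back to dbt_utils
```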
The spark-utils package may be able to provide compatibility for your package, especially if your package leverages dbt-utils macros for cross-database compatibility. This package does not need to be specified as a dependency of your package in `packages.yml`. Instead, you should encourage anyone using your package on Apache Spark / Databricks to:

- install `spark_utils` alongside your package
- set a `dispatch` config in their root project, like the one above

Caveat: these shims are not tested in CI, nor guaranteed to work on all platforms.
Each of these macros accepts a regex pattern, finds tables with names matching the pattern, and loops over those tables to perform a maintenance operation:

- `spark_optimize_delta_tables`: runs `optimize` for all matched Delta tables
- `spark_vacuum_delta_tables`: runs `vacuum` for all matched Delta tables
- `spark_analyze_tables`: computes statistics for all matched tables

We welcome contributions to this repo! To contribute a new feature or a fix, please open a Pull Request with 1) your changes and 2) updated documentation for the `README.md` file.
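The maintenance macros above share one selection step: match table names against the supplied regex, then loop over the matches. A minimal Python sketch of that step (function and variable names are illustrative, not the package's internals):

```python
import re

def tables_matching(pattern: str, table_names: list) -> list:
    """Return the table names that fully match the given regex pattern."""
    rx = re.compile(pattern)
    return [name for name in table_names if rx.fullmatch(name)]

matched = tables_matching(r"events_\d{4}", ["events_2023", "events_2024", "users"])
# matched == ["events_2023", "events_2024"]
for table in matched:
    # a real macro would run a maintenance statement against `table` here
    print(f"maintenance on {table}")
```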
The macros are tested with `pytest` and `pytest-dbt-core`. For example, the `create_tables` macro is tested by:
```python
spark_session.sql(f"CREATE TABLE {table_name} (id int) USING parquet")
tables = macro_generator()
assert simple_table in tables
spark_session.sql(f"DROP TABLE IF EXISTS {table_name}")
```
A macro is fetched using the `macro_generator` fixture by providing the macro name through indirect parameterization:

```python
@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_create_table(macro_generator: MacroGenerator) -> None:
    ...  # test body as shown above
```
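Indirect parameterization is plain pytest machinery: with `indirect=True`, the parametrized value is delivered to the fixture as `request.param`, so the fixture can turn a macro name into a callable before the test runs. A self-contained sketch of the mechanism, with a stub standing in for pytest-dbt-core's real macro lookup:

```python
import pytest

def make_macro_stub(name: str):
    """Stand-in for resolving a dbt macro by name (pytest-dbt-core does the real lookup)."""
    return lambda: {name}  # a set of "tables" containing the name, for demonstration

@pytest.fixture
def macro_generator(request):
    # indirect=True routes the parametrized value into request.param here.
    return make_macro_stub(request.param)

@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_macro_generator_resolves(macro_generator) -> None:
    assert "macro.spark_utils.get_tables" in macro_generator()
```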
Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct.