The purpose of the lakehouse_utils package is threefold:
I) Reduce the time and effort needed to migrate pipelines from cloud data warehouses to the Lakehouse (i.e., dbt + Databricks). This is done by transpiling functions that are not natively available in Spark SQL into compatible Spark SQL expressions that take the same input(s) and return the same output(s). All of this is implemented as dbt macros (see the macros directory).
II) Be a centralized source of truth for mapping cloud data warehouse functions to their Databricks equivalents, and surface the cases where a function cannot be converted automatically and manual intervention is required. The full list of supported functions is in functionlist.csv in the seed directory; further information is in the README in the macros directory.
III) Surface best practices for unit testing, to give confidence that the macros are robust and reliable (see the tests directory).
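The package performs this translation in dbt macros. Purely to illustrate the mapping idea behind points I and II, the Python sketch below keeps a function-mapping table in memory, renders a Databricks SQL expression for functions that can be automated, and flags the rest for manual work. The column names, the example rows (including the placeholder `example_manual_fn`), and the `{0}`/`{1}` argument-template convention are assumptions for illustration, not the actual layout of functionlist.csv.

```python
import csv
import io

# Hypothetical mapping table; the real functionlist.csv may use different columns.
# Templates use {0}, {1}, ... for the original call's arguments.
MAPPING_CSV = """warehouse_function,databricks_equivalent,automated
zeroifnull,"coalesce({0}, 0)",true
div0,"case when {1} = 0 then 0 else {0} / {1} end",true
example_manual_fn,,false
"""

FunctionMap = dict[str, dict[str, str]]


def load_mapping(text: str) -> FunctionMap:
    """Index the mapping rows by warehouse function name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["warehouse_function"]: row for row in reader}


def translate(mapping: FunctionMap, name: str, *args: str) -> str:
    """Render Databricks SQL for one call, or raise if it needs manual work."""
    row = mapping[name]
    if row["automated"] != "true":
        raise NotImplementedError(f"{name} requires manual intervention")
    return row["databricks_equivalent"].format(*args)


def manual_functions(mapping: FunctionMap) -> list[str]:
    """List the functions flagged as not automatable."""
    return sorted(n for n, row in mapping.items() if row["automated"] != "true")


mapping = load_mapping(MAPPING_CSV)
print(translate(mapping, "div0", "revenue", "units"))
# case when units = 0 then 0 else revenue / units end
print(manual_functions(mapping))
# ['example_manual_fn']
```

Keeping the "automated" flag next to each mapping is what lets a single table serve both as the translation source and as the list of functions that still need a human.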
Checklist
This checklist is a cut-down version of the best practices that we have identified as the package hub has grown. Although meeting these checklist items is not a prerequisite to being added to the Hub, we have found that packages which don't conform provide a worse user experience.
First run experience
[x] The package includes a README which explains how to get started with the package and customise its behaviour
[x] The README indicates which data warehouses/platforms are expected to work with this package
Customisability
[x] The package uses ref or source, instead of hard-coding table references.
Packages for data transformation (delete if not relevant):
[x] provide a mechanism (such as variables) to customise the location of source tables.
[x] do not assume database/schema names in sources.
Dependencies
Dependencies on dbt Core
[x] The package has set a supported require-dbt-version range in dbt_project.yml. Example: A package which depends on functionality added in dbt Core 1.2 should set its require-dbt-version property to [">=1.2.0", "<2.0.0"].
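To make the range semantics concrete, here is a simplified Python sketch (not dbt's own resolver) that checks a dbt Core version against the example range above. It assumes plain MAJOR.MINOR.PATCH versions and only the five comparison operators listed; pre-releases and other specifier forms are deliberately out of scope.

```python
import operator

# Comparison operators accepted by this sketch (a simplified subset).
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn '1.2.0' into (1, 2, 0); pre-release suffixes are not handled."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def satisfies(version: str, constraints: list[str]) -> bool:
    """True if `version` meets every constraint, e.g. ['>=1.2.0', '<2.0.0']."""
    current = parse_version(version)
    for constraint in constraints:
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in sorted(OPS, key=len, reverse=True):
            if constraint.startswith(symbol):
                bound = parse_version(constraint[len(symbol):])
                if not OPS[symbol](current, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint: {constraint}")
    return True


RANGE = [">=1.2.0", "<2.0.0"]
print(satisfies("1.5.2", RANGE))  # True
print(satisfies("1.1.9", RANGE))  # False
```

Comparing tuples of integers, rather than raw strings, is what makes 1.10.0 sort above 1.9.0.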
Dependencies on other packages defined in packages.yml:
[x] Dependencies are imported from the dbt Package Hub when available, as opposed to a git installation.
[x] Dependencies contain the widest possible range of supported versions, to minimise issues in dependency resolution.
[x] In particular, dependencies are not pinned to a patch version unless there is a known incompatibility.
Interoperability
[x] The package does not override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package.
[x] The package uses the cross-database macros built into dbt Core where available, such as {{ dbt.except() }} and {{ dbt.type_string() }}.
[x] The package disambiguates its resource names to avoid clashes with nodes that are likely to already exist in a project. For example, packages should not provide a model simply called users.
Versioning
[x] (Required) The package's git tags validate against the regex defined in version.py.
[x] The package's version follows the guidance of Semantic Versioning 2.0.0. (Note in particular the recommendation for production-ready packages to be version 1.0.0 or above)
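The exact pattern in version.py is not reproduced here. As a stand-in, the hypothetical helpers below validate tags with the regular expression suggested on semver.org and apply the 1.0.0 recommendation; the hub's real pattern may differ. Treating pre-releases such as 1.0.0-rc.1 as not yet production-ready is an additional assumption of this sketch.

```python
import re

# Stand-in pattern: the regex suggested by semver.org, NOT the one in version.py.
SEMVER = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def is_valid_tag(tag: str) -> bool:
    """True if the whole tag is a well-formed SemVer 2.0.0 version string."""
    return SEMVER.fullmatch(tag) is not None


def is_production_ready(tag: str) -> bool:
    """True for valid tags at 1.0.0 or above that are not pre-releases."""
    match = SEMVER.fullmatch(tag)
    if match is None:
        return False
    return int(match["major"]) >= 1 and match["prerelease"] is None


print(is_valid_tag("1.2.3"))               # True
print(is_valid_tag("1.2"))                 # False
print(is_production_ready("0.9.1"))        # False
print(is_production_ready("1.0.0-rc.1"))   # False
```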
Link to your package's repository: https://github.com/rlsalcido24/lakehouse_utils