Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
478 stars 114 forks source link

[BUG] DataBricks SHA hash not being cast as binary #219

Closed UselessAlias closed 3 weeks ago

UselessAlias commented 6 months ago

Describe the bug Noticed that the databricks SHA hashing algorithm is not casting the column to a binary type leaving it as a string. Environment

dbt version: 1.6.0 automate_dv version: 0.10.1 Database/Platform: DataBricks

To Reproduce Steps to reproduce the behavior: Run the staging macro on DataBricks using the sha algorithm. The resulting column will be a string rather than binary.

Expected behavior The hash keys should be binarys as expected by the DV methodology.

Additional context Appears in the code that there is an existing cast binary implementation for DataBricks which is being used on the MD5 hashing algorithm. This just need to be applied to the SHA version as well.

Screenshots

Screenshot 2023-12-14 at 09 38 36 Screenshot 2023-12-14 at 09 38 40

AB#5354

DVAlexHiggs commented 6 months ago

Hi! Thanks for this. We are aware of hashing issues and are planning a fix across the board. Our original design decisions on this weren't the best!

DVAlexHiggs commented 4 months ago

Hi @UselessAlias Just to let you know, our next release will introduce opt-in (so we don't break things for everyone!) support for what we're calling "Native hashing" for this. For those interested, this will also provide the same resolution for BigQuery which is also handicapped by strings at the moment. Thanks for your patience

DVAlexHiggs commented 3 weeks ago

Fixed and released in v0.11.0, Thank you for your patience on this one!

If the issue persists, please re-open this issue.