Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
478 stars 114 forks source link

Rename hashing functions #199

Open sicarul opened 1 year ago

sicarul commented 1 year ago

This PR renames the hash function to dv_hash; this is to avoid conflicts with dbt.hash.

Even though the function is namespaced, when a project wants to replace the hashing function (e.g. i wanted it to generate a sha-224 string instead of sha-256 binary field), it replaces default__hash, which affects dbt.hash and whichever functions depend on it, like the generate_surrogate_key function in dbt_utils.

I'm open to other approaches if you can suggest any, thanks!

DVAlexHiggs commented 1 year ago

Hi! Whilst we appreciate the intent behind this, this is going to have too much knock-on effect in our codebase and to our user-base.

I would recommend using the built-in dbt functionality for solving this issue.

Essentially:

# dbt_project.yml
dispatch:
  - macro_namespace: 'my_project'
    search_order: ['my_project', 'automate_dv', 'dbt']

This should ensure your project version of hash overrides dbt's and automate_dv's version correctly.

sicarul commented 1 year ago

I'm using this config (Also tried as you sent), and it doesn't work:

dispatch:
  - macro_namespace: automate_dv
    search_order: ['pulumi_dwh', 'automate_dv', 'dbt']
  - macro_namespace: elementary
    search_order: ['elementary', 'dbt', 'pulumi_dwh']

Trying to run a test with elementary throws:

17:58:34  Compilation Error in test elementary_volume_anomalies_dim_organization_both__3__2__56__day_of_week__day__1__CREATED_AT (models/information_marts/core/dimensions/schema/dim_organization.yml)
17:58:34    parameter 'alias' was not provided
17:58:34
17:58:34    > in macro hash (macros/utils/hash.sql)
17:58:34    > called by macro generate_surrogate_key (macros/utils/cross_db_utils/generate_surrogate_key.sql)
17:58:34    > called by macro table_monitoring_query (macros/edr/data_monitoring/monitors_query/table_monitoring_query.sql)
17:58:34    > called by macro test_table_anomalies (macros/edr/tests/test_table_anomalies.sql)
17:58:34    > called by macro test_volume_anomalies (macros/edr/tests/test_volume_anomalies.sql)
17:58:34    > called by test elementary_volume_anomalies_dim_organization_both__3__2__56__day_of_week__day__1__CREATED_AT (models/information_marts/core/dimensions/schema/dim_organization.yml)

Which happens because it tries to use my overriden hash. If this is an unacceptable change, i'll have to work off my fork then

Thanks anyway!

DVAlexHiggs commented 1 year ago

I'm using this config (Also tried as you sent), and it doesn't work:

dispatch:
  - macro_namespace: automate_dv
    search_order: ['pulumi_dwh', 'automate_dv', 'dbt']
  - macro_namespace: elementary
    search_order: ['elementary', 'dbt', 'pulumi_dwh']

Trying to run a test with elementary throws:

17:58:34  Compilation Error in test elementary_volume_anomalies_dim_organization_both__3__2__56__day_of_week__day__1__CREATED_AT (models/information_marts/core/dimensions/schema/dim_organization.yml)
17:58:34    parameter 'alias' was not provided
17:58:34
17:58:34    > in macro hash (macros/utils/hash.sql)
17:58:34    > called by macro generate_surrogate_key (macros/utils/cross_db_utils/generate_surrogate_key.sql)
17:58:34    > called by macro table_monitoring_query (macros/edr/data_monitoring/monitors_query/table_monitoring_query.sql)
17:58:34    > called by macro test_table_anomalies (macros/edr/tests/test_table_anomalies.sql)
17:58:34    > called by macro test_volume_anomalies (macros/edr/tests/test_volume_anomalies.sql)
17:58:34    > called by test elementary_volume_anomalies_dim_organization_both__3__2__56__day_of_week__day__1__CREATED_AT (models/information_marts/core/dimensions/schema/dim_organization.yml)

Which happens because it tries to use my overriden hash. If this is an unacceptable change, i'll have to work off my fork then

Thanks anyway!

That is frustrating. I would recommend instead reporting this as a bug on dbt-core instead.

Failing that, try to remove the dbt entry in the search_order to see if that helps things. I suspect because the macro is the same name it is assuming the footprint (arguments etc.) of the macro are also the same. Removing the dbt entry may prevent it from looking for arguments you aren't providing (or are).

sicarul commented 1 year ago

I tried also without dbt but it has the same behavior. I'm going on PTO this weekend so i'll look at creating a bug report when i come back. Thanks