Closed rafaelkrysciak closed 4 months ago
I had the same problem, would be nice if they could fix that!

@rafaelkrysciak In the meantime you can redefine the macro that generates the hash:
```sql
{% macro teradata__snapshot_hash_arguments(args) -%}
    usrlib.hash_md5({%- for arg in args -%}
        coalesce(cast({{ arg }} as varchar(200)), '')
        {% if not loop.last %} || '|' || {% endif %}
    {%- endfor -%})
{%- endmacro %}
```
You may have to install `hash_md5` on your system first...
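A quick way to check whether the UDF is available before overriding the macro (the `usrlib` schema and function name are just what the macro above assumes; they may differ on your installation):

```sql
-- hypothetical smoke test: only succeeds if the md5 UDF is installed
-- under the schema/name the macro expects
SELECT usrlib.hash_md5('test') AS md5_check;
```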
Thanks @datenbaecker. It works fine 👍
@rafaelkrysciak , the fix for this issue is available in dbt-teradata 1.8.0 release.
Describe the bug
The snapshot calculation relies on the Teradata `HASHROW` function. The `dbt_scd_id` is generated for each row based on the provided `unique_key` and the current timestamp. However, the `HASHROW` function produces a 4-byte hash, which is highly prone to collisions. For instance, the values `d3dadd49420542fb49ffbf6a77349b45` and `34f325fe5a4216f27357328b61c9eccb` both produce the same hash `02-27-E3-B4`. Similarly, the numbers `162181727` and `880145039` generate the same hash `2E-5B-FE-DD`. In a source with 36 million rows, we have over 180 thousand duplicate `dbt_scd_id` values.

These collisions cause the snapshot update to fail with the error: [Error 7547] Target row updated by multiple source rows.
Steps To Reproduce
Create a source with the provided values as IDs and then try to create a snapshot of them.
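A minimal sketch of such a source (table and column names are hypothetical; the IDs are the colliding values from above):

```sql
-- hypothetical source table seeded with two IDs whose HASHROW values collide
CREATE TABLE demo_src (
    id      VARCHAR(32) NOT NULL PRIMARY KEY,
    payload VARCHAR(100)
);

INSERT INTO demo_src VALUES ('d3dadd49420542fb49ffbf6a77349b45', 'row 1');
INSERT INTO demo_src VALUES ('34f325fe5a4216f27357328b61c9eccb', 'row 2');

-- snapshotting this table with unique_key='id' yields identical dbt_scd_id
-- values for both rows, and the next snapshot run fails with [Error 7547]
```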
Expected behavior
The snapshot is calculated without errors.
Screenshots and log output

The output of dbt --version:

The operating system you're using: Windows 11

The output of python --version: Python 3.11.3

Additional context