data-mie / dbt-profiler

Macros for generating dbt model data profiles
Apache License 2.0

"Value overflow in a SUM aggregate" with HASH fields #85

Open StuartMiddleton opened 9 months ago

StuartMiddleton commented 9 months ago

As the title indicates, using a hash field generates a value overflow in a SUM aggregate. Switching to MD5 runs fine.

stumelius commented 5 months ago

@StuartMiddleton Thanks for reporting this. And sorry for the late response. I'm embarrassed to come back to this after 3 months...

Do you have any more information, such as which database you encountered this in?

StuartMiddleton commented 5 months ago

@stumelius It was some time ago, so my memory of the details is hazy. However, I recall that it was an internal database and the scenario was very simple and straightforward. The tables used a surrogate key as the primary key, which was a hash of one or more other fields. As soon as we tried to run dbt-profiler against a table with a hashed field, the above error was thrown.

I'm sure that it will be easily repeatable using the standard hash function against any test dataset.
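For illustration only, here is a minimal sketch of how summing a hash column could overflow. It assumes the hash function returns signed 64-bit integers and that the SUM aggregate accumulates in a 64-bit integer; neither is confirmed in this thread, since the database is not identified. The names `hash64` and `checked_sum_int64` are hypothetical and are not part of dbt-profiler.

```python
import hashlib

# Bounds of a signed 64-bit integer (assumed accumulator type for SUM).
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def hash64(value: str) -> int:
    """Hypothetical stand-in for a database hash: returns a signed 64-bit integer."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def checked_sum_int64(values):
    """Sum values, raising OverflowError once the running total leaves the int64 range."""
    total = 0
    for v in values:
        total += v
        if not INT64_MIN <= total <= INT64_MAX:
            raise OverflowError("Value overflow in a SUM aggregate")
    return total
```

Because such hash values are spread across the whole 64-bit range, the running total can leave that range after only a few rows; for example, `checked_sum_int64([INT64_MAX, 1])` raises `OverflowError`. An MD5 digest is typically returned as a string or bytes value rather than a number, which may be why the MD5 column is not summed and runs fine.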

stumelius commented 5 months ago

@StuartMiddleton Thanks for the additional info, I think I get the gist now. Like you said, it should be straightforward to reproduce this error.

Are you still working with the profiler? Would you be interested in contributing a fix? :)

StuartMiddleton commented 4 months ago

@stumelius I'm not against doing a fix, but it's very dependent upon my workload (which at present is high). So, I'm willing, but may not be able.

stumelius commented 4 weeks ago

I feel you. I'm in a similar boat...

Let's leave this here for now, and when either of us has the bandwidth, we'll get back to it.