Open snordhausen opened 3 weeks ago
Thanks for reporting this and suggesting a solution.
I think a better solution would be to do better sampling, for example by adding WHERE pk_col IS NOT NULL
. What do you think?
That sounds promising, I just don't know exactly how the sampling mechanism works. If the sampling happens column-by-column, it should work.
But if we apply the IS NOT NULL
to all columns at the same time, wouldn't we risk not getting any samples at all? E.g when one of the columns is completely NULL because it is used for a new software feature that is not yet live.
That's a good point. If a column is entirely null, there is no way to verify if it's UUID or not.
At the same time, if we use a column that is arbitrary text as a key, the diff might return incorrect results.
But it shouldn't be too hard to mark a column that is all null (or empty) as such. Then when comparing the column types, it could safely adopt whatever is the type in the other table.
After switching from data-diff to reladiff, one of our tables started failing with
As far as I can tell, the root cause is that this code in sqeleton/databases/base.py uses sampling to determine if a column is String_UUID. But when such a column is nullable, it depends on chance if the sample contains NULL values. If it does, sqeleton will use the Text type, otherwise it will use String_UUID.
This change ensures that such columns can be compared reliably. I'm not sure if the is the best way to do it, but in our internal testing it looks good.