datafold / data-diff

Compare tables within or across databases
https://docs.datafold.com
MIT License

Remove UUID-type inference for textual columns #852

Closed vvkh closed 8 months ago

vvkh commented 10 months ago

To treat a textual column as a native UUID, we have to be 100% sure that every value in it is a valid UUID. Otherwise, the diff may crash when it converts an invalid UUID to a number. Inferring the UUID type from a sample is therefore not reliable enough, so this PR removes the inference.
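
To illustrate the failure mode, here is a minimal sketch using only Python's standard `uuid` module. It is not data-diff's actual implementation: the helper names (`looks_like_uuid`, `infer_uuid_from_sample`, `uuid_to_int`) and the "first N rows" sampling strategy are assumptions made for this example. A sample taken from the start of the column passes the check, yet converting the whole column still fails on a value the sample never saw.

```python
import uuid


def looks_like_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID string."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def infer_uuid_from_sample(values, sample_size):
    """Hypothetical sample-based inference: inspect only the first few values."""
    return all(looks_like_uuid(v) for v in values[:sample_size])


def uuid_to_int(value: str) -> int:
    """Convert a UUID string to a number; raises ValueError on invalid input."""
    return uuid.UUID(value).int


# A textual column that is almost, but not entirely, made of valid UUIDs.
column = [str(uuid.UUID(int=i)) for i in range(1, 100)] + ["not-a-uuid"]

# The sample sees only valid UUIDs, so the inference wrongly succeeds.
sample_says_uuid = infer_uuid_from_sample(column, sample_size=10)

# Converting every value, as a diff over the full column would need to,
# raises ValueError once it reaches the invalid string.
try:
    numbers = [uuid_to_int(v) for v in column]
    conversion_failed = False
except ValueError:
    conversion_failed = True
```

Only a check over every value could rule this out, which is why a sample cannot provide the guarantee described above.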

Only one case is affected: comparing a native UUID column with a textual column that contains UUID strings. This will no longer work out of the box, but it is still possible with an explicit cast in a view. The other cases are covered by alphanumeric PK support.
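
As a rough sketch of the explicit-cast workaround, the helper below builds a `CREATE VIEW` statement that exposes the textual key as a native UUID. It assumes PostgreSQL's `CAST(... AS uuid)` syntax, and the function name, view name, table name, and column names are all hypothetical. With such a view, the database rather than the diff tool rejects any value that is not a valid UUID.

```python
def uuid_cast_view_sql(view, table, key_column, other_columns=()):
    """Build a CREATE VIEW statement that casts `key_column` to a native UUID.

    Assumes PostgreSQL cast syntax and trusted identifiers (no quoting is done).
    """
    select_list = [f"CAST({key_column} AS uuid) AS {key_column}", *other_columns]
    return f"CREATE VIEW {view} AS SELECT {', '.join(select_list)} FROM {table}"


# Hypothetical table `orders` with a textual `id` column holding UUID strings.
statement = uuid_cast_view_sql("orders_uuid", "orders", "id", ["amount"])
```

The view can then be compared in place of the original table, so both sides of the diff present a native UUID key.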

Before the change

Column A \ Column B | Native UUID | Text (only UUID) | Text (only alphanum) | Text (arbitrary)
Native UUID ✅ (as UUID)
Text (only UUID) ✅ (as UUID) ✅ (as UUID) ✅ (as alphanum)
Text (only alphanum) ✅ (as alphanum)
Text (arbitrary)

After the change

Column A \ Column B | Native UUID | Text (only UUID) | Text (only alphanum) | Text (arbitrary)
Native UUID ❌ (but possible with explicit cast)
Text (only UUID) ❌ (but possible with explicit cast) ✅ (as alphanum) ✅ (as alphanum)
Text (only alphanum) ✅ (as alphanum)
Text (arbitrary)
github-actions[bot] commented 8 months ago

This pull request has been marked as stale because it has been open for 60 days with no activity. If you would like the pull request to remain open, please comment on the pull request and it will be added to the triage queue. Otherwise, it will be closed in 7 days.

github-actions[bot] commented 8 months ago

Although we are closing this pull request as stale, it's not gone forever. PRs can be reopened if there is renewed community interest. Just add a comment and it will be reopened for triage.