diffix / explorer

Tool to automatically explore and generate stats on data anonymized using Diffix
MIT License

Text columns: some have common values others not #355

Closed by sebastian 3 years ago

sebastian commented 3 years ago

Take the Addresses table of the Clinic dataset. The province column lists distinct values; the city column does not. Both were clearly able to produce some values.

The only difference I see is that one is considered categorical and the other is not. Is that what causes this?

Both should list some distinct values.

dandanlen commented 3 years ago

I think we have been thinking the same thing. I decided it made sense to list common values even if the column was judged as non-categorical. Is this effectively what you are suggesting?

If so, this will be fixed with the latest PR #357
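
As a rough illustration of the behaviour described above (not the explorer's actual code), the sketch below reports common values for a text column whether or not it is classified as categorical. The function name, the thresholds, and the sample data are all hypothetical and chosen only for the example.

```python
from collections import Counter


def summarize_text_column(values, categorical_threshold=0.9, top_n=5, min_count=2):
    """Hypothetical sketch: always report common values for a text column.

    The categorical flag is computed separately and does not gate the
    list of common values. All parameters are illustrative assumptions.
    """
    counts = Counter(values)
    total = len(values)

    # Keep the most frequent values that occur at least `min_count` times.
    common = [(v, c) for v, c in counts.most_common(top_n) if c >= min_count]

    # Call the column categorical if those values cover most of the rows.
    covered = sum(c for _, c in common)
    is_categorical = total > 0 and covered / total >= categorical_threshold

    return {"is_categorical": is_categorical, "common_values": common}


# Made-up data: a province-like column and a city-like column.
provinces = ["Ontario"] * 6 + ["Quebec"] * 4
cities = ["Toronto"] * 3 + ["Ottawa"] * 2 + [f"town{i}" for i in range(20)]

province_summary = summarize_text_column(provinces)
city_summary = summarize_text_column(cities)
```

In this sketch the city-like column is not flagged as categorical, yet its two frequent values are still listed, which is the outcome the issue asks for.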

sebastian commented 3 years ago

> Is this effectively what you are suggesting?

Indeed that's what I was hoping for! Great to hear it is already implemented!