The shape watcher lens currently runs a Count Distinct query during the training phase to discover categorical attributes. An exact distinct count is expensive on large datasets. Fortunately, we don't need the actual number of distinct values, only whether it falls below some threshold. An approximate count based on HyperLogLog would be a much more efficient way to achieve the same goal.
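
To illustrate the idea, here is a minimal HyperLogLog sketch in Python with a threshold check on top. It is only a sketch under assumptions, not the lens's actual code: the `HyperLogLog` class, the `is_categorical` helper, the `threshold` and `slack` parameters, and the choice of 2^12 registers are all hypothetical names and values picked for the example. In practice the lens would more likely delegate to the database's own approximate distinct-count function, if it offers one.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not tuned for production)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Bias-correction constant; this form is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value) -> None:
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash
        idx = h >> (64 - self.p)                    # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def is_categorical(values, threshold: int = 100, slack: float = 0.1) -> bool:
    """Treat an attribute as categorical if its estimated number of
    distinct values is at most `threshold`, allowing `slack` for
    estimation error. Both parameters are assumptions for this example."""
    hll = HyperLogLog()
    for v in values:
        hll.add(v)
    return hll.estimate() <= threshold * (1 + slack)
```

For example, `is_categorical(["red", "green", "blue"] * 1000)` treats the column as categorical, while a column of 10,000 unique IDs would not pass a threshold of 100. Memory stays fixed at one small register per bucket regardless of how many rows are scanned, which is the property that makes this attractive for large datasets.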