Closed: niloc132 closed this 1 year ago
I think we need the Count to cover only the non-null values in a column, so the existing Count aggregation won't work. @cpwright can you confirm that we need the count of non-null items in a column for the column statistics? Or is there anyone else we can check with before that is dropped from the DHC column statistics?
@rbasralian might also have an opinion. I don't think "non-null count" should be considered a sacred feature; it's so unimportant we haven't even built an aggregation for it.
The count is of non-null values. It could be dropped, but dropping it likely provides very little benefit if you are actually replicating column statistics. One of the useful features is giving you the number of distinct values, with a cap of 20 (by default; changeable via a property), in which case you need to iterate the column anyway. The cap of 20 prevents you from needing inordinate memory for something that is useless, and no existing aggregation will provide the unique values/counts with the cap.
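To illustrate why the cap and the non-null count can share one pass over the column, here is a minimal sketch in plain Python. It is not the actual column statistics code: the function name `capped_unique_counts` is hypothetical, null is modeled as `None`, and the choice to discard the per-value counts once the cap is exceeded is an assumption about the intended behavior.

```python
from collections import Counter


def capped_unique_counts(values, cap=20):
    """Single pass over a column (hypothetical helper, not a Deephaven API).

    Returns (non_null_count, unique_counts), where unique_counts is a dict of
    value -> occurrences, or None if more than `cap` distinct values were seen.
    """
    non_null = 0
    counts = Counter()
    exceeded = False
    for v in values:
        if v is None:  # assumption: null is represented as None
            continue
        non_null += 1
        if exceeded:
            continue  # keep counting non-nulls, but stop tracking values
        if v in counts or len(counts) < cap:
            counts[v] += 1
        else:
            # One more distinct value than the cap allows: free the memory.
            exceeded = True
            counts.clear()
    return non_null, (None if exceeded else dict(counts))
```

Memory stays bounded by the cap regardless of column size, and the non-null count comes out of the same iteration at no extra cost.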
We have a new ticket #4622 specifically asking for count non-null and count null. Closing this one.
The ColumnStatistics feature in the UI has both Count and Size when viewing details for a specific column. The Size represents the total size of the table, while the Count represents the number of non-null entries in that column.
This is in contrast with the current implementation of our aggregations, where Count is just "take the table's current size and make a whole column for it". Arguably a non-null count would be a helpful aggregation to have in general, rather than requiring a custom column like "FooCount = isNull(Foo) ? 0 : 1" and then summing that column.
Prereq for https://github.com/deephaven/deephaven-core/issues/697.
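The difference between Size and Count can be sketched in plain Python. This only illustrates the semantics described above and is not Deephaven code: the function name `size_and_non_null_count` is hypothetical, and null is modeled as `None`.

```python
def size_and_non_null_count(column):
    """Return (size, count) for a column given as a list.

    size  -- total number of rows, nulls included
    count -- number of non-null rows, computed as the sum of a 0/1 flag
    """
    size = len(column)
    # Same idea as the custom-column workaround: 0 for null, 1 otherwise, then sum.
    count = sum(0 if v is None else 1 for v in column)
    return size, count
```

For a column such as `[1, None, 3, None]`, the size is 4 while the non-null count is 2, which is the distinction the UI draws between Size and Count.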