deephaven / deephaven-core

Deephaven Community Core

ColumnStatistics needs Count of non-null #4459

Closed niloc132 closed 1 year ago

niloc132 commented 1 year ago

The ColumnStatistics feature in the UI has both Count and Size when viewing details for a specific column. The Size represents the total size of the table, while the Count represents the number of non-null entries in that column.

This is in contrast with the current implementation of our aggregations, where Count is just "take the table's current size and make a whole column for it". Arguably a non-null count would be a helpful aggregation to have in general, rather than requiring a custom column like "FooCount = isNull(Foo) ? 0 : 1" and then summing that column.
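
To make the difference between Size and a non-null Count concrete, here is a minimal sketch in plain Python. It does not use the Deephaven API: the function name `size_and_non_null_count` and the use of `None` as the null marker are assumptions made only for illustration. It mirrors the workaround of building a 0/1 indicator column and summing it.

```python
from typing import Optional, Sequence, Tuple


def size_and_non_null_count(column: Sequence[Optional[object]]) -> Tuple[int, int]:
    """Return (size, non_null_count) for one column.

    Hypothetical helper: None stands in for a null cell.
    """
    # Size: every row counts, whether or not the cell is null.
    size = len(column)
    # Indicator column: 1 for a non-null cell, 0 for a null cell.
    indicator = [0 if value is None else 1 for value in column]
    # Summing the indicator gives the non-null count.
    return size, sum(indicator)


foo = [1, None, 3, None, 5]
size, count = size_and_non_null_count(foo)
print(size, count)  # prints "5 3"
```

A dedicated aggregation would produce the second number directly, without materializing the intermediate indicator column.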

Prereq for https://github.com/deephaven/deephaven-core/issues/697.

mofojed commented 1 year ago

I think we need the Count to be of non-null values, so the existing Count aggregation won't work. @cpwright can you confirm that we need the count of non-null items in a column for the column statistics? Or is there anyone else we can check with before that is dropped from the DHC column statistics?

rcaudy commented 1 year ago

@rbasralian might also have an opinion. I don't think "non-null count" should be considered a sacred feature; it's so unimportant we haven't even built an aggregation for it.

cpwright commented 1 year ago

The count is of non-null values. It could be dropped, but dropping it likely provides very little benefit if you are actually replicating column statistics. One of the useful features is giving you the number of distinct values, with a cap of 20 (by default; changeable via a property), so you need to iterate the column anyway. The cap of 20 prevents you from needing inordinate memory for something that is useless, and no existing aggregation will provide the unique values/counts with the cap.
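
The point about iterating the column anyway can be sketched as a single pass that gathers the size, the non-null count, and capped unique-value counts together. This is plain Python, not the actual column statistics code: the name `column_statistics`, the `max_unique` parameter, and the choice to discard the unique-value counts once the cap is exceeded are all assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


def column_statistics(
    values: Iterable[Optional[Hashable]],
    max_unique: int = 20,  # assumed default, taken from the cap mentioned above
) -> Tuple[int, int, Optional[Dict[Hashable, int]]]:
    """Return (size, non_null_count, unique_counts) in one pass.

    unique_counts is None when more than max_unique distinct values were
    seen (an assumed policy, so memory stays bounded by the cap).
    """
    size = 0
    non_null = 0
    unique_counts: Optional[Dict[Hashable, int]] = {}
    for value in values:
        size += 1
        if value is None:  # None stands in for a null cell
            continue
        non_null += 1
        if unique_counts is None:
            continue  # cap already exceeded; stop tracking distinct values
        if value in unique_counts:
            unique_counts[value] += 1
        elif len(unique_counts) < max_unique:
            unique_counts[value] = 1
        else:
            unique_counts = None  # a new distinct value would exceed the cap
    return size, non_null, unique_counts


print(column_statistics(["a", None, "b", "a"]))  # prints (4, 3, {'a': 2, 'b': 1})
```

Because the non-null count falls out of the same loop that tracks distinct values, keeping it adds essentially no cost in this sketch.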

rcaudy commented 1 year ago

We have a new ticket #4622 specifically asking for count non-null and count null. Closing this one.