Does Iceberg support incremental statistics calculation? How can this be done for columns? How do you calculate changes between two snapshots?
Hello everyone. I want to collect column statistics without rereading the whole table every time. After examining the manifest files, I found that they store only a fixed set of statistics (value count, null count, NaN count, lower bound, upper bound), and only for the changes made to a partition.
As far as I understand, Puffin files can store NDV (number of distinct values), but I couldn’t find information on how to use them. Can someone provide guidance or a link to documentation that answers these questions? Thanks all.
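To make "incremental" concrete, here is a minimal sketch of folding the metrics listed above into a running per-column total when a new snapshot only appends files. It is plain Python and does not call any Iceberg API: `ColumnStats`, `merge`, and `apply_appended_files` are hypothetical names, and the sketch assumes the per-file metrics of the added files have already been read from the new snapshot's manifests. It also assumes append-only changes, since removing files cannot tighten bounds without rereading data, and it does not cover NDV, which cannot be merged by simple addition.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical container for one column's metrics (not an Iceberg class)."""
    value_count: int
    null_count: int
    nan_count: int
    lower: Optional[float]  # None when no bound is recorded
    upper: Optional[float]


def _pick(fn: Callable[[float, float], float],
          x: Optional[float], y: Optional[float]) -> Optional[float]:
    # Combine two optional bounds, ignoring whichever side is missing.
    if x is None:
        return y
    if y is None:
        return x
    return fn(x, y)


def merge(a: ColumnStats, b: ColumnStats) -> ColumnStats:
    # Counts are additive; bounds combine with min/max.
    return ColumnStats(
        value_count=a.value_count + b.value_count,
        null_count=a.null_count + b.null_count,
        nan_count=a.nan_count + b.nan_count,
        lower=_pick(min, a.lower, b.lower),
        upper=_pick(max, a.upper, b.upper),
    )


def apply_appended_files(current: ColumnStats,
                         added_files: Iterable[ColumnStats]) -> ColumnStats:
    # Fold the metrics of files added between two snapshots into the
    # statistics already computed for the older snapshot.
    for file_stats in added_files:
        current = merge(current, file_stats)
    return current


if __name__ == "__main__":
    previous = ColumnStats(100, 5, 0, 1.0, 50.0)
    added = [ColumnStats(10, 1, 0, -3.0, 20.0), ColumnStats(20, 0, 2, None, None)]
    print(apply_appended_files(previous, added))
```

The question is whether Iceberg already offers something equivalent, and how NDV stored in Puffin files fits into it.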
Query engine: Iceberg API