Following up on #416, we should add the CDF Quantiles computation to the Data Analyzer instead of requiring users to compute it through an external API.
Right now, users have to call the `QbeastUtils` interface to compute the String and Numeric bins for a specific column, and then pass those bins to the write to configure the transformation:
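```scala
// Compute the quantile bins for "id" up front and pass them as JSON column stats (note the s interpolator)
val idStats = QbeastUtils.computeQuantilesForColumn("id", df)
df.write.format("qbeast").option("columnsToIndex", "id").option("columnStats", s"""{"id_quantiles":$idStats}""").save(...)
```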
Without these column stats, the write would fail.
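To make concrete what "using those bins to configure the transformation" involves, here is a minimal sketch of a quantile-based transformation. It assumes the bins are a sorted sequence of bounds and maps a value to its relative position among them, a step-wise approximation of the column's empirical CDF. The object and method names are hypothetical, and this is not the actual Qbeast transformation code, which may interpolate differently.

```scala
// Hypothetical sketch, not Qbeast's implementation: maps a value to [0, 1]
// using precomputed, sorted quantile bounds (a step-wise empirical CDF).
object QuantileTransformSketch {

  // Number of bounds less than or equal to `value` (upper-bound binary search).
  private def upperBound[T](bounds: IndexedSeq[T], value: T)(implicit ord: Ordering[T]): Int = {
    var lo = 0
    var hi = bounds.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (ord.lteq(bounds(mid), value)) lo = mid + 1 else hi = mid
    }
    lo
  }

  // Values below the first bound map to 0.0; values at or above the last bound map to 1.0.
  def transform[T](value: T, bounds: IndexedSeq[T])(implicit ord: Ordering[T]): Double = {
    require(bounds.nonEmpty, "bounds must not be empty")
    upperBound(bounds, value).toDouble / bounds.length
  }
}
```

For example, `QuantileTransformSketch.transform(25, Vector(10, 20, 30, 40))` returns `0.5`. Because it only relies on an `Ordering`, the same function works for String bins such as `Vector("c", "h", "p", "t")`.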
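We should change this so that users no longer need the `QbeastUtils` methods and can just execute the write with `columnsToIndex`, leaving the quantile computation to the Data Analyzer.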
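A minimal sketch of what the Data Analyzer could compute in place of `QbeastUtils.computeQuantilesForColumn`, assuming it has an in-memory sample of the column. It uses nearest-rank quantiles over a generic `Ordering`, so the same code covers String and Numeric bins. The names and the `numBins` parameter are hypothetical and do not reflect the existing `QbeastUtils` algorithm. In Spark, numeric columns could instead use `df.stat.approxQuantile(column, probabilities, relativeError)`, which does not handle String columns.

```scala
// Hypothetical sketch: nearest-rank quantile bounds for any ordered type,
// so one function serves both Numeric and String columns.
object QuantileBoundsSketch {

  // Returns up to numBins + 1 sorted, distinct bounds taken at
  // probabilities 0, 1/numBins, ..., 1 over the sorted values.
  def quantileBounds[T](values: Seq[T], numBins: Int)(implicit ord: Ordering[T]): Vector[T] = {
    require(values.nonEmpty, "values must not be empty")
    require(numBins > 0, "numBins must be positive")
    val sorted = values.sorted.toVector
    val last = sorted.length - 1
    // Integer arithmetic picks index floor(last * k / numBins) without floating-point drift.
    (0 to numBins)
      .map(k => sorted((last.toLong * k / numBins).toInt))
      .toVector
      .distinct // duplicates are adjacent after sorting, so order is preserved
  }
}
```

For example, `QuantileBoundsSketch.quantileBounds(1 to 100, 4)` returns `Vector(1, 25, 50, 75, 100)`, and `QuantileBoundsSketch.quantileBounds(Seq("b", "a", "d", "c", "e"), 2)` returns `Vector("a", "c", "e")`.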
As a first step, the Data Analyzer should not recompute the Quantile stats on every write. Triggering a new Revision would still be a manual step.
We still need to understand how changes in the data distribution affect the index, and to design a criterion for deciding when the stats have diverged enough from the original configuration.
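One possible criterion, offered only as a starting point for the design discussion: compare the CDF implied by the original bounds (evenly spaced probabilities `k / (bounds.length - 1)`) against the empirical CDF of newly written data, evaluated at the bounds, and take the largest gap, similar to a Kolmogorov-Smirnov statistic. The `threshold` parameter and all names are hypothetical; the sketch only reports whether a new Revision looks warranted and leaves triggering it manual.

```scala
// Hypothetical sketch: KS-style divergence between original quantile bounds and new data.
object StatsDivergenceSketch {

  // Max gap between the CDF implied by `bounds` and the empirical CDF of `sample`,
  // evaluated only at the bound points. Returns a value in [0, 1].
  def divergence[T](bounds: IndexedSeq[T], sample: Seq[T])(implicit ord: Ordering[T]): Double = {
    require(bounds.length >= 2, "need at least two bounds")
    require(sample.nonEmpty, "sample must not be empty")
    val m = bounds.length - 1
    val n = sample.length.toDouble
    bounds.indices.map { k =>
      val empirical = sample.count(v => ord.lteq(v, bounds(k))) / n
      math.abs(empirical - k.toDouble / m)
    }.max
  }

  // Hypothetical decision rule: flag a new Revision when divergence exceeds `threshold`.
  def shouldTriggerNewRevision[T: Ordering](bounds: IndexedSeq[T], sample: Seq[T], threshold: Double): Boolean =
    divergence(bounds, sample) > threshold
}
```

With bounds `Vector(0, 25, 50, 75, 100)`, a sample of `1 to 100` gives a divergence of `0.0`, while a shifted sample of `101 to 200` gives `1.0`. Evaluating only at the stored bounds keeps the check cheap (one pass over the new data per bound), at the cost of missing shifts that stay within a single bin.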