In the following code, `getDataSize` currently returns an estimate rather than the actual size of the file. Iceberg's rolling file writer relies on this method to decide when to roll over to a new file, which can result in files much smaller than the expected target size.
```java
/**
 * @return the total size of data written to the file and buffered in memory
 */
public long getDataSize() {
  return lastRowGroupEndPos + columnStore.getBufferedSize();
}
```
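To make the dependency concrete, here is a minimal simulation, not Iceberg's or parquet-java's actual code. A hypothetical `RollingSketch` mirrors the two terms of `getDataSize()` (`lastRowGroupEndPos` plus the in-memory buffered size) and rolls to a new file once that estimate reaches a target. It assumes that buffered bytes shrink by a fixed `compressionRatio` when a row group is flushed; the record size, row group size, target size, and ratio are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a size-based rolling writer; not a real Iceberg or parquet-java API. */
public class RollingSketch {

  /** Simulated file tracking the same two quantities that getDataSize() adds up. */
  static final class SimulatedFile {
    long lastRowGroupEndPos = 0; // bytes of row groups already flushed to the file
    long bufferedSize = 0;       // bytes of the current row group still buffered in memory

    long getDataSize() {
      return lastRowGroupEndPos + bufferedSize;
    }

    /** Assumption: flushing compresses the buffered bytes by a fixed ratio. */
    void flushRowGroup(double compressionRatio) {
      lastRowGroupEndPos += Math.round(bufferedSize * compressionRatio);
      bufferedSize = 0;
    }
  }

  /** Writes records, rolling whenever the estimate reaches the target; returns final file sizes. */
  public static List<Long> simulate(int records, long recordSize, long rowGroupSize,
                                    long targetFileSize, double compressionRatio) {
    List<Long> fileSizes = new ArrayList<>();
    SimulatedFile file = new SimulatedFile();
    for (int i = 0; i < records; i++) {
      file.bufferedSize += recordSize;
      if (file.bufferedSize >= rowGroupSize) {
        file.flushRowGroup(compressionRatio); // row group full: flush it
      }
      if (file.getDataSize() >= targetFileSize) {
        file.flushRowGroup(compressionRatio); // close the file: flush what is buffered
        fileSizes.add(file.lastRowGroupEndPos);
        file = new SimulatedFile();
      }
    }
    if (file.getDataSize() > 0) {
      file.flushRowGroup(compressionRatio);
      fileSizes.add(file.lastRowGroupEndPos);
    }
    return fileSizes;
  }

  public static void main(String[] args) {
    // 1,000 records of 1 KiB, 128 KiB row groups, 256 KiB target file size.
    System.out.println(simulate(1_000, 1_024, 128 * 1_024, 256 * 1_024, 1.0));
    System.out.println(simulate(1_000, 1_024, 128 * 1_024, 256 * 1_024, 0.25));
  }
}
```

With no compression (`1.0`) each full file lands exactly on the 256 KiB target. With a ratio of `0.25`, the first file closes at 184 KiB, because the uncompressed buffered bytes count toward the estimate before they are compressed on flush.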
Could `getDataSize` return a potentially larger value? I can't think of any downsides at the moment.
Do you have a concrete suggestion for what value it should return? My concern is that changing the behavior could silently affect many downstream applications in the wild.