Open saitharun15 opened 4 days ago
Hi @huaxingao @karuppayya @aokolnychyi @RussellSpitzer, can you help review this PR?
@RussellSpitzer, thanks for the review comments, I will address them soon. As in @huaxingao's implementation here, where aggregate pushdown is skipped when row-level deletes are detected, I have applied a similar change here as well.
This PR derives min, max, and numOfNulls statistics on the fly from manifest files and reports them back to Spark.
Currently only NDV is calculated and reported back to the Spark engine, which leads to inaccurate plans on the Spark side because min, max, and nullCount are returned as NULL.
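As a rough illustration of the idea (not the actual code in this PR), the aggregation over per-file metrics can be sketched in Python. `FileStats`, `ColumnStats`, and `derive_column_stats` are hypothetical names used only for this example, the values are assumed to be integers, and returning nothing when row-level deletes are present is my reading of the skip behaviour described above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FileStats:
    """Hypothetical per-data-file metrics for one column, as read from a manifest entry."""
    lower_bound: Optional[int]
    upper_bound: Optional[int]
    null_count: Optional[int]
    has_row_level_deletes: bool = False


@dataclass
class ColumnStats:
    """Hypothetical table-level stats for one column, to be reported to the engine."""
    min_value: Optional[int]
    max_value: Optional[int]
    null_count: Optional[int]


def derive_column_stats(files: Iterable[FileStats]) -> Optional[ColumnStats]:
    """Fold per-file metrics into table-level min, max, and null count.

    Returns None when there are no files or when any file has row-level
    deletes, since per-file metrics may then no longer describe the live rows.
    """
    files = list(files)
    if not files or any(f.has_row_level_deletes for f in files):
        return None

    lowers = [f.lower_bound for f in files]
    uppers = [f.upper_bound for f in files]
    nulls = [f.null_count for f in files]

    # Simplification: if any file lacks a metric, report that stat as unknown.
    min_value = None if any(v is None for v in lowers) else min(lowers)
    max_value = None if any(v is None for v in uppers) else max(uppers)
    null_count = None if any(v is None for v in nulls) else sum(nulls)
    return ColumnStats(min_value, max_value, null_count)
```

The sketch reports a stat as unknown whenever one file is missing the corresponding metric, which is a conservative choice for the example rather than a statement about how this PR handles missing metrics.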
There is an ongoing discussion about whether to store stats at the partition level or the table level. Whichever way we calculate them, there would still be an issue, as per this comment in discussion #10791.
These changes enable on-the-fly collection of the stats through a table property or a session conf (disabled by default).
cc @guykhazma @jeesou