Open rroelke opened 2 hours ago
If we don't do this, we run the risk of deleting this API by accident in a future release
maybe someone who is relying on this API could write some tests / additonal documentation. @avantgardnerio do you know of anyone who might have some time?
The extent of the documentation for
TableProvider::statistics
in version 43.0.0 is:This offers no explanation as to how the statistics will or will not be used.
A user with experience in analytical database engines writing a custom
TableProvider
implementation may suspect thatTableProvider::statistics
is used by the datafusion query optimizer to determine join orders, perhaps among other things.However, this conclusion is apparently incorrect, which I deduce from the following pieces of evidence: 1) I am a user fitting that description and found that my custom
TableProvider::statistics
was not called in the presence of a join query 2)cargo check --workspace --tests
runs with no errors if I remove thefn statistics
declaration from thetrait TableProvider
definition 3) having found the source code for the rule which changes join orders it is clear that it callsExecutionPlan::statistics
instead.Expectation
The documentation should set appropriate expectations for what
TableProvider::statistics
is used for, so that developers can make informed choices about whether or not to implement it.Additional context
The apparent answer to what
TableProvider::statistics
is used for is "nothing" based on thecargo check --workspace --tests
comment above, but removing the trait method is a breaking change. Based on the slack discussion prior to filing this issue, at least one user is depending onTableProvider::statistics
for their custom optimizer rules and removing it would require them to find a workaround.Short of deprecating or removing the trait method, I would personally be satisfied just with updates to the method documentation.