HypoPG / hypopg

Hypothetical Indexes for PostgreSQL
https://hypopg.readthedocs.io
Other
1.39k stars 59 forks source link

estimate entry->pages and tuples question #22

Closed hailanwhu closed 6 years ago

hailanwhu commented 6 years ago

Recently, I want to write a plug about virtual index on pg. And I'm really happy to see this project. I have some questions about the funciton hypo_estimate_index. when you estimate(btree) the entry->tuples=rel->tuples,but there also are some tuples in upper pages you don't consider in. The same as pages.

rjuju commented 6 years ago

Hi, thanks for your interest in this project!

About entry->tuples, the planner wants the number of estimated leaf tuples, see https://github.com/postgres/postgres/blob/master/src/backend/optimizer/util/plancat.c#L388

About the number of pages, you're right some pages are not accounted. I'd need to estimate the tree height for that, but I've not done it yet. It's probably more or less hidden by the additional 20% bloat that's added when computing the size. Do you have some good ideas on how to estimate the tree height and the additional upper pages?

hailanwhu commented 6 years ago

Thanks for your interpretation. I make mistakes about the entry->tuples. About height, I neither have idea how to calculate . after some INSERT、UPDAT、 DELETE, the tree has been changed, I have no idea to model the process. And there may exist hot-tuple, increasing the challenge of the problem. I feel it`s really really difficult.

rjuju commented 6 years ago

Yes, that's quite difficult. Maybe some simple heuristics can be added to incrementally improve the size estimation (I think in the common usecases, the pages tend to be half filled for instance). Also, fortunately, having a rough estimation of the index size is enough in almost all the usecases where an virtual index is needed.