cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com
Other
29.52k stars 3.7k forks source link

opt: estimate cost of geospatial lookup joins #48222

Open rytaft opened 4 years ago

rytaft commented 4 years ago

As described in the Geospatial RFC, in order to accurately estimate the cost of a geospatial lookup join, we will need an estimate of how expensive the API calls to Covers, CoveredBy or Intersects are since they will need to be called on each input row. This will need to be added on to the cost of processing each row of the input when costing geospatial lookup joins.

This issue covers the changes that should be made to the cost model in order to accurately estimate the cost of performing a geospatial lookup join. #48219, #48220, and #48221 cover the work needed to get correct stats to inform the cost model. #48214 covers the work needed to estimate the cost of the geospatial builtin functions, which complements this issue since the relative cost between the geospatial lookup join and the full-table scan with builtin function execution will determine which plan is chosen.

Epic CRDB-16930

Jira issue: CRDB-4346

Jira issue: CRDB-13899

sumeerbhola commented 4 years ago

I'm unclear on how this differs from #48214 which talks about the Geospatial functions.

Was this meant to be the cost of executing set expressions using the inverted index (both scan and expression evaluation)?

rytaft commented 4 years ago

Yea, that's right. This issue is regarding the cost of executing these functions, which I thought would be important to accurately estimate the cost of performing the geospatial lookup join. #48214 is for estimating the cost of the builtin geospatial functions, which I assume will be more expensive than these api calls. I think we need both in order to make an informed decision about whether to use a geospatial index for a particular query.

But let us know if you think the cost of these apis is trivial enough that we can basically ignore it. My impression was that there was some non-trivial computation required to get the covering, etc.

sumeerbhola commented 4 years ago

My impression was that there was some non-trivial computation required to get the covering, etc.

That is my impression too. Is it possible that comparing the cost of the full table scan versus the limited inverted scan may be enough (at least to start with) in causing the optimizer to make the right decision?

Also, do we have an issue for computing the cost of the inverted index scan and the selectivity of using the inverted index, or is that subsumed in this one?

rytaft commented 4 years ago

Is it possible that comparing the cost of the full table scan versus the limited inverted scan may be enough (at least to start with) in causing the optimizer to make the right decision?

Yea, I think we'll just need to do some experiments to get the relative cost of scanning a table with and without executing a geo-spatial function, and with and without the inverted index scan/join.

I think this issue should probably cover the changes we need to make to the cost model for estimating the cost of geospatial lookup joins, but I also opened #48219, #48220, and #48221 for the work we'll need to do to get the right stats/selectivity. I'll update the title/description of this issue to make that more clear.

github-actions[bot] commented 6 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!