cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.89k stars 3.77k forks source link

opt: don't choose a "less constrained" scan when row count estimate is low #130201

Open DrewKimball opened 1 week ago

DrewKimball commented 1 week ago

Each constraint in a constrained scan is costed using the "random IO" cost, which is a 4x multiplier. When the estimated row count of the scan is very low, this can cause the optimizer to choose a scan with less constraints (or even a full-table scan), even if it may cause many more rows to be scanned in the worst case. This has caused performance issues for customers, which we have resolved in the past by making changes to improve statistics.

However, statistics will never be 100% accurate. The current situation trades a small improvement in expected performance for a huge decrease in worst-case performance. We should consider making the cost model more robust to inaccurate stats by preferring scans that are "more constrained". We could balance this preference so that it takes over whenever the row count is sufficiently small (say, 10 rows, or 0.1% of the table etc.), and otherwise the optimizer's estimate is trusted.

Related to https://github.com/cockroachdb/cockroach/issues/79683.

Jira issue: CRDB-41922

nameisbhaskar commented 1 week ago

Adding the priority as P1 based on the impact. Please feel free to change it to the appropriate priority.

michae2 commented 1 week ago

[triage] dropping to P-3 since this is not a regression, and not a customer issue