crate / crate

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
https://cratedb.com/product
Apache License 2.0

Regression on correlated subqueries #16178

Open mkleen opened 1 week ago

mkleen commented 1 week ago

https://tools.cr8.net/grafana/d/-8PNA2vnz/cratedb-dev-cluster-benchmarks?orgId=1&var-spec_file=correlated_subqueries.toml&var-statement=SELECT%20COUNT%28%2A%29%20FROM%20uservisits%20u%20WHERE%20%22lCode%22%20LIKE%20%27%25-EN%27%20AND%20EXISTS%20%28SELECT%201%20FROM%20uservisits%20WHERE%20%22cCode%22%20%3D%20u.%22cCode%22%29&var-concurrency=1

First spike on 15.05. Possibly related to a commit merged on 14.05: https://github.com/crate/crate/commit/4fd6e4196b8d56aa0ec97003f3675be3422fb393

mkleen commented 5 days ago

The benchmark is unstable. Results differ a lot when running it multiple times locally in the same setup on the same commit. Across multiple runs of the same version I get means of 7392.931 ± 167.756, 8808.026 ± 525.908, and 10439.539 ± 1335.296.
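To put a number on the instability, here is a minimal Python sketch that compares the variation between runs with the variation inside each run. It uses only the three mean ± stddev pairs quoted above. The variable names are made up for illustration, and the units are whatever the benchmark tool reports.

```python
from statistics import mean, pstdev

# (mean, stddev) pairs as reported above for three runs of the same commit.
# Units are whatever the benchmark tool reports; they are not stated here.
runs = [
    (7392.931, 167.756),
    (8808.026, 525.908),
    (10439.539, 1335.296),
]

means = [m for m, _ in runs]
overall_mean = mean(means)

# Relative gap between the slowest and the fastest run.
relative_range = (max(means) - min(means)) / min(means)

# Coefficient of variation between runs (population stddev of the means).
cv_between_runs = pstdev(means) / overall_mean

# Coefficient of variation inside each individual run.
cv_within_runs = [s / m for m, s in runs]

print(f"relative range between runs: {relative_range:.1%}")
print(f"CV between runs:             {cv_between_runs:.1%}")
for i, cv in enumerate(cv_within_runs, start=1):
    print(f"CV within run {i}:             {cv:.1%}")
```

With these numbers the slowest run is roughly 41% above the fastest one, and the spread between runs is larger than the spread inside any single run, so a spike of that size on the dashboard could just be noise.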

matriv commented 5 days ago

Thank you for looking into this! Could you then check whether we can eliminate (or at least reduce) this deviation between runs, maybe by adding a limit to the subquery or something similar?