Defman opened this issue 2 years ago
Looks like the optimizer is unable to use the index because the radius argument to ST_DWITHIN is not a constant. Postgres expands a bounding box around one of the arguments by radius units, then checks for intersections with the index column - we could probably modify the exploration rules to do something similar.
In the meantime, you could try a manual rewrite like this to make sure the index is used:
CREATE VIEW "local" AS
SELECT
"user"."id" as "user_id",
"candidate"."id" as "candidate_id"
FROM
"user",
"user" AS "candidate"
WHERE
ST_INTERSECTS(ST_EXPAND("user".location, "user".radius::float), "candidate".location)
AND ST_DWITHIN("user"."location", "candidate"."location", "user"."radius"::float)
AND "candidate"."id" != "user"."id"
;
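For reference, the rewrite assumes a spatial inverted index on location. A minimal sketch of such a schema (the index name matches the plans below; the column types, including id, are guesses):

CREATE TABLE "user" (
    id UUID PRIMARY KEY,                         -- id type is a guess
    location GEOGRAPHY NOT NULL,                 -- GEOGRAPHY per the reporter's setup
    radius DECIMAL NOT NULL,                     -- cast to FLOAT in the queries above
    INVERTED INDEX user_location_idx (location)  -- accelerates ST_INTERSECTS/ST_COVERS
);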
The query plan looks better:
distribution: full
vectorized: true
• group
│ estimated row count: 20,269
│ group by: id
│ ordered: +id
│
└── • lookup join
│ table: user@primary
│ equality: (id) = (id)
│ equality cols are key
│ pred: st_covers(st_buffer(location, radius::FLOAT8), location) AND st_dwithin(location, location, radius::FLOAT8)
│
└── • inverted join
│ table: user@user_location_idx
│ on: id != id
│
└── • scan
estimated row count: 20,278 (100% of the table; stats collected 11 hours ago)
table: user@primary
spans: FULL SCAN
However, it is still orders of magnitude slower than Postgres (terminated after 5 min, where Postgres spends only ~3 sec).
I would like to add that the test was done using "geometry" on Postgres and "geography" on CockroachDB; this should not have any impact since it's just changing the reference system.
Thanks for the report, @Defman!
I think we should be able to easily support a non-constant distance argument in st_dwithin as an inverted join predicate, as long as the value comes from the input (i.e., not the inverted index). We need to explicitly add this logic in the optimizer and execution engine, but I don't think it should be too difficult.
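For context, a minimal sketch of the predicate shape in question, where the distance argument comes from the input side of the join rather than being a constant:

-- The third argument to ST_DWITHIN comes from "user" (the input side),
-- not from the inverted-index side ("candidate").
SELECT "user"."id", "candidate"."id"
FROM "user", "user" AS "candidate"
WHERE ST_DWITHIN("user"."location", "candidate"."location", "user"."radius"::FLOAT8)
  AND "candidate"."id" != "user"."id";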
I am curious, though, what is causing the above plan with @DrewKimball's rewrite to be so slow. @Defman, if you are willing to share the results of EXPLAIN ANALYZE on this query (both the original and the rewritten one), that would help us understand where we are spending the time. You could try using a smaller sample of the "user" table to make sure the CRDB run actually finishes.
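For example, one way to build such a sample (table name and sample size are arbitrary):

-- Copy a subset of "user" into a scratch table, then point the view at it.
CREATE TABLE user_sample AS SELECT * FROM "user" LIMIT 1000;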
Thanks for coming back so quickly, @rytaft and @DrewKimball!
Here is the updated view, query, and resulting query plan.
As @rytaft suggested, I tried adding a limit to the below view; the query now finishes, but only on a fraction of the data. The query time scales linearly with the number of elements in the view. In this case the view contains n^2 elements, where n is the size of the user table, so the query time is O(n^2), in other words quadratic in the number of users (with n ≈ 20,000, that is on the order of 4×10^8 candidate pairs).
CREATE VIEW "local" AS
SELECT "user"."id" as "user_id",
"candidate"."id" as "candidate_id"
FROM "user",
"user" as "candidate"
WHERE ST_COVERS(ST_BUFFER("user".location, "user".radius::float), "candidate"."location")
AND ST_DISTANCE("user"."location", "candidate"."location") <= "user"."radius"::FLOAT
AND "user"."id" != "candidate"."id"
;
SELECT "user_id",
COUNT(1)
FROM
"local"
GROUP BY "user_id";
distribution: full
vectorized: true
• group
│ estimated row count: 20,269
│ group by: id
│ ordered: +id
│
└── • lookup join
│ table: user@primary
│ equality: (id) = (id)
│ equality cols are key
│ pred: st_covers(st_buffer(location, radius::FLOAT8), location) AND st_dwithin(location, location, radius::FLOAT8)
│
└── • inverted join
│ table: user@user_location_idx
│ on: id != id
│
└── • scan
estimated row count: 20,278 (100% of the table; stats collected 5 minutes ago)
table: user@primary
spans: FULL SCAN
Hi @Defman -- it would help if you could print the output of EXPLAIN ANALYZE, including the URL which allows us to see the DistSQL plan and time spent in each processor. I'm not sure we'll be able to fix your performance issues on this query in the near term, but it will be helpful for us to understand where to focus our efforts to improve this in coming releases. Thank you!
planning time: 18ms
execution time: 5.3s
distribution: full
vectorized: true
rows read from KV: 30,654 (5.3 MiB)
cumulative time spent in KV: 3.9s
maximum memory usage: 13 MiB
network usage: 224 B (3 messages)
• group
│ nodes: n2
│ actual row count: 2
│ estimated row count: 8,101
│ group by: id
│ ordered: +id
│
└── • limit
│ nodes: n2
│ actual row count: 10,000
│ KV time: 3s
│ KV contention time: 0µs
│ KV rows read: 10,120
│ KV bytes read: 100 KiB
│ estimated row count: 10,000
│ count: 10000
│
└── • lookup join
│ nodes: n2
│ actual row count: 10,000
│ KV time: 3s
│ KV contention time: 0µs
│ KV rows read: 10,120
│ KV bytes read: 100 KiB
│ table: user@primary
│ equality: (id) = (id)
│ equality cols are key
│ pred: st_covers(st_buffer(location, radius::FLOAT8), location) AND st_dwithin(location, location, radius::FLOAT8)
│
└── • inverted join
│ nodes: n2
│ actual row count: 10,239
│ KV time: 141ms
│ KV contention time: 0µs
│ KV rows read: 19,510
│ KV bytes read: 1.2 MiB
│ table: user@user_location_idx
│ on: id != id
│
└── • scan
nodes: n2
actual row count: 1,024
KV time: 789ms
KV contention time: 0µs
KV rows read: 1,024
KV bytes read: 4.0 MiB
estimated row count: 20,278 (100% of the table; stats collected 5 hours ago)
table: user@primary
spans: FULL SCAN
Here's the analyzed plan (local limited to 10,000), as seen below.
CREATE VIEW "local" AS
SELECT "user"."id" as "user_id",
"candidate"."id" as "candidate_id"
FROM "user",
"user" as "candidate"
WHERE ST_COVERS(ST_BUFFER("user".location, "user".radius::float), "candidate"."location")
AND ST_DISTANCE("user"."location", "candidate"."location") <= "user"."radius"::FLOAT
AND "user"."id" != "candidate"."id"
LIMIT 10000
;
EXPLAIN ANALYSE SELECT "user_id",
COUNT(1)
FROM
"local"
GROUP BY "user_id";
I'm not sure what you mean by URL.
@Defman you can run the query with EXPLAIN ANALYZE (DISTSQL) to get a URL showing a diagram with execution statistics.
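For example, with the grouping query from above:

EXPLAIN ANALYZE (DISTSQL)
SELECT "user_id", COUNT(1)
FROM "local"
GROUP BY "user_id";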
planning time: 2ms
execution time: 5.4s
distribution: full
vectorized: true
rows read from KV: 30,654 (5.3 MiB)
cumulative time spent in KV: 4s
maximum memory usage: 13 MiB
network usage: 224 B (3 messages)
• group
│ nodes: n2
│ actual row count: 2
│ estimated row count: 8,101
│ group by: id
│ ordered: +id
│
└── • limit
│ nodes: n2
│ actual row count: 10,000
│ KV time: 3.1s
│ KV contention time: 0µs
│ KV rows read: 10,120
│ KV bytes read: 100 KiB
│ estimated row count: 10,000
│ count: 10000
│
└── • lookup join
│ nodes: n2
│ actual row count: 10,000
│ KV time: 3.1s
│ KV contention time: 0µs
│ KV rows read: 10,120
│ KV bytes read: 100 KiB
│ table: user@primary
│ equality: (id) = (id)
│ equality cols are key
│ pred: st_covers(st_buffer(location, radius::FLOAT8), location) AND st_dwithin(location, location, radius::FLOAT8)
│
└── • inverted join
│ nodes: n2
│ actual row count: 10,239
│ KV time: 140ms
│ KV contention time: 0µs
│ KV rows read: 19,510
│ KV bytes read: 1.2 MiB
│ table: user@user_location_idx
│ on: id != id
│
└── • scan
nodes: n2
actual row count: 1,024
KV time: 782ms
KV contention time: 0µs
KV rows read: 1,024
KV bytes read: 4.0 MiB
estimated row count: 20,278 (100% of the table; stats collected 7 hours ago)
table: user@primary
spans: FULL SCAN
Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyslW9v2zYQxt_vU3D3ygZYR3_spBYwQGmXDllcqXCcDsVgGIx0cblIpEpSibMgH2tfYJ9sIGW5dhI7MdZXBo98yLvnd2fdg_5WQATnJ6OT9xNSa1QznlOSyVqYjt8lH8bpR1LIjBXkt3F68Ym8-9KeAgpC5piwEjVEf4IPFAKYUqiUzFBrqWz43h06zRcQeRS4qGpjw1MKmVQI0T0YbgqECBL5RlYHfaCQo2G8cJfiArPacCmI4SVG5OjffzRQuGQm-4qayNpUtYmIfVvJ2--BAKYPFJrV8j1t2BwhGjzQtZz83TlN2GWBY2Q5qgNvM7NKySteYFwpXjJ1BxTOKyZ0RN4AhdTmEPs0DmlsKzr73Ob_Nih1E8ikMCjWavOa2s4-E1eKQpZHxKdesLzh8s5gG-73PPKRvwMKJVuQEkup7ggrLCmDVtZuv-hU88A2t_x93DoVN6gM5r9LLlAd-NsMc78zlyuXYsbzBdCV-GRRKaLNLJM3qHRHm9llfXWFqhMHlMRhFH0YpceTt11K4sOutTohsU9-_oU4o1fGByvvXcbOUmuLt4pow4riKZYm4Pe9PTkN6cD3noLye8GSxONWDgZD98TzAIeHHjlbAdbfCmKwrEjO9TWpNZvbPMizgIdPCHs0CIdbEQf7ILZol_MQvDQPIymv64r8JbkgUkRLQMkr4R51yXHyqz2c33LzlYvmzNHGue_E7Xu85MYiXoPcYqd-8Dx5fwN72PP3o95e_Ji619J7TL3fC7ZDD38kc2vDNubhPsyP53OFc2akOgg3mTvbU5Wjstm71XHyZZakk1lyMRp1Yt8Sep9eJJPZOP3jvNN9bg4O_f_9p97fqOeFD80YdSWFxo1att3sPUwpYD7H5mOmZa0y_KRk5p5plqnTuUCO2jS7g2ZxKpotm-C62N8pDnaLg53icLc43Cnu7xb3d4q9TbEz0fkJAs2tVNekYAZFdheR5v-2Dd8ybpbtMOiF2pHRqDgr-N9srVcGy1ZpdctRzJDf2A4M1rbacVztBX03O-1-idqO1PqR8LVNOH346b8AAAD__19J9zU=
Note: "profile" in the diagram corresponds to the "user" table in the queries above.
Thank you, @Defman! Interestingly, the lookup join is much slower than either the scan or the inverted join. We are doing some work in this release cycle to improve lookup join performance, so I hope this will improve in 22.2.
Describe the problem
Spatial queries are orders of magnitude slower compared to the same table and data in Postgres.

To Reproduce
The query is slow on CockroachDB and decently fast on Postgres.
I have tried with a precomputed area column defined as GENERATED ALWAYS AS ( ST_BUFFER(location, radius::FLOAT, 50) ) STORED, as sketched below.
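A sketch of how such a column could be added (assuming a GEOMETRY location column; the expression is from the report, the rest is illustrative):

ALTER TABLE "user"
    ADD COLUMN area GEOMETRY
    GENERATED ALWAYS AS (ST_BUFFER(location, radius::FLOAT, 50)) STORED;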
I have tried using GEOMETRY and GEOGRAPHY, just in case.

Expected behavior
Postgres finishes in ~3 sec; CockroachDB never finished and was terminated after 10 min.
I expect CockroachDB to be on par, or at most one order of magnitude slower.
Environment:
cockroach version details:
Build Tag:        v21.2.10
Build Time:       2022/05/02 17:41:51
Distribution:     CCL
Platform:         darwin amd64 (x86_64-apple-darwin19)
Go Version:       go1.16.6
C Compiler:       Clang 10.0.0
Build Commit ID:  39511c6a4d0bbb580b7ff6f5c0e1e77664474efb
Build Type:       release
Additional info
Postgres
CockroachDB
Jira issue: CRDB-16027