Closed zenazn closed 1 year ago
Duplicate of #2041. As explained there, moving the ORDER BY below the HASH AGGREGATE is not a sufficient solution in a multi-threaded system; that only works in a single-threaded execution model. Instead, what you want to do is push the ORDER BY into the FIRST aggregates, so that they become e.g. FIRST(b ORDER BY a). If the order clause is simple enough, the system could also translate it into a MAX_BY/MIN_BY to improve execution speed.
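To illustrate why the ordering has to travel with the aggregate, here is a minimal Python sketch, not DuckDB's implementation. The helper names `first_seen` and `min_by` and the sample rows are hypothetical. It models a plain FIRST as "keep whichever row arrives first" and a MIN_BY-style aggregate as "keep the row with the smallest ordering key", then feeds both the same rows in a different arrival order, as could happen when several threads consume chunks of sorted input.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical row layout: (group key, ordering key, payload)
Row = Tuple[Any, Any, Any]


def first_seen(rows: Iterable[Row]) -> Dict[Any, Any]:
    """Plain FIRST: keeps the payload of whichever row arrives first per group."""
    out: Dict[Any, Any] = {}
    for group, _order, payload in rows:
        out.setdefault(group, payload)
    return out


def min_by(rows: Iterable[Row]) -> Dict[Any, Any]:
    """MIN_BY-style aggregate: keeps the payload with the smallest ordering key."""
    best: Dict[Any, Tuple[Any, Any]] = {}
    for group, order, payload in rows:
        if group not in best or order < best[group][0]:
            best[group] = (order, payload)
    return {group: payload for group, (_order, payload) in best.items()}


# Rows already sorted by (group, ordering key), as an ORDER BY would leave them.
rows = [
    ("x", 1, "x-first"),
    ("x", 2, "x-second"),
    ("y", 1, "y-first"),
    ("y", 3, "y-third"),
]

# The same rows in a different arrival order, standing in for chunks that
# parallel workers hand to the aggregate out of order.
interleaved = [rows[1], rows[3], rows[0], rows[2]]

sorted_result = first_seen(rows)            # correct only because input order survived
shuffled_result = first_seen(interleaved)   # depends on arrival order
stable_result = min_by(interleaved)         # independent of arrival order
```

In this sketch `first_seen` gives different answers for the two arrival orders, while `min_by` gives the same answer for both, which is the property the order-aware aggregate is meant to provide.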
Fixed now in #6616
What happens?
When running a SELECT DISTINCT ON ... ORDER BY ... query, I expect DuckDB to return the first row, as defined by the query's ORDER BY clause, for each distinct value of the DISTINCT ON expression. Instead, DuckDB picks a random row. See the Postgres docs for the behavior I expect.
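The expected semantics can be sketched in plain Python as "sort by the ORDER BY columns, then keep the first row of each DISTINCT ON group". This is only a reference model of the behavior described above, not code from DuckDB or Postgres. The function name `distinct_on`, the column names, and the sample rows are hypothetical, and the sketch assumes the DISTINCT ON column is the leading ORDER BY column, as Postgres requires.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List, Sequence

RowDict = Dict[str, Any]


def distinct_on(rows: Sequence[RowDict], key: str, order_by: Sequence[str]) -> List[RowDict]:
    """Return the first row per distinct `key`, with 'first' defined by `order_by`."""
    # Assumption: the DISTINCT ON column leads the ORDER BY list, so sorting
    # also makes rows with the same key adjacent for groupby.
    if not order_by or order_by[0] != key:
        raise ValueError("order_by must start with the DISTINCT ON column")
    ordered = sorted(rows, key=lambda r: tuple(r[col] for col in order_by))
    return [next(group) for _key, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample data.
rows = [
    {"a": 1, "b": 2, "c": "late"},
    {"a": 2, "b": 5, "c": "only"},
    {"a": 1, "b": 1, "c": "early"},
]

result = distinct_on(rows, key="a", order_by=["a", "b"])
```

For the sample rows, the group with `a = 1` should yield the row with the smaller `b`, regardless of the order in which the rows were supplied.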
To Reproduce
Extracted from a production use case:
Here's the result of running a few variants of that query in a duckdb 0.7.1 shell:
DISTINCT ON (t.a) part
EXPLAIN of that query
From the query plan it's pretty clear the ORDER_BY block is on the wrong side of the HASH_GROUP_BY block.
OS:
OSX, M1
DuckDB Version:
0.7.1
DuckDB Client:
shell
Full Name:
Carl Jackson
Affiliation:
Watershed
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?