cockroachdb / docs

CockroachDB user documentation
https://cockroachlabs.com/docs
Creative Commons Attribution 4.0 International
187 stars 453 forks source link

`LIMIT` should always be partnered with `ORDER BY` #6645

Closed thatnerd closed 3 years ago

thatnerd commented 4 years ago

Our docs include uses of LIMIT that aren't paired with ORDER BY. This is an anti-pattern, and shouldn't be present, but it seems to be widespread.

In cases that lack an ORDER BY, we don't guarantee ordering of results, even by primary key, and when using LIMIT, subsequent identical queries could return different results even in the absence of writes. While each leaseholder should return results ordered by the primary key, the gateway wouldn't necessarily run a merge sort on the results it gets from each node unless explicitly told to.

A really clear example is in our docs on pagination of results, where we recommend using a trick called the "seek method" to paginate results by using a combination of ORDER BY on an indexed field, plus a LIMIT clause, in order to avoid using the less efficient OFFSET clause, but neither example uses ORDER BY.

You can see that the example, SELECT * FROM employees WHERE emp_no > 10000 LIMIT 25; doesn't use ORDER BY at all.

A possible (and relatively simple) solution just for this section for would be to add ORDER BY emp_no to each query that uses this data set. Other queries in the page, such as SELECT id, name FROM accounts LIMIT 5, should ORDER BY the primary key if nothing else is available.

I looked in docs/v19.2 to find other potential cases of a LIMIT without an ORDER BY on the same line:

docs/v19.2/ $ grep -Ir LIMIT . | grep -v 'ORDER BY' | sed 's/:.*$//' | uniq | sort | sed 's/^\.\///'
build-a-java-app-with-cockroachdb-hibernate.md
build-a-java-app-with-cockroachdb.md
common-table-expressions.md
create-index.md
delete.md
import.md
learn-cockroachdb-sql.md
limit-offset.md
migrate-from-oracle.md
performance-best-practices-overview.md
performance-tuning-insecure.md
performance-tuning.md
query-order.md
select-clause.md
selection-queries.md
sql-faqs.md
sql-feature-support.md
subqueries.md
training/backup-and-restore.md
training/data-import.md
training/locality-and-replication-zones.md
update.md
window-functions.md
$ 

Each of these has at least one use of LIMIT without a corresponding ORDER BY in that line, though it might miss an ORDER BY in a previous line for multi-line SQL statements.

jseldess commented 4 years ago

Good find, @thatnerd!

@ericharmeling, @rmloveland, for consideration.

rmloveland commented 3 years ago

I think this somehow got fixed on the Selection queries page:

SELECT * FROM employees AS OF SYSTEM TIME '-1m' WHERE emp_no > 10000 ORDER BY emp_no LIMIT 25;

But I have not looked into the other pages mentioned.