ctapobep / blog

My personal blog on IT topics in the form of GitHub issues.
6 stars 0 forks source link

Should we sort by ID when writing SQL? #15

Open ctapobep opened 2 years ago

ctapobep commented 2 years ago

Suppose somewhere on UI you have a list of objects that you want to be sorted by their creation order. The new ones are added to the end of the list. You're writing a SQL and you see that w/o any additional sorting the items are already returned in the right order. Can you rely on this? Will it always be the case? No, not in all situations, at least when using PostgreSQL.

NB: the article will mention different types of indexes, you may want to check out Clustered, non-clustered, covering indexes in databases.

Physical storage

By default records are actually sorted not by order of insertion, but by their physical location. There are 2 primary ways of storing tuples (records):

Regardless of the approach, the data is actually split into logical pages. Each page can contain 1 or more records.

Table scan

When doing a table scan, the database has to iterate over all the records. Therefore it'll load 1st page, iterate over records there, then go the 2nd page, etc. Thus if Clustered Index is used, then you'll get data in the order you wanted.

But if you use PG, then it's more complicated. First, when DB just started to exist, PG fills the pages one after another. The newly inserted rows will be appended to the end. So you'll see the same result as in Oracle.

At some point though things will change:

  1. When updating a tuple, PG creates a new version of it (as opposed to Oracle/MySQL - they use Undo Storage). Which means that the tuple may be inserted on one of the previous pages if those were vacated. Even though PG will try to insert it to the same page as part of the optimization called HOT (heap-only tuples), at some point it'll be forced to insert the updated version into some other page. So after updates it's likely that you'll keep seeing the object on the same position within the list, but then suddenly it'll jump into a different position.
  2. When inserting 2 subsequent tuples, the 2nd one could be inserted on a page with a smaller index. Again, you'll see them not in the intended order.

The most unpleasant part of it is that at the beginning it'll all work as expected. You'll start seeing bugs only once the DB grows.

Index scan

When doing a lookup using index, the order in which you'll get the results back will depend on the index itself. Note, that a single Index Key can reference multiple values (if the column isn't unique). In such situation in PG they get sorted in the same order the physical records are stored (I didn't dig into the source code, but the experimenting showed this behaviour).

In PG this makes some sense because its indexes point to physical location called CTID ((page num; index on page)). Though arguably this could've worked differently: index values could've been simply appended to the end - but that doesn't seem to be the case.

But indexes in Oracle/MySQL work differently - secondary index doesn't reference a physical location. It references the Clustered Index (which is Primary Index by default). So here again, I'd expect the index values to be sorted in the same order the tuples are stored physically. Meaning the records will be sorted as expected (didn't check this explicitly, but I've never run into issues).

Beware of collations

Note, that if we sort by ID and those IDs are strings, then we have to be extra careful. Collations determine how the strings are compared. So if ID generation says that ID "A" came before "a" (e.g. if you use UUIDv6 or similar for ID and encode it as letters), then so should order by.

PostgreSQL for instance by default uses OS-level collation at the moment of creating a DB. Which means the env variables like LC_ALL, LC_COLLATE, LC_CTYPE will matter. By default different Unix OS'es often use something like eu_US.utf8. Unfortunately there's no consistency at least between MacOS and Linux in how xxx.utf8 behaves. On my Mac a goes after A, but on my Ubuntu it's the opposite. Which means your app may have different ordering depending on the OS the database runs on.

To be consistent across all OS, you can use C/POSIX collations for string-based IDs. You can set it at DB-level at the time of creation. This might be too radical of a step (as for most columns you probably want UTF8 collations), but here's how it looks:

create database blah_db
  template template0
  encoding = 'utf8'
  -- Ensure we're NOT using ICU (which it may be by default depending on how initdb script was ran and maybe also
  -- depending some OS & defaults). Otherwise lc_collate will be ignored.
  locale_provider = 'libc'
  -- How should the text fields be compared during sorting. 'C' means they are compared by their code points - which
  -- is the simplest, but it does mean that the text isn't always sorted alphabetically. Since we use text-based
  -- IDs, it's important to keep ID-generation consistent with the DB collation - ID that was generated later
  -- must also be considered larger from the DB perspective. See https://github.com/ctapobep/blog/issues/15
  --
  -- For text fields where alphabetical sorting is required, specify collation explicitly when creating the column.
  lc_collate = 'C'
  -- Determines what is a letter vs number/whitespace as well as how to lowercase/uppercase letters.
  -- On some installation it should be 'UTF-8'
  lc_ctype = 'en_US.utf8'

Or better - you can set it for particular tables/columns.