biotorrents / gazelle

BioTorrents.de’s version of Gazelle
https://torrents.bio
ISC License
19 stars 3 forks source link

All integer primary keys should be a bigint and most database tables should have a UUID v7 unique key #66

Closed pjc09h closed 1 year ago

pjc09h commented 1 year ago

Branching off the work in creatorObjects to position the database for scale. I've been meaning to implement some kind of basic sharding and replication since the beginning, which relies on not having key collisions. UUID v7 stored as binary(16) as a unique key, while maintaining the standard auto-increment id bigint columns, seems to be the way to go.

The database class is already set up to transparently handle UUID binary to string conversion so, e.g., select uuid, name from creators order by created desc limit 10 will return UUID's in the form of 01877b4a-b27c-70db-9522-149e9a40ef59.

UUID documentation: https://uuid.ramsey.dev/en/stable/rfc4122/version7.html https://uuid.ramsey.dev/en/stable/database.html

Sharding documentation: https://aws.amazon.com/what-is/database-sharding/ https://www.linode.com/docs/guides/sharded-database/

Misc documentation: https://emmer.dev/blog/why-you-should-use-uuids-for-your-primary-keys/ https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8 https://stackoverflow.com/questions/52414414/best-practices-on-primary-key-auto-increment-and-uuid-in-sql-databases https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439 https://vladmihalcea.com/uuid-database-primary-key/ https://www.mysqltutorial.org/mysql-uuid/ https://www.percona.com/blog/store-uuid-optimized-way/

pjc09h commented 1 year ago

Actually, all the (few) binary columns should be transparently converted to strings with a helper function. torrents.info_hash and maybe two or three others.

pjc09h commented 1 year ago

The real utility of UUID keys became apparent when starting to add OpenAI output to various non-torrent content. It required me to add requestId, collageId, etc., to the OpenAI table. That sucks. It should just be contentId with a binary UUID that's somewhat able to be sorted according to the v7 spec. They won't be primary keys but they will be unique.