x4m opened this issue 1 year ago
@x4m, thanks for your PR! I also read your thread on the pgsql-hackers list.
A few thoughts:

- Regarding `pg_strong_random`: shouldn't the caching happen at that layer instead? This is what other high-quality cryptography libraries seem to do. That way, it would benefit every Postgres instance and application in the world. Talk about making the world greener!
- `pg_hton64` is ultimately an alias for `bswap64`, which will get compiled on supported architectures to a single instruction. Are there benchmarks that show that manual bit swapping is faster?

All in all, I'm unlikely to add 1 or 3. If I benchmark the cache in 2 and it provides a large performance improvement, I may implement something similar. 4 seems reasonable if I can verify and test that there aren't any off-by-one errors that would lead to buffer overflows.
Cheers.
I'm still undecided on implementing something similar to 2. After several benchmarks the performance gain is about 10% on a Postgres install with OpenSSL. This is pretty good, but I still feel like the functionality probably shouldn't live in this extension. I'm going to continue considering it, however.
@fboulnois Could you please publish Andrey Borodin's (@x4m) extension? Many people have been waiting for his patch, but unfortunately it was recently suspended because of the feature freeze for PostgreSQL 17, just before the final approval of the new RFC by its authors.
At this point, the only substantive difference between x4m's patch and my extension is the alternate pseudorandom construction, which I've already discussed in https://github.com/fboulnois/pg_uuidv7/pull/15#issuecomment-1722562079. Cheers!
I think @sergeyprokhorenko meant exactly this: let's add a counter or microseconds.
The thing is, if you do `select i, gen_uuid_v7() id from generate_series(1,1000) i`, the order of `i` will be different from the order of `id`.
This is not super critical, but actually makes life better in many scenarios.
I meant to add an 18-bit counter, initialized every millisecond with a random number. The most significant bit of the counter is initialized to zero, to prevent the counter from overflowing when the most significant bits of the random number are all ones. All advanced implementations use a counter.

This promotes sortability, additional monotonicity, and better database performance. It can also make it easier to find bugs, by making it possible to identify the latest version of a record. The function with a counter could use the common name `uuidv7()`.
Different projects use different counter lengths. Why use 18? I see other projects use 14, 16, 24, 40, 42, and 48 bits for the counter size.
Just a sane number: 125MHz without overflow, tradeoff between max frequency and predictability. Detailed explanation was in hackers AFAIR.
> Different projects use different counter lengths. Why use 18? I see other projects use 14, 16, 24, 40, 42, and 48 bits for the counter size.
Too short a counter makes it possible for the counter to overflow under peak loads. This can be mitigated by using the timestamp as the counter in such a situation, but that is intuitively undesirable. On the other hand, a short counter theoretically allows for faster UUID comparison if UUIDs are compared over the timestamp + counter segment only (which is not the case now) rather than over the entire UUID length.

With 18 bits, counter overflow is guaranteed to be impossible if the high-order bit of the counter is initialized to zero every millisecond and the remaining counter bits are initialized to a random number.

Too long a counter reduces the random segment of the UUID, and therefore worsens the unguessability of the UUID against brute-force guessing. On the other hand, a long counter theoretically allows fewer resources to be spent on random number generation.
RFC 9562 recommends that the counter SHOULD be at least 12 bits but no longer than 42 bits.
UUIDv7 generation function in ClickHouse DBMS
https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#generateUUIDv7
https://github.com/ClickHouse/ClickHouse/pull/62852
For the sake of monotonicity when several microservices access the UUIDv7 generator in parallel, a single UUIDv7 generator is used per server.
> Just a sane number: 125MHz without overflow, tradeoff between max frequency and predictability. Detailed explanation was in hackers AFAIR.
See https://www.postgresql.org/message-id/84D20D0F-B5FF-41CD-9F48-E282CE9FEC1D%40yandex-team.ru
and https://www.postgresql.org/message-id/F91948DD-500A-4A22-ABB9-5F4C59C28851%40yandex-team.ru
The standard does not specify how big the counter should be. PFA a patch with a 16-bit counter. Maybe it's worth doing an 18-bit counter - it would save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.
This code has several advantages:
The code is based on the PG patch https://commitfest.postgresql.org/43/4388/ . I have my own repo https://github.com/x4m/pg_uuid_next , but I do not have the energy to support it the way you do, so I decided to bring some of the code from there into your implementation.
Have a nice day :)