Discussion: Why not use integers?

InteXX commented 3 months ago

Pardon me for the appearance that I might be trolling, but I promise I'm not. I'm legitimately interested in answers to this.

I've been using integer primary keys for decades, and in all these years I haven't run into a single problem. I'm not sure what the fuss is all about. Enlighten me.

I'm a staunch adherent to the rule that primary/foreign keys are to be used for internal database referential integrity only. They should never be exposed outside the database—not in source code, heaven forbid in a URL or some such other place in the UI/UX.

So if that's the only reason to not use integers—obfuscation in the UI—I'm at a loss to see why I should switch.

Change My Mind™

neuecc commented 3 months ago

This format (or something very close to it) has been proposed and approved in an RFC as UUID v7. See the discussion on UUID v7 to understand the need. UUID v7 will also be added to .NET 9. Reading the text about it will also help you understand. https://github.com/dotnet/runtime/issues/103658
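
For readers who have not seen the format, here is a minimal Python sketch of the UUID v7 field layout as described in RFC 9562: a 48-bit Unix millisecond timestamp, a 4-bit version, 12 random bits, a 2-bit variant, and 62 more random bits. The function name `uuid7_sketch` is made up for illustration. This is not the .NET implementation, and it makes no attempt at ordering within a single millisecond.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUID with the version 7 field layout (illustrative only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


if __name__ == "__main__":
    print(uuid7_sketch())
```

Because the timestamp occupies the most significant bits, values generated in later milliseconds sort after earlier ones, which is the "behaves like an integer counter" property the linked issue refers to.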

InteXX commented 3 months ago

Thank you for the link, but I'm not finding any discussion of the pros and cons of integer primary keys there—only discussion surrounding UUID v7 implementation itself.

In fact, the only times the word integer is used in reference to primary key usage are in envy of one of its benefits:

> this method of generation ensures monotonically increasing values, just like an integer counter

> a truly monotonic increasing sequence, which behaves like an auto-incrementing integer counter

You also suggest:

> See the discussion on UUID v7 to understand the need

Which discussion would that be? I'm certain there are dozens or more, scattered in disparate collections across the landscape.

Please advise.

neuecc commented 3 months ago

You can find them by searching the Internet.

InteXX commented 3 months ago

> You can find them by searching the Internet

Well, of course I know that, Yoshifumi.

If that's the best you have to offer, I'd say your discussion skills could use some improvement.

Timovzl commented 3 months ago

@InteXX Say you're practicing Domain-Driven Design (DDD) and persisting your domain entities directly to the database. Say you also do not want to expose indications of the number of entities in your database to UIs and APIs. If you're using integer keys, you now need secondary keys.

But wait... One is the entity's ID, and the other is its... other ID? This is where your domain model starts to get soiled with technical concerns, or you need to work around the issue in other ways.

What if an ID type was available that gave you everything in one? If the cost of using it (e.g. database shape, timestamp exposure) is less than the alternatives, there you go.

InteXX commented 3 months ago

@Timovzl

> Say you also do not want to expose indications of the number of entities in your database to UIs and APIs

Yes, I like that. It conforms with my aforementioned rule: surrogate keys are for database referential integrity only. They should never escape into general use in the source code.

> If you're using integer keys, you now need secondary keys

Actually, I've been doing that for some time now, although I refer to them as candidate keys. In all of my entities I add a GUID column named Key, to be populated with a value for use in application logic, APIs, URLs, etc.

> But wait... One is the entity's ID, and the other is its... other ID? This is where your domain model starts to get soiled

Should we really think of it as "soiled?" True, with an ORM such as Entity Framework, we must include the primary key column in each entity. But if we put that into a base class it becomes really easy to ignore (which is correct IMO—we should ignore it). In fact, I tend to also put the candidate Key column in the base class for consistency across the application.

Given these, I would posit that a more applicable term might be "cleaned." I know, I'm stepping outside the bounds of DDD terminology and concepts here, but considering the circumstances I believe it to be a good trade. Here it seems the DDD rule conflicts with the referential integrity isolation rule, so we must choose our poison.

> What if an ID type was available that gave you everything in one?

But again, that breaks the isolation rule.

In the final play, then, we end up with two identifier columns in the entity—surrogate for the database (primary key) and candidate for the application (user key).
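
A minimal sketch of that two-column arrangement, using Python's built-in `sqlite3` so it runs anywhere. The table names, the column name `external_key` (standing in for the `Key` column mentioned above), and the helper functions are all hypothetical. A random `uuid4` stands in for whatever GUID or ULID generator you would actually use.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    id           INTEGER PRIMARY KEY,   -- surrogate key: joins and foreign keys only
    external_key TEXT NOT NULL UNIQUE,  -- candidate key: what the application sees
    name         TEXT NOT NULL
);
CREATE TABLE purchase (
    id           INTEGER PRIMARY KEY,
    external_key TEXT NOT NULL UNIQUE,
    customer_id  INTEGER NOT NULL REFERENCES customer(id),
    total_cents  INTEGER NOT NULL
);
""")


def add_customer(name):
    """Insert a customer and return only the application-facing key."""
    key = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO customer (external_key, name) VALUES (?, ?)", (key, name)
    )
    return key


def add_purchase(customer_key, total_cents):
    """Resolve the integer foreign key inside SQL; callers never see it."""
    key = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO purchase (external_key, customer_id, total_cents) "
        "SELECT ?, id, ? FROM customer WHERE external_key = ?",
        (key, total_cents, customer_key),
    )
    return key


def purchases_for(customer_key):
    """Look up purchases by the customer's external key."""
    rows = conn.execute(
        "SELECT p.external_key, p.total_cents FROM purchase p "
        "JOIN customer c ON c.id = p.customer_id "
        "WHERE c.external_key = ? ORDER BY p.id",
        (customer_key,),
    )
    return rows.fetchall()
```

The integer `id` values never leave the SQL statements here, which is the isolation rule in practice. The cost is one extra unique index per table and a lookup by `external_key` at the boundary.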

I guess I've just made a case for Ulid 😉

Timovzl commented 3 months ago

@InteXX You make a strong counterargument to my reasoning about soiling the entity. 😃 I concede that that could make a secondary external ID acceptable in entities.

On top of everything discussed, I'd add the following considerations:

- A ULID or equivalent is code-generated, so with that as your primary key, you can reference by ID regardless of whether the entity being referenced was saved yet. This is a huge advantage to me. Also, needing to read back the auto-increment ID tends to interfere with batch insertion (which is one great way to scale traditional databases). I despise being forced to save changes halfway through just to obtain an auto-increment ID. I also hate how that mutates the in-memory entity. Too much mental strain caused by additional possible states.
- If the ID generator is appropriately monotonic and the output has a sufficiently small and convenient type, then auto-increment has very little benefit to offer by comparison. For example, DistributedId achieves this by being a 28-digit decimal like 1088824355131185736905670087 (requiring 13 bytes in SQL), or optionally a 16-char alphanumeric string like 3zfAkCP7ZtzfeQYp (requiring 16 bytes in SQL).
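
To illustrate the first point, here is a small Python sketch with `sqlite3`. The schema and the `new_id` helper are hypothetical, and `uuid4` is only a stand-in: it is random rather than time-ordered, so swap in a ULID-style generator for real use. What matters is that the child rows can reference the parent's ID before anything has been written.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
CREATE TABLE author (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (
    id        TEXT PRIMARY KEY,
    author_id TEXT NOT NULL REFERENCES author(id),
    title     TEXT NOT NULL
);
""")


def new_id():
    # Stand-in for a code-side generator (ULID, UUID v7, ...); assumption only
    return uuid.uuid4().hex


# Build the whole object graph in memory; no database round trip is needed
# to learn the parent's ID.
author_id = new_id()
authors = [(author_id, "Author A")]
books = [(new_id(), author_id, title) for title in ("Book 1", "Book 2")]

# One transaction, two batched inserts, nothing read back afterwards
with db:
    db.executemany("INSERT INTO author VALUES (?, ?)", authors)
    db.executemany("INSERT INTO book VALUES (?, ?, ?)", books)
```

With an auto-increment key, the author row would have to be inserted first and its generated ID read back before the book rows could be built.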

A word of caution: Almost all ULID-like implementations, ULID included, fail to maintain either monotonicity or unpredictability when reusing the same timestamp (millisecond). In the case of ULID, you lose the unpredictability, as it will simply increment by 1. DistributedId and DistributedId128 do not have this drawback.
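
The increment-by-1 behavior is easy to see in a sketch. The following Python follows the monotonic mode described in the ULID specification (48-bit millisecond timestamp, 80 random bits, Crockford Base32). The class name `MonotonicUlidSketch` is made up for illustration; it is not the code of Cysharp/Ulid or any other particular library.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode(value):
    """Encode a 128-bit integer as 26 Crockford Base32 characters."""
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


class MonotonicUlidSketch:
    def __init__(self):
        self.last_ms = -1
        self.last_rand = 0

    def next(self, now_ms=None):
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        if now_ms == self.last_ms:
            # Same millisecond: keep the timestamp, add 1 to the random part.
            # Ordering is preserved, but the next ID is now predictable.
            self.last_rand += 1
            if self.last_rand >> 80:
                raise OverflowError("random component overflowed in this millisecond")
        else:
            self.last_ms = now_ms
            self.last_rand = int.from_bytes(os.urandom(10), "big")
        return encode((now_ms << 80) | self.last_rand)


if __name__ == "__main__":
    gen = MonotonicUlidSketch()
    print(gen.next(1000))
    print(gen.next(1000))  # differs from the previous value by exactly 1
```

Anyone who sees one ID from a busy millisecond can guess its neighbors, which matters only if the IDs double as unguessable tokens.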

InteXX commented 3 months ago

> that could make a secondary external ID acceptable in entities

Well... secondary, yes. External? I guess it depends on your perspective 😉

> A ULID or equivalent is code-generated, so with that as your primary key, you can reference by ID regardless of whether the entity being referenced was saved yet. This is a huge advantage to me.

Yes, I would agree. I like that as well. But now we're getting into personal preferences (which is no less important, of course, but it's a different room).

> Too much mental strain

Goodness... we can't have that now, can we? 😮

> In the case of ULID, you lose the unpredictability, as it will simply increment by 1

I can live with that.

I guess I've gone and sold myself (with your assistance) on Ulid for my candidates 👍

Cysharp / Ulid

Discussion: Why not use integers? #78