Generating UIDs

jemrobinson commented 8 months ago

Many OIDC backends will not have a sensible field that can be used as a UID out-of-the-box.

For instance, Microsoft Entra has `objectId`, which is a GUID. This can be represented as a 128-bit integer, and some 32-bit subset of it could possibly be used as a UID, but there are no guarantees that this will (a) be unique across N users or (b) always produce UIDs in a "safe" range of 1000+.
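To make both failure modes concrete, here is a minimal sketch that takes the low 32 bits of the GUID as a candidate UID. The two object IDs, the `truncated_uid` helper and the choice of "low 32 bits" are all made up for illustration; any other 32-bit subset would have the same problems.

```python
import uuid

UID_MIN = 1000  # assumed lower bound of the "safe" range


def truncated_uid(object_id: str) -> int:
    """Take the low 32 bits of the GUID's 128-bit integer value (illustrative only)."""
    return uuid.UUID(object_id).int & 0xFFFFFFFF


# Two hypothetical object IDs that differ only in their upper 96 bits
first = "11111111-2222-3333-4444-000000000007"
second = "aaaaaaaa-bbbb-cccc-dddd-000000000007"

uid_first = truncated_uid(first)
uid_second = truncated_uid(second)

collides = uid_first == uid_second    # distinct users end up with the same UID
in_safe_range = uid_first >= UID_MIN  # the truncated value can fall below 1000
```

Real GUIDs are unlikely to be this contrived, but nothing in the truncation rules either outcome out.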

Convert Entra ObjectID to SID

```python
import uuid


def _object_id_to_sid(self, object_id: str) -> str:
    """
    Convert a Microsoft Entra ObjectID into an SID
    """
    # We must use little-endian order as described here
    # https://learn.microsoft.com/en-us/dotnet/api/system.guid.tobytearray?view=net-8.0
    uuid_bytes = uuid.UUID(object_id).bytes_le
    # Split the 16 bytes into four unsigned 32-bit integers
    uuid_integers = [
        int.from_bytes(uuid_bytes[idx : idx + 4], byteorder="little")
        for idx in range(0, 16, 4)
    ]
    return "S-1-12-1-" + "-".join(map(str, uuid_integers))
```

One possibility would be to maintain a separate database of unique ID <=> UID mapping. There may be other possibilities too.
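A rough sketch of what such a mapping could look like, using the standard-library `sqlite3` module purely as a stand-in for "a separate database". The `UidMap` class, the table layout and the starting value of 1000 are assumptions for illustration, not something this project defines.

```python
import sqlite3

UID_START = 1000  # assumed first UID to hand out


class UidMap:
    """Hypothetical persistent mapping from a backend's unique ID to a UID."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS uid_map ("
            "uid INTEGER PRIMARY KEY, "
            "unique_id TEXT NOT NULL UNIQUE)"
        )

    def uid_for(self, unique_id: str) -> int:
        """Return the UID already stored for unique_id, or allocate the next one."""
        row = self.db.execute(
            "SELECT uid FROM uid_map WHERE unique_id = ?", (unique_id,)
        ).fetchone()
        if row is not None:
            return row[0]
        with self.db:  # commits on success, rolls back on error
            (next_uid,) = self.db.execute(
                "SELECT COALESCE(MAX(uid), ?) + 1 FROM uid_map", (UID_START - 1,)
            ).fetchone()
            self.db.execute(
                "INSERT INTO uid_map (uid, unique_id) VALUES (?, ?)",
                (next_uid, unique_id),
            )
        return next_uid


uid_map = UidMap()
alice = uid_map.uid_for("11111111-2222-3333-4444-000000000007")
bob = uid_map.uid_for("aaaaaaaa-bbbb-cccc-dddd-000000000007")
alice_again = uid_map.uid_for("11111111-2222-3333-4444-000000000007")
```

Because UIDs are allocated sequentially rather than derived from the GUID, they are unique by construction and start in the chosen range. The `MAX(uid) + 1` allocation would reuse a UID if the most recently added row were ever deleted, so a real implementation would need to decide how deletions are handled.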

JimMadge commented 8 months ago

Feels like a balance between:

- Simple, easy (?), not guaranteed to avoid duplicate/invalid user IDs
- Extra effort, more rigorous

If an extra database is added, I think there is also the question of whether it lives in this project or whether providing it is left as an exercise for the user. I think I'm leaning towards the option that we know will work, rather than one that we know can have problems that would be difficult to fix and hard to understand (however rare they are).

I'm reminded of the borg backup FAQs.

jemrobinson commented 8 months ago

Tested three options:

- ❌ Use a PostgreSQL database (too much overhead)
- ❌ Use another attribute from the remote object (does not work for Entra Groups as they do not have unused writable attributes)
- ✅ Use a Redis cache (lightweight but needs thought put into persistence)
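As a sketch of how the Redis option might allocate UIDs, the function below relies on only three commands: `GET`, `SET` with `NX`, and `INCR`. It is written against any object exposing `get`, `set(..., nx=True)` and `incr`, which are method names the redis-py client provides. The `InMemoryStore` stand-in, the key names and the starting value of 1000 are assumptions so that the example runs without a server; this is not necessarily how the project implements it.

```python
UID_START = 1000         # assumed first UID to hand out
COUNTER_KEY = "uid:next"  # hypothetical key holding the last allocated UID


class InMemoryStore:
    """Stand-in with the same method names as a Redis client, for illustration."""

    def __init__(self) -> None:
        self.data = {}

    def get(self, name):
        return self.data.get(name)

    def set(self, name, value, nx=False):
        # With nx=True, only set the key if it does not already exist
        if nx and name in self.data:
            return None
        self.data[name] = value
        return True

    def incr(self, name):
        self.data[name] = int(self.data.get(name, 0)) + 1
        return self.data[name]


def uid_for(store, unique_id: str) -> int:
    """Return the cached UID for unique_id, allocating a new one if needed."""
    key = f"uid:{unique_id}"
    existing = store.get(key)
    if existing is not None:
        return int(existing)
    # Initialise the counter once so that the first increment returns UID_START
    store.set(COUNTER_KEY, UID_START - 1, nx=True)
    candidate = store.incr(COUNTER_KEY)
    # nx=True keeps the first writer's value if two callers race on the same ID
    store.set(key, candidate, nx=True)
    return int(store.get(key))


store = InMemoryStore()
first_uid = uid_for(store, "11111111-2222-3333-4444-000000000007")
second_uid = uid_for(store, "aaaaaaaa-bbbb-cccc-dddd-000000000007")
repeat_uid = uid_for(store, "11111111-2222-3333-4444-000000000007")
```

The mapping is only as durable as the store behind it: if the cache is lost, users could be assigned different UIDs the next time round, which is why persistence (for example Redis snapshots or its append-only file) needs thought.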

alan-turing-institute / apricot

Generating UIDs #12