WICG / uuid

UUID V4
Other
63 stars 10 forks source link

Prose of "generating a random UUID" could be simplified #29

Closed marcoscaceres closed 2 years ago

marcoscaceres commented 3 years ago

For "Generating a random UUID", I wonder if we can rewrite the section to either:

domenic commented 3 years ago

I find the section as it is much clearer than the RFC or your proposed revision. Could you outline what's wrong with it (besides "clunky") before suggesting changes?

marcoscaceres commented 3 years ago

By clunky I mean the amount of redundancy here:

Screen Shot 2021-08-21 at 10 39 40 am

Do we really need to say "hexadecimal representation" ~17 times like the above? It's not clear which part is which (why I proposed naming each path clearly) - and it's super hard to read.

It's also clunky to sprinkle "-" throughout the [=list=], when [=string/concatenate=] supports joining with a separator "-".

marcoscaceres commented 3 years ago

@domenic, how about we turn "hexadecimal representation" into an algorithm either the array, a start index and an end index? That way we end up with must more readable:

  1. Let |whatever:string| be [=hexadecimal representation=] of |array| from 0 to 3.
  2. Let |other-part:string| be [=hexadecimal representation=] of |array| from 4 to 5.
  3. ...
  4. Let |other-part:string| be [=hexadecimal representation=] of |array| from 10 to 15.
  5. Let |parts| be [=list=] « whatever, other-part... ».
  6. Let |uuid:string| be [=string/Concatenate=] |parts| using separator "-".
  7. Return |uuid|.
domenic commented 3 years ago

That seems pretty weird to me. The hexadecimal representation of a number is a sensible thing to talk about; the hexadecimal representation of an array doesn't make any sense to me.

I think being explicit is better than being concise here.

bcoe commented 3 years ago

Perhaps:

Let |node:string| be [=hexadecimal representation=] of each |array| element from 10 to 15.
domenic commented 3 years ago

Hmm. Maybe something more like a map operation? "Let endHexes be the list formed by taking the hexadecimal representation of items 10 through 15, inclusive, in array"? Then "Let end be the result of concatenating endHexes"?

I still think that's worse than the current spec, because again, I think being explicit is really nice and simple and matches the code. It makes it clear that the structure is 4 hex representations, a dash, 2 hex representations, a dash, ...

But if the goal is to minimize the number of times you say "hexadecimal representation", then maybe something like the above would be the way to go.

We could also wait to see if anyone implementing the spec is confused by the current wording before changing anything.

marcoscaceres commented 3 years ago

I still think that's worse than the current spec, because again, I think being explicit is really nice and simple and matches the code.

By code I guess you mean WTF/UUID.cpp's createCanonicalUUIDString().

But if the goal is to minimize the number of times you say "hexadecimal representation", then maybe something like the above would be the way to go.

The goal is to be explicit, identify the parts, and to increase legibility by reducing redundancy in the text.

We could also wait to see if anyone implementing the spec is confused by the current wording before changing anything.

Because UUID generation has been in browsers for a really long time, and is fairly well-understood, it's extremely unlikely anyone is going to implement directly from what is in the spec text (it's the other way around, I think: the spec matches some implementation, which seems to be WTF/UUID). Thus, the spec text is mostly for us spec folks (and for developers) to have some clarity to what the parts are - which is the semantic aspects of UUIDs. This is also why I suggested initially we just defer to RFC4122.

Consider, aside from the WebIDL, the entire WebKit and Chromium implementations are 4 lines:

#include <wtf/UUID.h>

String Crypto::randomUUID() const
{
 return createCanonicalUUIDString();
};

With createCanonicalUUIDString() predating this effort. What's nice about createCanonicalUUIDString() is how it uses hex() as a function (which matches what is being suggested above):

    // Format as Version 4 UUID.
    return makeString(
        hex(randomData[0], 8, Lowercase),
        '-',
        hex(randomData[1] >> 16, 4, Lowercase),
        "-4",
        hex(randomData[1] & 0x00000fff, 3, Lowercase),
        '-',
        hex((randomData[2] >> 30) | 0x8, 1, Lowercase),
        hex((randomData[2] >> 16) & 0x00000fff, 3, Lowercase),
        '-',
        hex(randomData[2] & 0x0000ffff, 4, Lowercase),
        hex(randomData[3], 8, Lowercase)
    );

In the case of Mozilla, the new UUID Generator they are building looks like the following. The effort is being done independently of exposing this method:

NS_IMETHODIMP
nsUUIDGenerator::GenerateUUID(nsID** aRet) {
  nsID* id = static_cast<nsID*>(moz_xmalloc(sizeof(nsID)));

  nsresult rv = GenerateUUIDInPlace(id);
  if (NS_FAILED(rv)) {
    free(id);
    return rv;
  }

  *aRet = id;
  return rv;
}

nsUUIDGenerator::GenerateUUIDInPlace(nsID* aId) {
  // Fill the nsID with 16 random bytes.
  uint64_t randomNumbers[2]{RandomUint64OrDie(), RandomUint64OrDie()};
  static_assert(sizeof(nsID) == sizeof(randomNumbers),
                "nsID must be exactly 16 bytes");
  memcpy(aId, &randomNumbers, sizeof(nsID));

  // Put in the version
  aId->m2 &= 0x0fff;
  aId->m2 |= 0x4000;

  // Put in the variant
  aId->m3[0] &= 0x3f;
  aId->m3[0] |= 0x80;

  return NS_OK;
}

Note the " // Put in the version" and "// Put in the variant", which again goes to my point about labelling the parts as variables. Note also that nsID is a four part struct specifically to match the UUID format.

My point being, there are multiple ways of doing this that end up with something pretty clear.

ctavan commented 3 years ago

I raised my concerns with using UUID "semantics" (i.e. time-{low,mid,high} etc.) in this comment and I still think it doesn't make sense to keep carrying around this historic notation for anything describing non-v1-UUIDs. In my opinion it's an unfortunate historic artifact that RFC4122 continued to use these semantics when introducing random (v4) and name-based (v3/v5) UUIDs.

marcoscaceres commented 3 years ago

I think it’s fine if we drop the actual semantic, but having a cleaner “Let |part1| … Let |part2|…, Let |part3|, let |part4|“ (or similar would) would really help with legibility.

See the current pull request. It’s a good improvement IMO.

bcoe commented 3 years ago

@marcoscaceres @ctavan @domenic I'm okay of trying to trim down the verbosity of the algorithm in spec text, but I don't think we've quite landed on format that pleases everyone. I'd like to close #33 for now, to unblock #34, and revisit this issue later.

Sound reasonable?

marcoscaceres commented 3 years ago

Sounds ok to me... but yeah, let's please come back to it. The WebCrypto spec has a bad rep for being a bit developer unfriendly and unapproachable (it's a difficult topic! not just the spec's fault) - so, speaking as a developer, someone on a devrel team, and an implementer, I think little editorial things like this really do help.

bcoe commented 3 years ago

@ctavan any interest on giving this rewrite a go, you're the subject matter expert at real world implementations of the algorithm 😄

bcoe commented 2 years ago

@twiss are you happy with where we landed on, with the algorithm spec text? I'd like to close this issue if we're in a good place.

twiss commented 2 years ago

I'm personally not opposed to the current text. It would be nice to normatively refer to an RFC, but RFC4122 is quite difficult to follow since it also defines other versions, not just v4. If it makes people happier, I think one alternative would be to move the reference to RFC4122 out of the note, and move the current algorithm into a note (i.e. swap which one is normative). That being said, I'm not sure if that would make it easier to read, so I'm also fine with leaving it as is.

For what it's worth, the Web Crypto spec says that

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

so once it's merged there, an implementation would also be allowed to generate a v4 UUID in any other way, without necessarily having to follow the exact steps written here.

bcoe commented 2 years ago

I'm personally not opposed to the current text.

👍 I'm going to go ahead and close this ticket then, we can iterate in the upstream spec in the future if we like.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner

Thanks for pointing out this caveat. In practice it seems like most folks implement UUID v4 with various optimizations that would look pretty ugly in spec text (buffers of random bytes, bit shifting tricks).