BLAKE3-team / BLAKE3-specs

The BLAKE3 paper: specifications, analysis, and design rationale
https://blake3.io

Question: Algorithm length labeling #3

Closed stevespringett closed 4 years ago

stevespringett commented 4 years ago

Hello,

I'm working on a new release of the CycloneDX software bill of materials (SBOM) specification. I'm including BLAKE2b and would like to include BLAKE3 as well.

This is a non-technical question about how the output length affects the label that end users apply to BLAKE3 when using it.

BLAKE2b-256: b2sum -l 256 returns 64 hex characters.

By contrast, b3sum -l 32 returns 64 hex characters.

Should this be referred to as BLAKE3-32? If not, what should it be labeled as?

Also, one additional question: what is the highest length you envision the 'real world' using? My initial thought would be a length of 128, which would produce a 256-character hex string.

If this is the case, then would labeling these as:

make sense?

Sorry for the noob questions. I just installed BLAKE3, so I'm completely unaware of the specs or history, and I'm nearing the release of CycloneDX v1.2 and would like to get the labeling correct.
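Part of the confusion here is that the two `-l` flags count different units: GNU coreutils' b2sum takes the digest length in bits (a multiple of 8), while b3sum takes the number of output bytes before hex encoding. In both cases each output byte prints as two hex characters, so 32 bytes is 64 hex characters and 128 bytes is 256. A small Python sketch of that arithmetic follows; the helper names are hypothetical, and the BLAKE2b line assumes `hashlib.blake2b(..., digest_size=32)` corresponds to the digest length that `b2sum -l 256` selects.

```python
import hashlib


def hex_chars_for_b2sum_bits(bits):
    # b2sum's -l counts output bits; 8 bits per byte, 2 hex characters per byte.
    return (bits // 8) * 2


def hex_chars_for_b3sum_bytes(nbytes):
    # b3sum's -l counts output bytes; 2 hex characters per byte.
    return nbytes * 2


# BLAKE2b with a 32-byte (256-bit) digest length.
digest = hashlib.blake2b(b"hello", digest_size=32).hexdigest()
print(len(digest))                     # 64
print(hex_chars_for_b2sum_bits(256))   # 64
print(hex_chars_for_b3sum_bytes(32))   # 64
print(hex_chars_for_b3sum_bytes(128))  # 256
```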

oconnor663 commented 4 years ago

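All very good questions! I'm actually surprised this hasn't come up before. To the extent possible, I'd recommend just calling it BLAKE3 and having the output be 32 bytes. There are some key differences between BLAKE2 and BLAKE3 that motivate that recommendation:

- In BLAKE2 (setting aside BLAKE2X), the maximum output length of each function was also the length necessary to achieve maximum security. So, for example, it could be reasonable to say that BLAKE2b-512 is more secure than BLAKE2b-256. (Whether anyone actually needs that level of security is a different question.) In contrast, BLAKE3 (like BLAKE2X) can have any output length. But above 32 bytes, adding more bytes of output does not increase security, because security is ultimately capped by the internal state size. There is no security benefit to using 64-byte BLAKE3 over 32-byte BLAKE3.

- In BLAKE2 (setting aside one of the modes of BLAKE2X), outputs of length X and outputs of length X+1 are independent of each other. For example, if you know the BLAKE2b-256 hash of my input, that doesn't tell you anything about the BLAKE2b-512 hash of the same input. This is not the case with BLAKE3. A BLAKE3 output of length X and an output of length X+1 are identical except for that extra final byte; the first is a prefix of the second. The main reason for this design is that extendable-output functions (XOFs) need a mode where you don't know in advance how many bytes you're going to need, and it's simpler to make this the only mode than to have multiple different XOF modes.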

For those reasons, I think labeling functions like BLAKE3-64/BLAKE3-512 could be more harmful than helpful. Users might reasonably expect them to be independent from each other, as most* of the BLAKE2 algorithms and all of the SHA-family algorithms are. Users might also expect that larger numbers mean more security, when in practice they just mean more space. I think it would be best if there was only BLAKE3 and it was almost always 32 bytes of output.

* Again, this "most" situation is why having more than one XOF mode is a big downside. I think it can be better never to have a feature than to have it only sometimes.
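The prefix property described above can be checked with a minimal pure-Python BLAKE3 sketch. It follows the structure of the BLAKE3 reference implementation but covers only the default hash mode and inputs of at most one 1024-byte chunk (no parent nodes, no keyed_hash or derive_key), and the function names are hypothetical; for real use, prefer an official implementation. The root output comes from re-running the final compression with an incrementing output counter, which is why a shorter output is always a prefix of a longer one. For contrast, `hashlib.blake2b` treats the digest length as a parameter of the hash itself, so its 32-byte and 64-byte outputs are unrelated.

```python
import hashlib
import struct

# Minimal BLAKE3 sketch (hypothetical helper names): default hash mode only,
# inputs of at most one 1024-byte chunk, so no parent nodes are needed.
IV = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]
MSG_PERMUTATION = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8]
G_INDICES = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),  # columns
             (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]  # diagonals
CHUNK_START, CHUNK_END, ROOT = 1, 2, 8
MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def g(s, a, b, c, d, mx, my):
    # Quarter-round: mixes two message words into four state words.
    s[a] = (s[a] + s[b] + mx) & MASK
    s[d] = rotr(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotr(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b] + my) & MASK
    s[d] = rotr(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotr(s[b] ^ s[c], 7)


def compress(cv, m, counter, block_len, flags):
    s = cv[:] + IV[:4] + [counter & MASK, (counter >> 32) & MASK, block_len, flags]
    for _ in range(7):
        for i, (a, b, c, d) in enumerate(G_INDICES):
            g(s, a, b, c, d, m[2 * i], m[2 * i + 1])
        m = [m[p] for p in MSG_PERMUTATION]  # permute message words between rounds
    # Feed-forward; the first 8 words are the new chaining value.
    return [s[i] ^ s[i + 8] for i in range(8)] + [s[i + 8] ^ cv[i] for i in range(8)]


def blake3_xof(data, out_len):
    if len(data) > 1024:
        raise ValueError("this sketch handles a single 1024-byte chunk only")
    blocks = [data[i:i + 64] for i in range(0, len(data), 64)] or [b""]
    cv = IV[:]
    # Compress every block except the last into the chunk's chaining value.
    for i, block in enumerate(blocks[:-1]):
        words = list(struct.unpack("<16I", block))
        cv = compress(cv, words, 0, 64, CHUNK_START if i == 0 else 0)[:8]
    last = blocks[-1]
    words = list(struct.unpack("<16I", last.ljust(64, b"\0")))
    flags = CHUNK_END | ROOT | (CHUNK_START if len(blocks) == 1 else 0)
    # Root output: repeat the final compression with output counter 0, 1, 2, ...
    # Every call emits 64 bytes, so a shorter output is a prefix of a longer one.
    out = b""
    counter = 0
    while len(out) < out_len:
        out += struct.pack("<16I", *compress(cv, words, counter, len(last), flags))
        counter += 1
    return out[:out_len]


out32 = blake3_xof(b"abc", 32)
out64 = blake3_xof(b"abc", 64)
print(out64[:32] == out32)  # True
print(blake3_xof(b"", 32).hex())
# af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262

# BLAKE2b: the digest length is a parameter of the hash, so outputs are unrelated.
b2_256 = hashlib.blake2b(b"abc", digest_size=32).digest()
b2_512 = hashlib.blake2b(b"abc", digest_size=64).digest()
print(b2_512[:32] == b2_256)  # False
```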

Also, one additional question: what is the highest length you envision the 'real world' using? My initial thought would be a length of 128, which would produce a 256-character hex string.

I think essentially everyone using BLAKE3 as a "regular" hash function, where you work with its output as a single value, is going to want 32 bytes. The XOF is for somewhat more exotic uses. For example, if you wanted to stretch one random 32-byte key into 100 different keys (probably using the BLAKE3 derive_key function, but either way), you'd produce an output that's 3200 bytes long (100 keys × 32 bytes) and then slice it up. The Ed25519 algorithm is kind of like this internally, in that it needs 64 bytes of hash output, which it splits into two different secrets. Or maybe you could use BLAKE3 as a sort of CSPRNG, with the input as a seed and an arbitrary amount of output. But again, I don't think any of these use cases look much like "outputting a single value", and I think all the use cases that do look like that should use 32 bytes and just call it BLAKE3. Let me know if all that makes sense.
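The "produce 3200 bytes and slice it up" idea, and the Ed25519 split, can be sketched in a few lines. BLAKE3 is not in Python's standard library, so this sketch swaps in SHAKE256 (`hashlib.shake_256`), a different XOF, purely to show the slicing; with BLAKE3 itself the thread suggests derive_key for this job. The helper names `stretch_keys` and `ed25519_expand` are hypothetical, and the Ed25519 part follows RFC 8032's SHA-512 key expansion with the scalar clamping step omitted.

```python
import hashlib

KEY_LEN = 32


def stretch_keys(master_key, num_keys=100):
    """Derive num_keys 32-byte keys by slicing one long XOF output."""
    # Stand-in XOF: SHAKE256 from hashlib, NOT BLAKE3. The idea is the same:
    # request num_keys * KEY_LEN bytes (3200 for 100 keys) and slice it up.
    stream = hashlib.shake_256(master_key).digest(num_keys * KEY_LEN)
    return [stream[i * KEY_LEN:(i + 1) * KEY_LEN] for i in range(num_keys)]


def ed25519_expand(seed):
    """Split SHA-512(seed) into two 32-byte secrets, as Ed25519 (RFC 8032) does."""
    # The first half becomes the secret scalar (after bit clamping, omitted
    # here); the second half is the prefix used when deriving signature nonces.
    h = hashlib.sha512(seed).digest()
    return h[:32], h[32:]


master = bytes(32)  # all-zero key, for illustration only; use a random key in practice
keys = stretch_keys(master)
print(len(keys), len(keys[0]))  # 100 32
scalar_bytes, prefix = ed25519_expand(master)
print(len(scalar_bytes), len(prefix))  # 32 32
```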

zookozcash commented 4 years ago

Great answers, Jack. I wholly agree.

On Wed, May 20, 2020 at 7:44 AM oconnor663 notifications@github.com wrote:

All very good questions! I'm actually surprised this hasn't come up before. To the extent possible, I'd recommend just calling it BLAKE3 and having the output be 32 bytes. There are some key differences between BLAKE2 and BLAKE3 that motivate that recommendation:

- In BLAKE2 (setting aside BLAKE2X) the maximum output length of each function was also the length necessary to achieve maximum security. So for example, it could be reasonable to say that BLAKE2b-512 is more secure than BLAKE2b-256. (Whether anyone actually needs that level of security is a different question.) In contrast, BLAKE3 (like BLAKE2X) can have any output length. But above 32 bytes, adding additional bytes of output does not increase security, because security is ultimately capped by the internal state size. There is no security benefit to using 64-byte BLAKE3 over 32-byte BLAKE3.

- In BLAKE2 (setting aside one of the modes of BLAKE2X) outputs of length X and outputs of length X+1 are independent of each other. For example, if you know the BLAKE2b-256 hash of my input, that doesn't tell you anything about the BLAKE2b-512 hash of the same input. This is not the case with BLAKE3. A BLAKE3 output of length X and an output of length X+1 differ only in that very last byte; the first is a prefix of the second. The main reason this is done is that extendable output functions (XOFs) need a mode where you don't know how many bytes you're going to need in advance, and it's simpler to have this be the only mode than to have multiple different XOF modes.

For those reasons, I think labeling functions like BLAKE3-64/BLAKE3-512 could be more harmful than helpful. Users might reasonably expect them to be independent from each other, as all of the BLAKE2 and SHA-family algorithms are. Users might also expect that larger numbers mean more security, when in practice they just mean more space. I think it would be best if there was only BLAKE3 and it was essentially always 32 bytes of output.

Also, one additional question: what is the highest length you envision the 'real world' using? My initial thought would be a length of 128, which would produce a 256-character hex string.

I think essentially everyone using BLAKE3 as a "regular" hash function is going to want 32 bytes. The XOF is for somewhat more exotic uses. For example, if you wanted to stretch one random 32-byte key into 100 different keys (probably using the derive_key mode of BLAKE3, but either way), you'd produce an output that's 3200 bytes long and then slice it up. The Ed25519 algorithm is kind of like this, in that it needs 64 bytes of hash output, which it splits into two different secrets. Or maybe you could use BLAKE3 as a sort of CSPRNG, with the input as a seed and an arbitrary amount of output. But again, I don't think any of these use cases look much like "outputting a single value", and I think all the use cases that do look like that should use 32 bytes and just call it BLAKE3. Let me know if all that makes sense.


stevespringett commented 4 years ago

Thank you very much for the clarification, explanation, and recommendation. Makes sense.

Outcome: CycloneDX v1.2 will simply support BLAKE3 without regard to length, as it doesn't make sense to include the length in the label.