Make placeholder text optional

KDean-GS1 commented 2 years ago

The use of placeholder text seems redundant. There's no operational difference between calculating the hash of the document with or without the placeholder text (except that the hash will be different). It would be simpler for the generator and validator to work with an empty string, i.e., calculate the hash from:

{
    "said": "",
    "first": "Sue",
    "last": "Smith",
    "role": "Founder"
}

instead of from:

{
    "said": "############################################",
    "first": "Sue",
    "last": "Smith",
    "role": "Founder"
}

It would be easier for both the generator and the validator to work with an empty string.

SmithSamuelM commented 2 years ago

It does make a difference in length, which may mess up the version string, since the version string includes the length. The version string, when provided, allows parsing of multiple serialization types. So although it doesn't matter from a pure digest perspective, it does matter from a length perspective. Length malleability is also a potential attack, so as a conservative, safer approach we keep the length unchanged and hence use the dummy characters.

Not keeping the length constant would impose an order-of-processing constraint on the version field when it appears in a block that has a SAID. This leaks the SAID dependencies to other fields. So as a conservative measure, we initialize the SAID field with dummy characters of the length of the type of digest. That way it's safe no matter in what order the SAID field is set to its value, and digests can always be computed on the set length given in the version string.

To clarify, the SAID digest must be computed after the version string is fixed, because the version string field is part of the map that the digest is computed upon. This means that the length must be known. The easy way to compute the length is to serialize the map, compute the size of the serialization, and then inject that into the fixed-length version string. But one can't compute the length of the serialization without knowing the length of the SAID. So the easy way to do that is to put dummy characters into the SAID string and pass that off to the version string computation, which puts in a default version string with a dummy value for the length, serializes, and then replaces the dummies with the actual length. It then passes the result back for the SAID computation to compute the SAID with a correct version field. Not using the dummies means that the version field calculation is no longer separable from the SAID calculation, which cascades the dependency.
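The ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `DEMO10JSON......_` version string format, the field labels `v` and `said`, the use of SHA3-256 from the Python standard library, and the 44-character digest encoding are all stand-ins chosen for the example, not the encoding any particular SAID implementation uses.

```python
import hashlib
import json
from base64 import urlsafe_b64encode

SAID_LEN = 44            # assumed length of the encoded digest
DUMMY = "#" * SAID_LEN   # placeholder with the final width of the SAID


def serialize(sad: dict) -> bytes:
    # Compact JSON; dict insertion order fixes the field order.
    return json.dumps(sad, separators=(",", ":")).encode("utf-8")


def version_string(size: int) -> str:
    # Hypothetical fixed-width format: the size always takes 6 hex digits.
    return f"DEMO10JSON{size:06x}_"


def encode_digest(raw: bytes) -> str:
    # Illustrative encoding only: 1 lead byte + 32 digest bytes = 33 bytes,
    # which is exactly 44 Base64 characters; the first one becomes a type marker.
    return "H" + urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


def saidify(fields: dict) -> dict:
    # Both placeholders already have their final width.
    sad = {"v": version_string(0), "said": DUMMY, **fields}
    # Step 1: the size is computable without knowing the digest.
    sad["v"] = version_string(len(serialize(sad)))
    # Step 2: digest over the serialization with the final version string
    # and the dummy SAID, then drop the SAID in without changing the size.
    sad["said"] = encode_digest(hashlib.sha3_256(serialize(sad)).digest())
    return sad
```

Because the dummy string and the digest have the same width, the size recorded in the version string in step 1 is still correct after step 2, so the two steps do not depend on each other's output length.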

It gets even more complicated for binary serializations such as CBOR and MGPK (MessagePack). Both of these use different framing codes depending on the length of the string serialized. So simply fudging the length by adding the expected length of the SAID will not guarantee an identical serialization after replacing the empty string with the SAID itself. But using a string of dummy characters of the same length will force the binary serialization to use the same framing code for both the dummy version and the actual version.

Although on the surface it may seem simpler to use an empty string than a dummy string, in actuality (based on experience) it would be much more complex.
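To make the framing point concrete, here is a small sketch that hand-computes the string headers for CBOR and MessagePack rather than relying on a third-party library. The helper names are made up for this example, and the 44-character SAID length is an assumption.

```python
def cbor_text_header(n: int) -> bytes:
    # CBOR major type 3 (text string): the header grows with the byte length n.
    if n < 24:
        return bytes([0x60 | n])
    if n < 2**8:
        return bytes([0x78, n])
    if n < 2**16:
        return b"\x79" + n.to_bytes(2, "big")
    if n < 2**32:
        return b"\x7a" + n.to_bytes(4, "big")
    return b"\x7b" + n.to_bytes(8, "big")


def msgpack_str_header(n: int) -> bytes:
    # MessagePack str family: fixstr, str 8, str 16, str 32.
    if n < 32:
        return bytes([0xA0 | n])
    if n < 2**8:
        return bytes([0xD9, n])
    if n < 2**16:
        return b"\xda" + n.to_bytes(2, "big")
    return b"\xdb" + n.to_bytes(4, "big")


SAID_LEN = 44  # assumed length of the encoded digest

for header in (cbor_text_header, msgpack_str_header):
    empty_size = len(header(0)) + 0
    full_size = len(header(SAID_LEN)) + SAID_LEN
    # The field grows by 45 bytes, not 44, because the header itself grows.
    print(header.__name__, empty_size, full_size, full_size - empty_size)
```

In both formats an empty string takes a one-byte header while a 44-byte string takes a two-byte header, so "empty string plus 44" underestimates the final size by one byte. A 44-character dummy string gets the same two-byte header as the real SAID.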

To be clear, we don't have a specific exploit for Blake3 or SHA3 at present; it's just being sensitive to the potential. Frankly, the code to add the dummy characters is pretty simple, so an empty string is not enough easier to be worth the risk of a future length malleability attack, or to have to worry about the order in which the version string is populated vis-à-vis the digest field. This is a common problem when performing crypto on a given serialization, especially in MTU-constrained protocols like UDP. One needs to calculate lengths to know how to fragment packets. So being able to prepopulate the digest field with dummy characters ensures the length is stable and any fragment header lengths can be computed. The order of operation of inserting the crypto digest is then free. It's just one of those things that trips one up enough to not be worth the risk of the minor optimization of having the digest computed on a different length than the final length.

In KERI messages we have the special case where two fields (both the i field and the d field) may be calculated as the same digest SAID of the inception event. So there is more opportunity for a length-malleability type of attack. Prepopulating with dummy characters guarantees the length is stable.

SmithSamuelM commented 2 years ago

See the length extension attack on SHA-1, for example.

https://en.wikipedia.org/wiki/Length_extension_attack

SmithSamuelM commented 2 years ago

When used as the source material for cryptographic primitives, a serialized map is more like a fixed binary structure than it is like a map. The order is fixed and the sizes are fixed. This allows us to treat it, for parsing, pipelining, and crypto operations, more like a fixed binary structure than a flexible map. Having a value whose size is not fixed throughout the processing chain is an aberration that violates the fixed-size assumption. It's the sort of detail that can be overlooked and then cause problems later. Likewise, the SAID protocol is meant to be more generic than the applications we have in front of us. A fixed-size assumption is a more conservative assumption.

To elaborate: It is assumed that the serialization size used in computing the SAID is exactly the same size as the serialization size with the SAID inserted. Any operations on the serialization are safe to assume a constant serialization size. The SAID digest operation is one of those operations. Computing a version string that includes the serialization size is another.
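On the validator side, the same fixed-size assumption might look like the sketch below. As before, this is a hedged illustration: the `said` label, SHA3-256, and the 44-character encoding are assumptions for the example and not the actual SAID codec.

```python
import hashlib
import json
from base64 import urlsafe_b64encode


def digest_of(sad: dict, label: str = "said") -> str:
    # Swap the SAID for dummy characters of the same length, so the
    # serialization size is identical to the one being validated.
    dummied = {**sad, label: "#" * len(sad[label])}
    ser = json.dumps(dummied, separators=(",", ":")).encode("utf-8")
    raw = hashlib.sha3_256(ser).digest()
    # Illustrative 44-character encoding, not a real SAID code table.
    return "H" + urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


def verify(sad: dict, label: str = "said") -> bool:
    # Valid only if the embedded value has the expected width and matches
    # the digest recomputed over the dummied serialization.
    return len(sad[label]) == 44 and digest_of(sad, label) == sad[label]


doc = {"said": "#" * 44, "first": "Sue", "last": "Smith", "role": "Founder"}
doc["said"] = digest_of(doc)
print(verify(doc))
```

The validator never needs to know the order in which the generator filled in its fields; it only relies on the dummied and final serializations having the same size.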

KDean-GS1 commented 2 years ago

Discussed and agreed.

WebOfTrust / ietf-said

Make placeholder text optional #23