brave / nitriding-daemon

Tool kit for building secure, scalable, and networked services on top of AWS Nitro Enclaves.
Mozilla Public License 2.0

user_data hard to parse #23

Closed rillian closed 9 months ago

rillian commented 1 year ago

I didn't catch this in review, but the way we're storing hashes in the user_data section of the attestation document is tricky to parse. Although we prepend a hash key (so we can later change the digest algorithm) and use a semicolon as a field separator, the hash itself is binary. That means one can't split on the (semi)colon characters to separate each hash and its key. Instead one must prefix-match on the keys and then use known hash lengths to split the fields.

Parsing would be more convenient if we hex-encoded the hash digest values so the punctuation is unambiguous. This would reduce the number of (256-bit) hashes we could include from 25 to 14, both of which seem like plenty.

Less conveniently, we'd need to support both schemes in brave-core across a transition period.

rillian commented 1 year ago

What do you think, @DJAndries ?

DJAndries commented 1 year ago

Reasonable concern. Using a string to represent the user data field fails when there are zeroed bytes within one of the hashes, which will terminate the string prematurely. What you stated is correct; when we parse the field, we currently extract sub-arrays from the data using known prefix & hash lengths, rather than using string splitting/substrings, etc.

I suggest that we forgo using a string entirely, and just use a 64-byte bytearray to encapsulate both hashes. If we can assume that sha256 will be the digest for all fingerprints, then it seems safe enough. In addition, if we wanted to add additional hashes, we can just extend the length of the bytearray, and client applications can simply check if the bytearray length is big enough to extract the FP that they need.

rillian commented 1 year ago

If you prefer binary, I suggest the multihash encoding which still allows different algorithms and is self-documenting for length.

DJAndries commented 1 year ago

> If you prefer binary, I suggest the multihash encoding

yeah, sgtm. good option if we want to start using other digests in the future