Encode for hash revision

BasileiosKal commented 2 years ago

As members of the WG have noted (thanks @christianpaquin, @tmarkovski!!), the current EncodeForHash section is confusing and prone to error. Especially the part about encoding octet strings.

A possible solution is to remove entirely the part about appending octet strings with their length. The only place this is actually used is for the msg in MapMsgToScalarAsHash, for the header in the domain hash and the presentationHeader in the challenge hash. I believe we can handle those cases in their respected functions. Octet string will then be encoded directly.

Pros: will make EncodeForHash much simpler + no need for the Ciphersuite ID and Public Key exceptions.
Cons: Will add 1-2 additional step to each of the core operations.

IMO the tradeoff is worth it, but keen hear other approaches as well.

christianpaquin commented 2 years ago

I would lean on the other side, to prepend all inputs with their lengths (closer to what the text says). Encoding would therefore 1) explain how to transform various input types to octet strings, and 2) how to hash octet strings (len(x) || x). This would keep things simple while providing encoding security where needed (and no bad side effect where it's not).

christianpaquin commented 2 years ago

In the Sign function, the secret key must also be encoded directly in e_s_for_hash without pre-pending its length (like the public key in dom_for_hash) for the fixtures to pass. This should also be specified in the hash encoding specification.

tplooker commented 2 years ago

Also as raised by @christianpaquin cv_for_hash includes the disclosed index and it isn't stipulated whether the indicies starts from 0 or 1.

andrewwhitehead commented 1 year ago

Another option may be to avoid mixing fixed-length inputs and variable-length inputs by hashing the variable-length inputs beforehand (specifically ciphersuite_id, header, and ph) and only dealing with scalars, points, and integers in encode_for_hash.

Procedure:

1.  let octets_to_hash be an empty octet string.
2.  for el in input_array:
11.     if el is a Point in G1: el_octs = point_to_octets_g1(el)
12.     else if el is a Point in G2: el_octs = point_to_octets_g2(el)
10.     else if el is a Scalar: el_octs = I2OSP(el, octet_scalar_length)
11.     else if el is a non-negative integer: el_octs = I2OSP(el, 8)
12.     else: return INVALID
13.     octets_to_hash = octets_to_hash || el_octs
14. return octets_to_hash

Then in Sign, for example:

Definitions:
  - ciphersuite_hash, scalar. The result of hash_to_scalar(ciphersuite_id, 1) for the specific ciphersuite.

1.  header_hash = hash_to_scalar(header, 1)
2.  dom_array = (PK, L, Q_1, Q_2, H_1, ..., H_L, ciphersuite_hash, header_hash)
3.  dom_for_hash = encode_for_hash(dom_array)
4. ...

(and similarly in Verify, ProofGen, ProofVerify).

BasileiosKal commented 1 year ago

I like the simpler encode_for_hash. Keeping that the same, not sure why not call,

1. header_prime = I2OSP(len(header), 8) || header
2. dom_array = (PK, L, Q_1, Q_2, H_1, ..., H_L, ciphersuite_id, header_prime)

to avoid the extra hashes?? (does ciphersuite_id need to be appended with its length??)

Also IMO, appending everything with their length will not simplify things much more compared to this 2 proposals (will definitely simplify the current text though), other than removing step 1 from above (which i don't think is worth it??)

andrewwhitehead commented 1 year ago

Yes it's probably simpler to just avoid passing any byte arrays through encode_for_hash and prepend the length when necessary. How about (renaming encode_for_hash to serialize):

1. dom_for_hash = serialize((PK, Q_1, Q_2, L, H_1, ..., H_L))
2. if dom_for_hash is INVALID, return INVALID
3. header_prime = I2OSP(len(header), 8) || header
4. domain = hash_to_scalar(dom_for_hash || ciphersuite_id || header_prime, 1)

I also changed the position of L to be adjacent to the L repeated elements, this seems more natural and consistent with ProofGen. We could probably skip the check for INVALID because the types of all inputs are known.

Example from ProofGen:

18. c_array = (A', Abar, D, C1, C2, R, i1, ..., iR,
                       msg_i1, ..., msg_iR, domain)
19. c_for_hash = serialize(c_array)
20. if c_for_hash is INVALID, return INVALID
21. ph_prime = I2OSP(len(presentationHeader), 8) || presentationHeader
22. c = hash_to_scalar(c_for_hash || ph_prime, 1)

Incidentally SignatureToOctets could also be implemented in terms of serialize.

BasileiosKal commented 1 year ago

Yea, this seems a good option to me! The "put all the octet strings at the end" sounds like a reasonable and simple rule (as long as we mention it in the document).

+1 for renaming encode_for_hash to serialize and for updating SignatureToOctets (and ProofToOctets i guess) to use serialize.

We will need to have some checks though for INVALID arguments. For example the length of the header needs to be at most 2^8. We can add steps like that in the precomputations though, or as requirements on the inputs?? (in ProofGen/ProofVerify we will need 2 of those tests, for header and ph).

andrewwhitehead commented 1 year ago

2^8 is definitely not long enough for the header, 2^64 I think?

BasileiosKal commented 1 year ago

Yea, sorry my bad, 2^64 is correct!

decentralized-identity / bbs-signature

Encode for hash revision #225