Problem in ProofVerify - Githubissues

rolfhaenni commented 1 year ago

In the current version of ProofVerify, the following problem may arise:

If the given parameter L and the number of disclosed messages R=length(disclosed_indexes) are used for computing U=L-R, then this U may be different from the length the vector (m^_j1,...,m^_jU) included in the proof. Without taking further actions, computing the sum-of-products H_j1 m^_j1 + ... + H_jU m^_jU in Line 12 may then fail in an actual implementation, for example with an IndexOutOfBoundException.

What is missing is an additional precondition R + length(m^_j1,...,m^_jU) = L.

Even a better solution is removing parameter L from the algorithm's parameter list and computing R=length(disclosed_indexes), U=length(m^_j1,...,m^_jU), and L=R+U. This avoids the additional precondition and simplifies the interface.

tmarkovski commented 1 year ago

Good point, the L argument seems redundant, since it can be calculated from the submitted proof and messages input. However, this slightly complicates the logic for extracting the commitments from the proof, since their total number is unknown. Implementers would have to ensure that the total number of commitments in m^_jX is the left-over octet length / scalar length. Not a huge deal, but it may complicate the operation definition a little.

This means the integrity of the API method definition is dependent on the encoded data provided inside one of it's arguments. I do tend to favor api definition simplicity, but I'm unsure about this specific case. Adding the additional precondition seems the most straightforward fix.

rolfhaenni commented 1 year ago

I think the tricky part is getting the order of the statements into the right order. What could possibly help is separating deserialisation of the inputs from the main procedure. With this in mind, my proposal without having L as a (redundant) parameter is shown below. I think that would be pretty clean, except for the fact that:

if Precondition 1 fails, then the length of (msg_i1, ..., msg_iR) in Line 7 would be different from R
if Precondition 2 fails, then the length of (j1, ..., jU) in Line 6 would be different from U

For consistency reasons, separating deserialisation must then be implemented everywhere, i.e. in Verify and ProofGen.

What do you think?

3.4.4. ProofVerify
******************

result = ProofVerify(PK, proof, header, ph, disclosed_messages, disclosed_indexes)

Inputs:
- PK (REQUIRED), an octet string of the form outputted by the SkToPk operation.
- proof (REQUIRED), an octet string of the form outputted by the ProofGen operation.
- header (OPTIONAL), an optional octet string containing context and application specific information. If not supplied, it defaults to an empty string.
- ph (OPTIONAL), octet string containing the presentation header. If not supplied, it defaults to an empty string.
- disclosed_messages (OPTIONAL), a vector of scalars. If not supplied, it defaults to the empty array "()".
- disclosed_indexes (OPTIONAL), vector of unsigned integers in ascending order. Indexes of disclosed messages. If not supplied, it defaults to the empty array "()".

Parameters:
- ciphersuite_id, ASCII string. The unique ID of the ciphersuite. 
- P1, fixed point of G1, defined by the ciphersuite.

Definitions:
- R, is the non-negative integer representing the number of disclosed (revealed) messages
- U, is the non-negative integer representing the number of undisclosed (unrevealed) messages
- L, is the non-negative integer representing the number of signed messages, i.e. L = R+U

Outputs:
- result, either VALID or INVALID. 

Deserialization:
1. proof_result = octets_to_proof(proof)
2. if proof_result is INVALID, return INVALID
3. W = octets_to_pubkey(PK)
4. if W is INVALID, return INVALID

Precomputations:
1. (A', Abar, D, c, e^, r2^, r3^, s^, commitments) = proof_result 
2. R = length(disclosed_indexes)
3. U = length(commitments)
4. L = R + U
5. (i1, ..., iR) = disclosed_indexes
6. (j1, ..., jU) = range(1, L) \ disclosed_indexes
7. (msg_i1, ..., msg_iR) = disclosed_messages
8. (m^_j1, ...,m^_jU) = commitments
9. (Q_1, Q_2, MsgGenerators) = create_generators(L+2)
10. (H_1, ..., H_L) = MsgGenerators
11. (H_i1, ..., H_iR) = (MsgGenerators[i1], ..., MsgGenerators[iR]) 
12. (H_j1, ..., H_jU) = (MsgGenerators[j1], ..., MsgGenerators[jU])

Preconditions:
1. if length(disclosed_messages) != R, return INVALID
2. for i in (i1, ..., iR), if i < 1 or i > L, return INVALID 

Procedure:
1. dom_array = (PK, L, Q_1, Q_2, H_1, ..., H_L, ciphersuite_id, header) 
2. dom_for_hash = encode_for_hash(dom_array)
3. if dom_for_hash is INVALID, return INVALID
4. domain = hash_to_scalar(dom_for_hash, 1)
5. C1 = (Abar - D) * c + A' * e^ + Q_1 * r2^
6. T = P1 + Q_2 * domain + H_i1 * msg_i1 + ... + H_iR * msg_iR
7. C2 = T * c - D * r3^ + Q_1 * s^ + H_j1 * m^_j1 + ... + H_jU * m^_jU 
8. cv_array = (A', Abar, D, C1, C2, R, i1, ..., iR, msg_i1, ..., msg_iR, domain, ph) 
9. cv_for_hash = encode_for_hash(cv_array)
10. if cv_for_hash is INVALID, return INVALID 16. cv = hash_to_scalar(cv_for_hash, 1)
11. if c != cv, return INVALID
12. if A' == Identity_G1, return INVALID
13. if e(A', W) * e(Abar, -P2) != Identity_GT, return INVALID 
14. return VALID

tplooker commented 1 year ago

Discussed on WG call 21st of November, the proposal seems like an improvement to the specification. I'm going to mark this issue as ready for PR.

BasileiosKal commented 1 year ago

Taking the time to look into this proposal a bit more closely, i do like it. I'm not sure how the current approach has an issue problem. If an error is returned (like an IndexOutOfBoundException) it's not a bad thing necessarily, since in this case the proof is malformed.

That said i do like removing L from the input of ProofVerify as well as separating the deserialization in their own steps.

Just a minor editorial detail, it seems to me that the Precomputations steps loose their purpose a bit (operations to be cached). Maybe we can move more staff to Deserialization. Maybe something like:

result = ProofVerify(PK, proof, header, ph, disclosed_messages, disclosed_indexes)

Outputs:
- result, either VALID or INVALID. 

Deserialization:
1.  proof_result = octets_to_proof(proof)
2.  (A', Abar, D, c, e^, r2^, r3^, s^, commitments) = proof_result 
3.  if proof_result is INVALID, return INVALID
4.  W = octets_to_pubkey(PK)
5.  if W is INVALID, return INVALID
6.  R = length(disclosed_indexes)
7.  U = length(commitments)
8.  L = R + U
9.  (i1, ..., iR) = disclosed_indexes
10. (j1, ..., jU) = range(1, L) \ disclosed_indexes
11. (m^_j1, ...,m^_jU) = commitments

Precomputations:
1. (Q_1, Q_2, MsgGenerators) = create_generators(L+2)
2. (H_1, ..., H_L) = MsgGenerators
3. (H_i1, ..., H_iR) = (MsgGenerators[i1], ..., MsgGenerators[iR]) 
4. (H_j1, ..., H_jU) = (MsgGenerators[j1], ..., MsgGenerators[jU])

Preconditions:
1. if length(disclosed_messages) != R, return INVALID
2. for i in (i1, ..., iR), if i < 1 or i > L, return INVALID 

Procedure:
1.  dom_array = (PK, L, Q_1, Q_2, H_1, ..., H_L, ciphersuite_id, header) 
2.  dom_for_hash = encode_for_hash(dom_array)
3.  if dom_for_hash is INVALID, return INVALID
4.  domain = hash_to_scalar(dom_for_hash, 1)
5.  C1 = (Abar - D) * c + A' * e^ + Q_1 * r2^
6.  T = P1 + Q_2 * domain + H_i1 * msg_i1 + ... + H_iR * msg_iR
7.  C2 = T * c - D * r3^ + Q_1 * s^ + H_j1 * m^_j1 + ... + H_jU * m^_jU 
8.  cv_array = (A', Abar, D, C1, C2, R, i1, ..., iR, msg_i1, ..., msg_iR, domain, ph) 
9.  cv_for_hash = encode_for_hash(cv_array)
10. if cv_for_hash is INVALID, return INVALID 16. cv = hash_to_scalar(cv_for_hash, 1)
11. if c != cv, return INVALID
12. if A' == Identity_G1, return INVALID
20. if e(A', W) * e(Abar, -P2) != Identity_GT, return INVALID 
21. return VALID

That way is easier to describe Precomputations as operations that can be cached??

When it comes to security, since L is defined by the Prover, i don't see removing it compromising anything. In other words, if there is an attack that works by manipulating U and R, then an adversary can manipulate L in the current design and achieve the same thing. Just to be safe though, we may want to add U in the challenge hash calculation, so the proof value will be committed to it.

decentralized-identity / bbs-signature

Problem in ProofVerify #227