decentralized-identity / bbs-signature

The BBS Signature Scheme
https://identity.foundation/bbs-signature/draft-irtf-cfrg-bbs-signatures.html
Apache License 2.0

Support Bounded-Memory Implementations with a single Message Pass #313

Closed bellebaum closed 6 days ago

bellebaum commented 4 months ago

Hi all,

I am trying to implement the IETF draft in C using no dynamically allocated memory, making it suitable for e.g. embedded devices without a heap implementation.

Only a few places in the specification are troublesome for such an endeavour. I wanted to share these with you and suggest improvements, should you wish to explicitly support this use case.

For efficiency, there are two critical points to consider:

Ideally, it should be possible to go over every message only once per high-level API call (Sign, Verify, ProofGen, ProofVerify). Among other things, this requires being able to generate message scalars individually, given a message's index and optionally the total number of messages (which the current draft supports).

Likewise, it would be helpful if one could avoid generating the generators more than once during such API calls.

Let us briefly go over every high-level API function dealing with messages to see how this would work. The algorithm sketches below rely on the ability to incrementally hash arbitrary octet strings, but this is typical of many hashing APIs.

Sign

Sign can be implemented with a single message pass. While iterating over the messages, the algorithm would use B as an accumulator, and incrementally prepare the input to hash into e (as part of hash_to_scalar).

Sign does not allow generating the generators only once when doing this: before adding any messages to be hashed into e, the domain must be added, for which all generators must already have been generated. Then, while accumulating B, the generators need to be regenerated, as there is no way to store an arbitrary number of generators in bounded memory.

A simple fix would be to reorder the inputs to the generation of e:

2. e = hash_to_scalar(serialize((SK, msg_1, ..., msg_L, domain)), signature_dst)

This should not have an effect on security, as e is essentially only supposed to be "random" but unique for each signed message vector. But please double-check :)
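To make the flow concrete, here is a rough single-pass sketch in Python with the reordered e input. It is a toy model only: integers mod a prime stand in for the group, SHA-256 stands in for hash_to_scalar, and all helper names (toy_scalar, toy_generator, the dst constants) are made up rather than taken from the draft.

```python
import hashlib

P = 2**127 - 1  # toy "group order"; a real implementation works in G1 of BLS12-381

def toy_scalar(data: bytes, dst: bytes) -> int:
    # Hypothetical stand-in for hash_to_scalar.
    return int.from_bytes(hashlib.sha256(dst + data).digest(), "big") % P

def toy_generator(i: int) -> int:
    # Hypothetical stand-in for create_generators; yields one generator at a
    # time, so none of them ever needs to be stored.
    return toy_scalar(i.to_bytes(8, "big"), b"GEN")

def sign_single_pass(sk: int, messages: list[bytes]) -> tuple[int, int]:
    """One pass over `messages`: B, the domain hash and the e hash advance together."""
    P1, Q1 = toy_generator(0), toy_generator(1)
    e_hash = hashlib.sha256(b"SIG_DST" + sk.to_bytes(16, "big"))
    domain_hash = hashlib.sha256(b"DOM_DST" + Q1.to_bytes(16, "big"))
    B = P1
    for i, msg in enumerate(messages):
        H_i = toy_generator(i + 2)                 # generated exactly once
        domain_hash.update(H_i.to_bytes(16, "big"))
        m_i = toy_scalar(msg, b"MSG")
        e_hash.update(m_i.to_bytes(16, "big"))     # message scalars first...
        B = (B + H_i * m_i) % P
    domain = int.from_bytes(domain_hash.digest(), "big") % P
    e_hash.update(domain.to_bytes(16, "big"))      # ...domain last, per the reordering
    e = int.from_bytes(e_hash.digest(), "big") % P
    B = (B + Q1 * domain) % P                      # Q_1 * domain folded in at the end
    return B, e                                    # A = B * (1 / (sk + e)) elided

print(sign_single_pass(42, [b"msg1", b"msg2", b"msg3"]))
```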

Verify

Verify can be implemented efficiently due to commutativity of the group operation. Essentially, one iterates over the messages, generates the generators on the fly, accumulates `generator * message_scalar` into B while adding the generators to the domain hash, and finally adds `Q_1 * domain` to B.
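In the same toy-Python style as the Sign sketch above (integers mod a prime as the group, made-up helper names, and the final pairing check elided), the accumulation might look like this:

```python
import hashlib

P = 2**127 - 1  # toy "group order", as in the Sign sketch above

def toy_scalar(data: bytes, dst: bytes) -> int:
    return int.from_bytes(hashlib.sha256(dst + data).digest(), "big") % P

def toy_generator(i: int) -> int:
    return toy_scalar(i.to_bytes(8, "big"), b"GEN")

def accumulate_B(messages: list[bytes]) -> int:
    """Single pass: generators feed both B and the domain hash as they are created."""
    P1, Q1 = toy_generator(0), toy_generator(1)
    domain_hash = hashlib.sha256(b"DOM_DST" + Q1.to_bytes(16, "big"))
    B = P1
    for i, msg in enumerate(messages):
        H_i = toy_generator(i + 2)                 # regenerated on the fly, never stored
        domain_hash.update(H_i.to_bytes(16, "big"))
        B = (B + H_i * toy_scalar(msg, b"MSG")) % P
    domain = int.from_bytes(domain_hash.digest(), "big") % P
    return (B + Q1 * domain) % P  # commutativity lets Q_1 * domain come last

print(accumulate_B([b"msg1", b"msg2", b"msg3"]))
```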

ProofGen

Beyond the ideas from above, this function runs into two issues, one of which is easily fixed.

  1. The m_tilde values need to be generated both before the challenge phase (accumulated into T2 and hashed into the challenge) and after it (to generate the m_hats, which are part of the proof). Fortunately, the values are essentially pseudorandom and can be deterministically regenerated from a PRG at little to no cost. An algorithm would store the undisclosed message scalars temporarily in the proof, and after the challenge phase multiply each by the challenge and add the corresponding m_tilde (see the sketch after this list).

  2. The challenge calculation first hashes Bbar and friends, then the messages. Because the messages are required for the calculation of B, an implementation has to iterate over them multiple times. The fix involves reordering the inputs to the random oracle, which again should not have an impact on security. The ideal ordering would look like this:

1. c_arr = (R, i1, msg_i1, ..., iR, msg_iR, Abar, Bbar, D, T1, T2, domain)
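To illustrate point 1, here is a rough Python sketch of the regeneration trick, with SHA-256 as a hypothetical PRF (the seed handling and all names are made up, and the finalization follows the m_hat = m_tilde + msg * c shape described above):

```python
import hashlib

R = 2**127 - 1  # toy scalar field order

def m_tilde(seed: bytes, j: int) -> int:
    # Deterministic PRG: the j-th blinding scalar can be recomputed on demand
    # instead of being stored, keeping memory bounded.
    digest = hashlib.sha256(b"MTILDE_DST" + seed + j.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % R

def finalize_m_hats(seed: bytes, undisclosed_scalars: list[int], c: int) -> list[int]:
    # The first pass stored only the undisclosed message scalars (in the proof
    # buffer); after the challenge phase each m_tilde is regenerated and folded in.
    return [(m_tilde(seed, j) + m * c) % R
            for j, m in enumerate(undisclosed_scalars)]

# m_tilde(seed, j) is also what gets accumulated into T2 before the challenge.
print(finalize_m_hats(b"per-proof seed", [11, 22, 33], c=7))
```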

ProofVerify

Besides the problems already mentioned, this function seems fine. In particular, the modified challenge calculation should work here as well.

TL;DR

By reordering a few inputs to hash functions in two places, low-resource implementations can potentially become a lot more efficient. Thanks for reading :)

bellebaum commented 4 months ago

Adding some pointers into our implementation, as references for further reading:

Sign

We would like to hash the generators into the domain here. But since at that point we also need to hash the message scalars into e here, we have to create the generators twice, here and here.

Verify

Within one loop, we calculate the message scalars and generators exactly once here.

ProofGen

For any random scalars, we use hash_to_scalar as a PRF and derive scalars for undisclosed messages here and here. We still need to iterate over the disclosed messages twice for the challenge, here and here.

andrewwhitehead commented 4 months ago

Hello, sorry about the lack of feedback here, but we did discuss this at the weekly meeting a few weeks ago. Because we have requested a cryptographic review of the draft, we are waiting to see what issues might come up there before making any algorithm changes, but we are definitely supportive of such an update.

bellebaum commented 4 months ago

Hello Andrew, that is great to hear. I have also proposed an alternative on the CFRG mailing list, which would be even more efficient, because it need not hash the messages at all, for either e or the challenge. That optimization, however, definitely requires cryptographic review, so one might want to consider pitching the option to the people reviewing the current draft.

andrewwhitehead commented 3 months ago

The revealed messages were actually added to the presentation challenge hash to avoid this issue: https://github.com/decentralized-identity/bbs-signature/issues/74

The draft has changed since then, but I believe the message & index hashing is still necessary in this case. Whether the message hashing is necessary in the generation of e would be a separate issue.

bellebaum commented 3 months ago

Let me unravel the value of T2 in the current draft. Inlining a bunch of calculations in ProofVerifyInit, I get that

T2 = P1 * s0 + Q_1 * s1 + H_1 * s2 + H_2 * s3 + ... + H_L * s{L+1}

You might go on to detail that s0=c, s1=domain*c, and the other scalars are either prover-chosen "random" values or c*msg_i values, but importantly for us: They are scalars.

The attacks in #74 (and many other attack vectors) crucially rely on not changing the challenge c, which (up to collision resistance of the challenge hash function) implies not changing the value T2. But this (again, up to some collision resistance) implies not changing the values for the si scalars. To see this at a glance (and to point at an actual proof), note that collisions would also break the signature scheme itself. Namely, since the scheme only signs B, a collision in the calculation of B (which is analogous to the calculation of T2) would directly lead to a forgery. This is assuming that the signer and verifier do not hash the messages to scalars, but that was also the point of this comment.

So if the scalars are fixed, where else could a malicious prover cheat without altering the challenge (which would cause them trouble in the proof finalization stage)? Since c is fixed, there is exactly one value for a message scalar which would be accepted, namely si * c^-1. So to convince a verifier to accept a challenge, the prover has exactly two options per message scalar si (i > 1), besides breaking collision resistance: publish a message hashing to si * c^-1, or claim that si is a random offset.

Two options to address this:

Option 1: Include the choice of which indexes are disclosed in the challenge. Note that we do not need to include the messages, just their indices.

Option 2: Rely on the collision resistance of the hash_to_scalar function.

The main argument for option 1 would be that there seem to be some people who do not want to rely on hashing and want to encode numbers as scalars a different way. Too broad a scope for a cryptographic algorithm usually leads to misuse, but I see the argument for more easily allowing range proofs and the like. With that in mind, let us look at option 2:

A malicious prover who has generated Nc challenges and expects one of Nv values for a message scalar they would like to maliciously disclose has a chance of roughly Nc * Nv / 2^256 of finding a useful collision allowing them to pull this off. Take all 64-bit integers as disclosable values and assume the attacker is dedicated enough to generate 2^80 challenges. Their chance of success is about 2^-112, which is reasonably small.
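The exponent arithmetic is easy to sanity-check, e.g. in Python:

```python
from math import log2

Nc = 2**80  # challenges a dedicated attacker generates
Nv = 2**64  # disclosable values (all 64-bit integers)
print(log2(Nc * Nv) - 256)  # -112.0, i.e. success probability ~ 2^-112
```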

Based on this, I would recommend going with option 2 if there are no documented use cases beyond "small" numbers and hashed messages. If there are, I would recommend option 1. Under no circumstances should it be necessary to hash the messages themselves into the challenge.

BasileiosKal commented 3 months ago

Hello and thank you for raising the issue!

Regarding the attack from #74, the point was to show that the unforgeability property does not hold. Indeed, if messages are mapped "only" through hash_to_scalar, the attack is not very useful. However, this is not the only option, and the draft allows for alternatives.

You already noted the use of range proofs. Consider also the case of pseudonyms. The prover will be able to use a different pseudonym by forging different prover identifiers with each proof generation. As another example, consider a credential-id based revocation (where credential_id := random_scalar).

For option 1, I don't see how hashing the indexes avoids the described attack. We assume that the prover is adversarial, meaning that they can hash whatever indexes they want during proof generation.

For option 2, in general, AFAIK, security proofs require the entire transcript of the zk-proof to be included in the challenge hash. Even if we accept that all messages should be “hashed to scalars”, avoiding the above issue is not equivalent to proving that the scheme is secure. If you are aware of such a proof I would be more than happy to review it. Otherwise, even disregarding the mentioned use cases, the update seems a bit risky to me.

BasileiosKal commented 3 months ago

For the rest of the updates I'm just going to add a +1 to Andrew's comment ;P

In the calculation of e, I think we can replace the messages with B. The reason we did not implement it up until now was to avoid breaking changes. However, given that we intend to introduce changes in other places either way, we can consider updating the calculation of e.

I will try to implement them and open a PR soon; I would appreciate your review on that. However, as Andrew mentioned, given that we are going under crypto-panel review at this point, it is not clear when those changes will be incorporated into the IETF document.

bellebaum commented 3 months ago

Hello Vasilis :)

You already noted the use of range proofs. Consider also the case of pseudonyms[^1]. The prover will be able to use a different pseudonym by forging different prover identifiers with each proof generation. As another example, consider a credential-id based revocation (where credential_id := random_scalar).

[^1]: This link for some reason did not work for me, but I found the source here.

Right, these are very good points. But, working somewhat closely with your draft, let me highlight a possibly simpler solution for such cases: In your draft, you have modified the challenge calculation (in this section). Abstracting a bit (and ignoring variable ordering), what you have effectively done there is to build an AND-Proof with a Chaum-Pedersen proof of knowledge of pid_scalar applied to the (in this case final) generator, a per-verifier base point and your per-verifier pseudonym[^2]. In fact, since the point of your draft is to not disclose the pid_scalar (which would lead to cross-verifier linkability), your draft does not rely on hashing disclosed message scalars at all.

[^2]: By the way, I love how you reuse the random message scalars here to not increase the size of the proof beyond the addition of one message to the signature scheme. Very clever :)

What you have effectively done is to add the claimed pseudonym, its corresponding proof commitment (U in your ProofGen, Uv in your ProofVerify) and the verifier specific "base point" OP to a kind of "extended presentation header", which gets hashed into the challenge.

Note that this works in general for AND-Proof constructions: Add all commitments of your additional proof to the challenge. If you want to reuse the existing interfaces, treat them as part of an extended presentation header. Beyond the 2^64 numbers example I have given above, this should also extend to range proofs for arbitrary value ranges. This appeals to me because it streamlines security analysis of extensions to this draft.

For option 2, in general, AFAIK, security proofs require the entire transcript of the zk-proof to be included in the challenge hash. Even if we accept that all messages should be “hashed to scalars”, avoiding the above issue is not equivalent to proving that the scheme is secure. If you are aware of such a proof I would be more than happy to review it. Otherwise, even disregarding the mentioned use cases, the update seems a bit risky to me.

All formulations of the zk-proof system I have seen treat the disclosed messages as part of the statement, just like the generators, signer public key etc. and prove knowledge of a valid signature including these messages. The system itself then consists of three rounds: A commitment, a challenge and a response. The Fiat Shamir transform calculates the challenge as a hash of the commitment (and an optional presentation header, if your goal is to create a challenge). These constructions are well researched, and there exist proofs for them. However, this is not what we try to achieve here. The disclosed messages are not part of the commitment (which is the transcript of the zk-Proof part) and thus not hashed in all the formulations I have seen.

That said, while there are no additional security properties for the zk proof, there might be some additional benefits in the envisioned use cases, where an adversary might succeed in their goals by constructing a malicious proof for one of potentially many statements (because of many messages the attacker would be comfortable maliciously disclosing). Generally, without hashing the statement, there is a standard reduction claiming that an adversary's advantage will increase by at most a factor of N when trying to prove one of N false statements. This is why in my example above an adversary can maliciously disclose one of 2^64 messages with probability 2^-(256-80-64) and not 2^-(256-80). I argue that this is still not too much easier than breaking the signature scheme itself.

But let us assume that all assumptions fail and for some reason there is a kind of value for which the adversary would be happy to disclose almost any value. I claim that this only applies to very few kinds of messages. For these, there is a very simple solution: Hash exactly these disclosed messages, but none other.

bellebaum commented 3 months ago

Actually, here is a quick proof-like argument for why hashed scalar values do not need to be added to the challenge hash:

Pick any bound b on the number of invocations of hash_to_scalar (which for this exposition I will model as a random oracle) the adversary is willing to perform. (That is, the total number of challenge calculations Nc plus message scalar calculations Ns satisfies Nc + Ns = b.)

As a baseline, the adversary could use all their hash invocations to come up with a collision in the message scalar calculation, which would give them a simple way to forge a signature. By a simple birthday bound, their success probability is roughly b^2 / 2^256. If the best attack on ProofVerify were not much better than this, then this would be the more pressing issue.

Now assume for a moment that the adversary is unable to find two different vectors of si values (see above) resulting in the same value of T2. From various proofs on the security of the signature scheme it should be clear that this has negligible probability of occurring and would in fact break a no-hash version of the BBS signature scheme. Then for each invocation of the challenge hash_to_scalar, the adversary gets at most one value of si * c^-1 which would be accepted by the verifier as a valid message scalar, amounting to at most Nc such (random) values. But since all the messages get hashed, the adversary also holds at most Ns values of message scalars for which it knows a message. The chance of a collision is thus at most Ns * Nc / 2^256, which is worse for the adversary than the above attack because Ns * Nc < (Ns + Nc)^2 (for a collision we need Nc > 0 and Ns > 0).
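As a quick numeric check of that last inequality, with the budget split evenly (which maximizes Ns * Nc):

```python
from math import log2

b = 2**80                   # total random-oracle budget, b = Nc + Ns
Nc = Ns = b // 2            # the product Nc * Ns is largest for an even split
print(log2(b**2) - 256)     # -96.0: baseline birthday forgery, ~2^-96
print(log2(Nc * Ns) - 256)  # -98.0: the disclosure attack is strictly worse
```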

BasileiosKal commented 3 months ago

Those are valid points, but I think there is a disconnect. What I meant to ask for is not a proof that the above attack is impossible when using hash_to_scalar (which you outlined clearly), but a guarantee that no "similar" attack is possible. In other words, can we guarantee that the only way to "forge" a valid (disclosed or undisclosed) message is by revealing si * c^-1?

(note also that we allow for 2^64 messages to be signed. The adversary's goal can then be to find a suitable si * c^-1 for any 1 <= i < 2^64, further increasing their chances of success).

This is not to claim that there is no such proof. In fact it may very well exist, just by comparing all the values that go into the challenge hash (i.e., D, T2, etc.) and showing that forged messages can ONLY be of the form si * c^-1, and hence that by using hash_to_scalar we avoid the problem.

Even if that is the case, however, I am not sure how comfortable I am with relying on unpublished, non-peer-reviewed security proofs (depending, of course, on their complexity).

The Fiat Shamir transform calculates the challenge as a hash of the commitment (and an optional presentation header, if your goal is to create a challenge).

I think I caused some confusion by mentioning a transcript. To better explain, let's use the sigma protocol formulation you mentioned, with a common input Y, a commitment A, a challenge c and a response f. As you noted, the disclosed messages will not be part of A, but of Y. In this document, when referring to the Fiat-Shamir heuristic, we are actually using the strong Fiat-Shamir transformation described in [BPW16]. Using that formulation, both Y and A must be input to the challenge hash to get a secure protocol. Note that we have to use the strong Fiat-Shamir transformation, given that Y is partially chosen by the (malicious) prover.

For these, there is a very simple solution: Hash exactly these disclosed messages, but none other.

This could work; however, IMO this solution increases complexity by a worrying amount, creating the danger of implementation errors. If we fully describe how to do this (which we tried before), we make the document more complicated and restrictive. If we don't, and simply add a note saying "if you don't use hash_to_scalar, pass the messages to the challenge hash", we run the danger of people ignoring it or implementing it wrong.

That is to say, for the (I believe) very small efficiency improvement in most cases, I'm not sure the added complexity (or, IMO, risk) is worth it. Is this something that would make a significant difference in your use case? (Note that we are willing to apply the suggested re-arrangements.)

bellebaum commented 3 months ago

(note also that we allow for 2^64 messages to be signed. The adversary's goal can then be to find a suitable si * c^-1 for any 1 <= i < 2^64, further increasing their chances of success).

Right, I guess I missed that there is no domain separation in hash_to_scalar for different message indexes. Without it, this is indeed dangerous. Even with domain separation, a formal proof might pose more challenges.

we are actually using the strong Fiat-Shamir transformation described in [BPW16].

Am I missing a proof of Theorem 1 somewhere? I would have liked to see how they tightly handle the case of an A choosing n statement-proof pairs more or less randomly, but updating its randomness after each pair by applying SHAKE to its old randomness and all received query responses. My intuition tells me that with the provided query options, K has to start over after extracting each witness, leading to a runtime in O(n^2). This particular case can probably be avoided using partial rewinds, but I cannot tell if there are other corner cases then.

I have been thinking about possible proofs for the hash_to_scalar case, but so far have not come up with anything, in part because I do not fully understand the tightness of the current proof. A proof should not be too complex to adapt I would think, but I am unsure. For now, I agree that hashing the disclosed messages is the safest way forward.

Is this something that would make a significant difference in your use case? (Note that we are willing to apply the suggested re-arrangements.)

It really depends on where this is used. For credentials, I would guess that signing authorities can get quite busy (which is why the removal of messages in the calculation of e is a good thing), but proofs are created by many users with few generated proofs per person. The more pressing issue could be on the proof verifier's side when protecting against DDoS, but even then the serialization and hashing of the message scalars is probably nothing compared to hashing the messages in the first place (at least with SHA).

So no, the most important part is the re-arrangement, hence this issue :)

If we fully describe how to do this (which we tried before), we make the document more complicated and restrictive.

This is probably a separate issue, but I would really like to see an interface for additional proofs, so that extensions for other proof types have a single correct way to interface with this specification. For example the "extended presentation header" idea above could be adapted into a more standard interface or additional argument, so people do not have to redefine core operations for their extensions.

The reason I am bringing this up here is that such additional inputs to the challenge are not trivially handled by constant-memory implementations. The most efficient place to hash in additional values would usually be the spot in the challenge where the last affected message would be revealed (if we were revealing them).

For example, with the pseudonyms idea above, the triple (Pseudonym, OP, U) can most efficiently be calculated when processing the (in this case) final message scalar (=the pid_scalar) and its commitment (=pid~). This corresponds to something like this:

1. c_arr = (R, i1, msg_i1, ..., iR, msg_iR, Pseudonym, OP, U, Abar, Bbar, D, T1, T2, domain)

BasileiosKal commented 4 weeks ago

Hey all!

Opened PR #319 to address this issue.

Just a question (sorry if this is obvious): are the proposed updates meant to increase efficiency and ease of implementation, or are they "absolutely necessary"?

For example, for the calculation of e, could we create a temporary accumulator for the messages, let's say msg_acc, go through the messages once, accumulating each msg into msg_acc (and each generator into the domain digest input, etc.), and then calculate e as hash(SK || domain || msg_acc)?
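Something like this rough sketch (toy Python; SHA-256 stands in for the actual hashes and every helper name is made up):

```python
import hashlib

P = 2**127 - 1  # toy scalar field order

def toy_scalar(data: bytes, dst: bytes) -> int:
    return int.from_bytes(hashlib.sha256(dst + data).digest(), "big") % P

def e_via_msg_acc(sk: int, domain: int, messages: list[bytes]) -> int:
    # During the single message pass, fold each message scalar into msg_acc
    # (the same loop would also feed the generators into the domain digest).
    acc = hashlib.sha256(b"ACC_DST")
    for msg in messages:
        acc.update(toy_scalar(msg, b"MSG").to_bytes(16, "big"))
    msg_acc = acc.digest()
    # Afterwards, e = hash(SK || domain || msg_acc).
    return toy_scalar(sk.to_bytes(16, "big") + domain.to_bytes(16, "big") + msg_acc,
                      b"E_DST")

print(e_via_msg_acc(42, 1234, [b"msg1", b"msg2"]))
```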

Similar for the challenge hash during proof gen.

I can see that the proposed alternatives are simpler and more efficient; I'm just curious about the motivation (which will be needed when we present the update to the CFRG).

Am I missing a proof of Theorem 1 somewhere?

Sorry @bellebaum for the delayed reply. AFAIK there is not any generic formal proof for Theorem 1, just security arguments based on counterexamples.

bellebaum commented 4 weeks ago

Hey Vasilis,

as far as I can see, the scheme would work without any modifications, since each message scalar is of fixed length and hash_to_scalar is sensitive to the message length. When accumulating message scalars, the accumulator should in any case be collision resistant, so introducing another cryptographic hash, or a universal hash function (using an extra key; this seems to be the most efficient accumulator I can think of right now, though I have not thought it through at all), for aggregation appears to have marginal benefit. That is, unless we use B as msg_acc, which we are calculating anyway and which is already covered by security proofs. That would save a few hash instructions.

While we are at the calculation of e: there is this draft at the CFRG which aims to introduce some additional randomness into the calculation of their e-equivalent, to e.g. provide some heuristic fault tolerance on embedded devices. I don't know the performance implications, but this might be reasonable to adopt as a heuristic here as well.

Thanks for opening the pull request and working on this :)