cosmos / cosmos-sdk

Protobuf Transaction Signing #6078

Closed aaronc closed 4 years ago

aaronc commented 4 years ago

Problem Definition

The Cosmos SDK has historically used Amino JSON for signing of transactions whereas Amino binary is used for encoding. During the SDK's migration to protobuf, we had made the preliminary decision to use a canonical protobuf JSON encoding for signing as described in https://github.com/regen-network/canonical-proto3.

As a consequence of #6030, the Cosmos SDK is moving in the direction of using protobuf's Any type for the transaction encoding and signing format. In this discussion, a number of participants have asked that we revisit the transaction signing discussion. The options that have been discussed/are available for consideration are outlined below.

It should be noted that it is theoretically possible to support more than one of the following options via an enum flag on the signature. Whether that should or should not be done is a related question.

Proposals

Feel free to suggest updates to these alternatives in the comments

(1) Protobuf canonical JSON

The official proto3 spec defines a canonical mapping to JSON. This is not really deterministic, however, so we define a canonical encoding on top of that using https://gibson042.github.io/canonicaljson-spec/.
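
For illustration, a bank send message under this scheme would serialize with lexicographically sorted keys and no insignificant whitespace, along the lines of the following (field names follow the SDK's MsgSend; the addresses and amounts are placeholders):

{"amount":[{"amount":"1000","denom":"uatom"}],"from_address":"cosmos1aaa","to_address":"cosmos1bbb"}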

Pros:

Cons:

(2) Protobuf canonical binary

This involves re-encoding the protobuf used for transaction encoding canonically for signing - meaning that fields must be ordered and defaults omitted. This is how Weave does signing.
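
As a rough Go sketch of what this could look like (assuming the google.golang.org/protobuf runtime; its Deterministic option only guarantees stable map-key ordering, so a full canonicalization pass would still need to handle unknown fields and cross-implementation field ordering):

package signing

import "google.golang.org/protobuf/proto"

// CanonicalSignBytes re-encodes an already-decoded message for signing.
// Sketch only: generated marshalers emit fields in field-number order, and
// Deterministic makes map-key ordering stable across runs.
func CanonicalSignBytes(msg proto.Message) ([]byte, error) {
    return proto.MarshalOptions{Deterministic: true}.Marshal(msg)
}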

Pros:

Cons:

(3) Protobuf binary as encoded in transaction

This simply uses the protobuf encoding as broadcast in the transaction. This becomes a little easier for both signature creation and verification because of Any (although it could be done without Any too). Because Any wraps the raw bytes of the sdk.Msg, it is pretty easy to use those exact bytes for signing and verification, and to only require that SignDoc itself is encoded canonically, rather than every Msg.
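
Concretely, the sign doc under this option only needs to wrap the raw encoded bytes plus replay-protection context; the shape ADR 020 eventually adopted looks roughly like this:

message SignDoc {
    // exact bytes of the tx body as broadcast
    bytes body_bytes = 1;
    // exact bytes of the signer/fee info
    bytes auth_info_bytes = 2;
    string chain_id = 3;
    uint64 account_number = 4;
}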

Pros:

Cons:

(4) Amino JSON

This is how txs are currently signed. The reason this is under consideration is that breaking Amino JSON signing would break many clients, especially the Ledger app. Transactions could still be encoded with protobuf, and the /tx/encode endpoint could accept Amino JSON and return protobuf rather than Amino binary for tx broadcasting. Some upfront work would be required to enable this, but it is possible.
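
For context, an Amino JSON sign doc looks roughly like this (keys sorted, integers encoded as strings; the addresses and amounts are placeholders):

{"account_number":"7","chain_id":"cosmoshub-3","fee":{"amount":[{"amount":"5000","denom":"uatom"}],"gas":"200000"},"memo":"","msgs":[{"type":"cosmos-sdk/MsgSend","value":{"amount":[{"amount":"1000","denom":"uatom"}],"from_address":"cosmos1aaa","to_address":"cosmos1bbb"}}],"sequence":"12"}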

Pros:

Cons:

(5) Custom Proto JSON

Extend (1) to support custom encoding of certain types, like bech32 addresses.
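
The difference is easiest to see on an address field: standard proto3 JSON renders bytes fields as base64, while the custom encoding would render the same bytes as a bech32 string (both values below are illustrative, not a real encoding pair):

{"from_address":"FDnsYZNeVzY="}   <- standard proto3 JSON: bytes as base64
{"from_address":"cosmos1aaa"}     <- custom encoding: the same bytes as bech32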

Pros:

Cons:


Related Question: Should we allow multiple signing algorithms?

Theoretically we can allow clients to use multiple signing algorithms and indicate which one they used as an enum flag on the Signature struct.

Pros:

Cons:

ethanfrey commented 4 years ago

Thank you @zmanian, I was not aware of 2, only 1 and 3. That does make sense now, to maintain backwards compatibility in signing.

iramiller commented 4 years ago

I've seen signing implementations in Java, Rust, Go, C++, and C.

Signing of raw proto messages will be a huge simplification for these non-Go environments, so when they do make changes to upgrade it will be a breath of fresh air.

In the meantime, the signature type option that supports a transition period gives these environments the flexibility to plan a migration path that works for them. Sure, supporting two parallel efforts during the transition is quite a bit more work, but this is the gift of compatibility being offered to those who invested early. That seems very generous for a project that hasn't even reached a 0.5 release yet. No one building on a platform below 1.0/LTS should expect the environment to be stable, and everyone should have an investment plan to put in the work to keep up with the rapid pace of updates.

clevinson commented 4 years ago

Thanks for all the feedback here.

This issue was discussed in more detail with @zmanian, @aaronc, @webmaster128 (among others) on our Cosmos SDK grooming call today.

We decided to go with @aaronc's enum proposal as described above:

message Signature {
    PubKey pub_key = 1;
    bytes signature = 2;
    SignMode mode = 3;
}

enum SignMode {
    BASIC = 0;
    LEGACY = 1;
    EXTENDED = 2;
}

Followup actions are for @aaronc to update ADR 20 to incorporate this proposal, and separately to work with @zmanian on a proposal for the human-readable translation to/from raw-bytes that will live in Interchain Standards.

aaronc commented 4 years ago

My PR to update ADR 020 to reflect the signing format/Any as discussed in this thread is R4R: https://github.com/cosmos/cosmos-sdk/pull/6111

Would appreciate a review from anyone involved in this discussion.

There are a few things that didn't come up in the discussion but which I thought were good to include:

aaronc commented 4 years ago

One thing I want to note about protobuf canonicalization that I forgot when writing up these pros and cons is that it causes transaction verifiers to reject messages with unknown fields (as they won't show up in the canonical version). I believe that's generally the correct behavior for a blockchain - if I set a field which the server doesn't understand, the server should report an error rather than silently ignoring it.

So this is one benefit of (1) and (4), that (3) (raw bytes signing) doesn't have. Although I should note that maybe canonicalization isn't the best approach anyway.

One particular case that comes up in #6111 is that I'm advocating that the core Tx type proactively add tx options even if they aren't supported on all servers. For instance, timeout_height (#6089) gives clients the expectation that a tx will time out at some height, but if that option isn't supported, a server should actually reject the tx even if it can parse the protobuf. I'm also suggesting a generic extension_options field which could hold anything.

So I'm just noting that we likely need some generic mechanism to reject unhandled fields/options. This is probably true regardless of which signing method we use even though canonicalization (even with Amino JSON) was previously one approach for doing this.

Two possible ways to approach this, which could work together, are:

An alternative is to reconsider one of the other options (besides 3) as default. But this places added canonicalization complexity on clients (whereas checking messages should be the verifier's responsibility) and also opens up the malleability issue again.
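
For context, the tx body shape that ADR 020 ultimately adopted addresses this with two extension slots - one that verifiers must understand and one they may ignore (sketch; names roughly follow the final tx.proto):

message TxBody {
    repeated google.protobuf.Any messages = 1;
    string memo = 2;
    uint64 timeout_height = 3;
    // a verifier must reject the tx if it doesn't understand one of these
    repeated google.protobuf.Any extension_options = 1023;
    // a verifier may process the tx even if it doesn't understand these
    repeated google.protobuf.Any non_critical_extension_options = 2047;
}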

tarcieri commented 4 years ago

One thing I want to note about protobuf canonicalization that I forgot when writing up these pros and cons is that it causes transaction verifiers to reject messages with unknown fields (as they won't show up in the canonical version).

Supporting unknown fields in this sort of scheme is the raison d'etre of Veriform, as it were. However, I think you can support them as well in your scheme if you have a schema-independent message parser/verifier which ensures messages are in canonical form.

Edit: Just remembered this isn't possible because Protobufs don't have a wire type for nested messages. Alas.

Edit again: Kenton Varda had some suggestions on that same thread regarding how to do Protobuf canonicalization for these purposes, however they feel a bit heuristic IMO.

I believe that's generally the correct behavior for a blockchain - if I set a field which the server doesn't understand, the server should report an error rather than silently ignore.

X.509 (and Veriform) both solve this problem via the inclusion of a "critical" flag on message fields (or extensions, in X.509 parlance). It's possible to add non-critical fields in a backwards compatible way, but unknown fields/extensions flagged as critical are rejected.

aaronc commented 4 years ago

X.509 (and Veriform) both solve this problem via the inclusion of a "critical" flag on message fields (or extensions, in X.509 parlance). It's possible to add non-critical fields in a backwards compatible way, but unknown fields/extensions flagged as critical are rejected.

In this scenario, though, there would be no way for a verifier to know if an unknown field were critical in a later version of the protocol (that the client supports but server doesn't). Critical flags would only help if the server possessed the newer schema but couldn't process it.

tarcieri commented 4 years ago

Yeah, unfortunately a critical flag is difficult to implement in Protobufs. Veriform encodes it directly into the wire type.

The only things I can think of are some sort of Any wrapper on extension fields which also adds a "critical" boolean flag, or a preprocessor/postprocessor which encodes it into the wire type, both of which are ugly solutions.

iramiller commented 4 years ago

I believe this extension/critical/processor thing goes back to the fundamentals... does the signature process need to concern itself with the contents of every message, or should that be the responsibility of the message handler?

I believe that the signature handler should not concern itself with the contents of the messages it is checking the signature of-- no more than a mailing envelope cares about any papers it may contain. The envelope is satisfied with a correctly formatted address and valid postage. So should the signature verifier be satisfied with a proper signature and leave the message validation to downstream code.

tarcieri commented 4 years ago

For verifiers of X.509 signed objects, critical extensions ended up being pretty important.

Without "critical" extensions, it's not possible to safely refine the protocol to add features which provide constraints/caveats on the authority the signed object designates in a backwards-compatible manner.

The alternative to this approach discussed earlier is to fail closed on any unknown field, but that precludes any sort of backwards-compatible schema evolution/extensions, which I think wind up being pretty important.

aaronc commented 4 years ago

I believe this extension/critical/processor thing goes back to the fundamentals... does the signature process need to concern itself with the contents of every message, or should that be the responsibility of the message handler?

I believe that the signature handler should not concern itself with the contents of the messages it is checking the signature of-- no more than a mailing envelope cares about any papers it may contain. The envelope is satisfied with a correctly formatted address and valid postage. So should the signature verifier be satisfied with a proper signature and leave the message validation to downstream code.

I agree that separating this from signature processing makes sense, but as @tarcieri is pointing out it does seem to be tightly coupled with encoding choices.

aaronc commented 4 years ago

The alternative to this approach discussed earlier is to fail closed on any unknown field, but that precludes any sort of backwards-compatible schema evolution/extensions, which I think wind up being pretty important.

That's a good point and I can't really think of another backwards compatible alternative besides trying to overload Any with a flag or something similar.

I think for unknown fields we either need to make the choice that a) critical extensions are never added as regular fields or b) unknown fields always trigger failure.

One thing I want to reiterate is that this applies in two places: the top-level tx and the sdk.Msgs themselves. I could imagine some sort of overloaded critical Any thing working for the top-level tx, but for sdk.Msgs that seems pretty complex. So I think without having something built into the encoding standard, rejecting unknown fields may be the necessary compromise...

iramiller commented 4 years ago

For verifiers of X.509 signed objects, critical extensions ended up being pretty important.

Without "critical" extensions, it's not possible to safely refine the protocol to add features which provide constraints/caveats on the authority the signed object designates in a backwards-compatible manner.

The alternative to this approach discussed earlier is to fail closed on any unknown field, but that precludes any sort of backwards-compatible schema evolution/extensions, which I think wind up being pretty important.

To clarify ... it isn't that I feel these techniques are invalid or unimportant ... My point is that the signature handler cannot know of every field for every message type, nor should it attempt to understand how to parse the message(s) with only partial understanding. To compromise on this point breaks the separation of concerns and yields a strictly less extensible, more complex approach.

The assigned message handler for the type on the other hand absolutely should validate the message in depth and throw an exception if there is something it does not understand or cannot parse.

aaronc commented 4 years ago

The assigned message handler for the type on the other hand absolutely should validate the message in depth and throw an exception if there is something it does not understand or cannot parse.

But the issue with protobuf is that once a message is parsed you just have a Go struct, and we don't currently have a mechanism for retrieving unknown fields. They're just discarded. So rejecting messages with unknown fields needs to happen higher up in the stack, at parse time.

aaronc commented 4 years ago

Actually @tarcieri, one way we could flag critical fields in protobuf is just with field number ranges. Say 1-127 is critical (what people would choose by default), 128+ is non-critical.

aaronc commented 4 years ago

Also if any of you get a chance to do a quick review of #6111 before we finalize it that would be greatly appreciated @tarcieri @iramiller

tarcieri commented 4 years ago

I think for unknown fields we either need to make the choice that a) critical extensions are never added as regular fields or b) unknown fields always trigger failure.

I'd suggest taking a look at Adam Langley's blog post about extensibility. I think it winds up being pretty important:

https://www.imperialviolet.org/2016/05/16/agility.html

Actually @tarcieri, one way we could flag critical fields in protobuf is just with field number ranges. Say 1-127 is critical (what people would choose by default), 128+ is non-critical.

That sounds great! Although to really make it work, you need a Protobuf implementation that allows you to extract the unknown fields so you can check if any are critical.

aaronc commented 4 years ago

I think for unknown fields we either need to make the choice that a) critical extensions are never added as regular fields or b) unknown fields always trigger failure.

I'd suggest taking a look at Adam Langley's blog post about extensibility. I think it winds up being pretty important:

https://www.imperialviolet.org/2016/05/16/agility.html

Thanks for sharing. So I think the big takeaways are to make the extension system simple - in protobuf this is basically adding new fields - and to allow newer clients to work with older servers - hence the desire to not reject all messages which clients send with new fields, but only the ones that are flagged as critical.

Actually @tarcieri, one way we could flag critical fields in protobuf is just with field number ranges. Say 1-127 is critical (what people would choose by default), 128+ is non-critical.

That sounds great! Although to really make it work, you need a Protobuf implementation that allows you to extract the unknown fields so you can check if any are critical.

Yeah, we might need to do something hand-generated that takes the message descriptors and flags unknown critical fields.
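
A sketch of such a pass, assuming the 1-127 critical range suggested above and the google.golang.org/protobuf runtime (which, unlike the gogoproto-generated types the SDK currently uses, preserves unknown fields after parsing):

package signing

import (
    "fmt"

    "google.golang.org/protobuf/encoding/protowire"
    "google.golang.org/protobuf/proto"
)

// Assumed convention from this thread: field numbers 1-127 are critical.
const criticalFieldMax = 127

// RejectUnknownCriticalFields walks the unknown-field bytes preserved by the
// parser and fails if any unknown field number falls in the critical range.
// Note: this does not recurse into nested messages; a real implementation
// would need to, using the message descriptors.
func RejectUnknownCriticalFields(msg proto.Message) error {
    unknown := msg.ProtoReflect().GetUnknown()
    for len(unknown) > 0 {
        num, typ, n := protowire.ConsumeTag(unknown)
        if n < 0 {
            return protowire.ParseError(n)
        }
        unknown = unknown[n:]
        if num <= criticalFieldMax {
            return fmt.Errorf("unknown critical field %d", num)
        }
        m := protowire.ConsumeFieldValue(num, typ, unknown)
        if m < 0 {
            return protowire.ParseError(m)
        }
        unknown = unknown[m:]
    }
    return nil
}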

tarcieri commented 4 years ago

I'm not terribly familiar with the Go Protobuf ecosystem, but I believe you can use goproto_unrecognized for this purpose?

aaronc commented 4 years ago

I'm not terribly familiar with the Go Protobuf ecosystem, but I believe you can use goproto_unrecognized for this purpose?

Yeah, unfortunately enabling that would break other stuff, so it would likely need to be a separate parser pass.

clevinson commented 4 years ago

We'll be discussing the topics related to this, and more specifically any outstanding blocking issues with ADR020 tomorrow on our bi-weekly architecture review call.

https://hackmd.io/UcQfA-JRQjG1zfKKWxH1gw

Zoom link in the HackMD if anyone from this thread would like to join.

webmaster128 commented 4 years ago

does the signature process need to concern itself with the contents of every message, or should that be the responsibility of the message handler?

Looking at the protobuf document, this is no issue: every message is stored in an Any field, and Anys are embedded as raw data into the containing document. So the signature handler only needs to understand the outer envelope (1-2 small layers). The rest is opaque data for the verifier, which can be passed to the type-specific message handler.

But the issue with protobuf is that once a message is parsed you just have a go struct and we don't currently have a mechanism for retrieving unknown fields. They're just discarded. So this rejecting messages with unknown fields needs to happen higher up in the stack at parse time.

You need two parsing steps anyway (either explicit or hidden by gogoproto): (1) the outer document including the Any field and (2) the message.

So the extra fields we are talking about arise when a client has a different message schema for the same type URL, with extra fields? Well, then I guess the chain is the source of truth and the client can add whatever data they want to a given message. But this is a matter of message type upgradability, not signature verification.

aaronc commented 4 years ago

Yes, this is about message upgradeability, not signature verification per se. Although the canonicalization approach to signature verification prevents upgradeability (in the direction of newer clients interacting with older servers).

clevinson commented 4 years ago

So summary from our architecture review call today (https://hackmd.io/UcQfA-JRQjG1zfKKWxH1gw?view), which focused on ADR020 (#6111)

We had consensus on the approach laid out in this PR, with a few minor additions:

Outstanding question:

EDIT: It is expected that the resolution of this PublicKey question is not a blocker for ADR020, but will be discussed & resolved separately.

@aaronc Can you lay out more details for this outstanding point here?

aaronc commented 4 years ago

There is a pretty big discussion that has been unfolding related to #6111 and #6174 in those PRs and now on discord. I think it has gotten to the point where it deserves its own list of alternatives with pros and cons. I will make my own attempt below.

The general context of this discussion is the UX of a multi-signer transaction. I will frame the various alternatives under three broad umbrellas - (1) Closed signers, (2) Open signers, (3) Open then closed signers.

(1) Closed signers (the current Cosmos SDK model)

Definition: the list of signers is fixed by the content of the messages. Extra signers are not allowed. Every signer must sign over the list of other signers.

Multi-Signer UX:

  1. Tx composer proposes the transactions to possible signers and asks who will sign
  2. Tx composer composes messages with the list of signers that agreed to sign
  3. All signers that agreed to sign must sign
  4. Tx can be broadcast once the final signer signs

Pros:

Cons:

Variation (a): Don't ask for pubkey and signing mode in step 1

This is effectively the current SDK model, which doesn't include signing modes

Compared to variation (b):

Variation (b): Ask for pubkey and signing mode in step 2

This is model proposed in #6111

Compared to variation (a):

(2) Open signers (the Weave model)

Definition: extra signers not specified by messages are allowed

This is Weave's model. (@webmaster128 if I get something wrong here please correct me)

Multi-Signer UX:

  1. Tx composer composes messages and shares with list of potential signers
  2. Any signer can sign and broadcast immediately; as long as there are enough required signers, the tx will succeed

Pros:

Variation (a): Original Weave model - no signature limit

As @webmaster128 shared, this had the following cons:

Before we discovered and fixed a bunch of attacks in late 2019, you could do the following:

  1. Spam a ridiculous amount of unnecessary data as part of a signature (since protobuf allows unnecessary fields)
  2. Add an arbitrary number of unneeded signatures for no reason
  3. Change signers[0], which was used as a default value in some places, since signatures are unordered. This allowed someone else to turn any signer into the fee payer, among other things.

Variation (b): Fixed Weave model

According to @webmaster128:

  1. was a combination of carelessness and a type system that has no set type. The signers were originally designed as a set but then interpreted as an ordered list in some situations. This was solved by never using the order of signatures in the application.

A Cosmos SDK-style tx size limit did not exist. 1. and 2. were stopped by implementing a tx fee based on byte size. Now you can spam as much as you want, if you pay for it. Fees are calculated based on the expected number of signatures.

No gas system exists in Weave, but the tx size fee automatically puts a limit on the number of signatures. If someone adds too many signatures, the tx is rejected as too big. Otherwise it will pass.

I believe this is the actual issue: https://github.com/iov-one/weave/issues/1091

While this solves some issues, it still appears to be exploitable (??), i.e. if the message were intercepted, an extra signer could still add themselves and screw things up for the fee payer - is that true??

(3) Open then closed signers

This is the approach proposed in #6174

Multi-Signer UX:

  1. Tx composer composes messages without fixing the list of required signers and shares with a group of potential signers
  2. Any signers can sign whether or not they are required/expected
  3. As soon as there are enough signers, any signer can become the fee payer and sign and broadcast the final message that fixes the list of signers and the fee

Pros:


Apologies for this being such a long post. I really want to get to the bottom of this and agree on an approach and move forward.

In my personal opinion, approach (3) seems to have the most benefits and least tradeoffs for multi-signer transactions. It allows the benefits of (2) without the downsides.

Any variation of (1) would be the status quo and acceptable. I don't see a huge difference between (1)(a) and (1)(b) - it's still the same number of phases and requires fixing signers before signing. I would prefer the guarantees of (b) but at this point whatever. Neither seems to actually give a very good multi-signer UX which I thought was the point of this.

That's why I recommend approach (3) #6174 - good security guarantees + good multi-signer UX.

zmanian commented 4 years ago

I feel like 3 is the correct direction as well.

webmaster128 commented 4 years ago

Thank you @aaronc for the great summary. Here are a few additions/remarks, followed separately by my personal vote.

(1) Additional Pros:

Regarding Con:

Limiting the list of signers means that the tx composer first needs to ask who will sign (phase 1) and then ask that same group of people to sign (phase 2)

The point does not apply to a multisig pubkey, which is a single signature that can be satisfied by any combination of signatures from the group that together make up a valid multi-pubkey signature. For the individual messages, the required signers are already deterministic.

(1a) Regarding Pro:

Tx composer doesn't need to ask for pubkeys and signing mode in step 1, just account addresses

The tx composer does not even need to ask for the addresses. They exist automatically as part of the messages.

(2) Comment: Right, using the Weave model in the Cosmos SDK would be exploitable, since an external party could add unnecessary signatures causing the gas consumption to exceed the limit. If the manipulated tx is processed before the original tx, the tx fails but the gas cost is spent. In Weave this issue does not apply since there is no gas system.


There is a significant difference between (1a) and (1b): the list of required signer addresses is available for free, locally, in the messages. Needing to know the public key for each required signer address can only be automated if all pubkeys are stored on-chain, which is not guaranteed for new accounts. The communication overhead of needing to know the signing mode of each signer is relevant, as it increases the roundtrips of human communication. Additionally, it violates separation of concerns.

My vote goes to (1a) as it

P.S.: there is a variant of (3), let's call it (4), which only allows a single top-level signature and keeps all other signatures at a lower level. It comes with the same two-phase signing process as (3) but simplifies everything. Most transactions only need a single top-level signature (which can be one of a multi-pubkey group), and the few transactions with multiple inner signatures can be processed in two steps. It is a radical change. Let me know if someone is interested in learning more.

aaronc commented 4 years ago

is most convenient for humans since no dependencies between signers exist

But that's not true, because they still need to agree on the list of addresses to include in the messages. If you wanted to coordinate 3 signers out of 5, you would still need to ask all 5, then get an ACK from 3 of them, then compose the messages and have them sign. I acknowledge that the Weave model made this simpler, and I think (3) gets pretty close.

aaronc commented 4 years ago

Based on discussions on discord, I have updated my original PR #6111 with the following:

I agree with @alexanderbez that we should try to keep our initial implementation as simple and as secure as possible. Even if we did come to consensus that (3) provides a better UX, it is beyond the scope of our current work to address this. But I think it's good to document the alternatives and then see what users ask for in practice. (1)(b) seems to be the simplest to implement with the most security guarantees and wouldn't preclude (3) and/or (1)(a) from existing as alternate signing modes in the future.

alexanderbez commented 4 years ago

What is left to get the work started? What needs to be merged?

aaronc commented 4 years ago

What is left to get the work started? What needs to be merged?

#6111, then maybe an init PR with the proto file, and then implementing the signing logic.

webmaster128 commented 4 years ago

is most convenient for humans since no dependencies between signers exist

But that's not true because they still need to agree upon the list of addresses to include in the messages. If you wanted to coordinate 3 signers out of 5, you would still need to ask all 5, then get an ACK from 3 of them and then compose the messages and then have them sign it.

There are two types of multiple signatures currently discussed:

  1. The list of required message signers, calculated from the GetSigners result of each message.
  2. A multisig account type that has a separate group account and thus a separate address. It has a special type of pubkey that encodes the participants and threshold.

The latter results in a single (nested) top-level signature, i.e. it is one signer no matter which group participants signed. So no coordination is required on which participants will sign.

The first one is not suited to implementing n/m multisig at all, because the nature of the GetSigners() method makes flexible signers impossible. It is just different people who authorize different things. This is needed to do e.g. atomic trades, where Alice signs a token send message that sends tokens to Bob, Bob signs a smart contract execution that transfers asset X to Carl, and Carl signs 3 messages sending tokens to Alice, Bob and a charity. This gives us 5 messages signed by 3 pre-specified entities. The state of the current SDK is that for each of those parties you only need to know the addresses, which are part of the messages.
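
For reference, the multisig pubkey described in 2. can be sketched like this (the name is illustrative; the SDK's actual multisig pubkey type is shaped similarly):

message MultisigThresholdPubKey {
    // minimum number of participant signatures required
    uint32 threshold = 1;
    // participant pubkeys, packed as Any
    repeated google.protobuf.Any public_keys = 2;
}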

clevinson commented 4 years ago

Closing this now, as this issue was targeting an update of ADR020.

#6213 will be used for tracking the actual implementation of the updated ADR as described in #6111