cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0
6.23k stars 3.6k forks source link

Questions/clarifications about ADR-20 #6710

Closed ebuchman closed 11 months ago

ebuchman commented 4 years ago

First, I want to applaud the amazing work that went into ADR-20 and to thank everyone that participated in the long discussions in #6030 and #6078. Apologies for having not been more involved sooner. In trying to review all of that now, I find there is quite a lot discussed, and it is hard to get a clear summary of the various points. While they have obviously heavily informed the final design of the ADR, perhaps we can make some attempt to better summarize the discussion and considerations in the ADR itself, to aid future reviewers.

From my own review, I have some questions/concerns which I will just drop here. I suspect at least some of this is just me missing things, but maybe some of it will require more consideration. It may be easier to answer these on a call, and I'd be happy to help transcribe the answers into updates to the ADR.

alexanderbez commented 4 years ago

We should address every single one of these questions on a call @aaronc

aaronc commented 4 years ago

Thanks for sharing your questions @ebuchman ! Happy to get on a call to discuss. But l also think a documented summary would be helpful. Most of these questions are l believe addressed at some point in the threads. But they were pretty long threads and hard to sift through. I’ll try to find some time to do that write up and of course let’s schedule a time to chat.

aaronc commented 4 years ago

Why is TxRaw necessary? Can’t we just directly encode Tx?

SIGN_MODE_DIRECT does verification based on encoded binary bytes. Because protobuf serialization is not guaranteed to be deterministic we cannot guarantee that implementations always serialize messages the same way. The raw bytes fields in SignDoc and TxRaw facilitate ensuring that the same bytes that are signed are the same bytes that are broadcast. This issue was raised many times and I even acquiesced to having just a Tx type at one point leaving it to clients to deal with the determinism of their own encoding. Then one of the client developers working with the new signing format realized that it would be better to just separate Tx and TxRaw and introduced this PR to revert things back to have both types: https://github.com/cosmos/cosmos-sdk/pull/6359.

Why are Body and AuthInfo separated? Can’t we just include the AuthInfo stuff in Body?

The content of AuthInfo that is signed over is a bit defensive, maybe even overly defensive. For instance the sign modes of all the signers and the BitArray for multi-signatures gets signed over. This maybe isn’t really necessary, but because concern was raised over multiple signing modes and security is the first concern, the default signing mode (SIGN_MODE_DIRECT) is extra defensive and signs over this content so that there is basically zero surface area for malleability.

However, it is acknowledged that maybe this is too defensive for some users doing multi-signatures and that it may be desirable to allow some malleability to make signature aggregation easier. Thus AuthInfo is separated out of TxBody to allow some flexibility for future users who may advocate for a relaxed signing mode (ex. SIGN_MODE_AUX) that allows a bit of malleability in exchange for improved UX.

Another way of putting this is that TxBody is the content that all signers always need to sign over regardless of the mode. AuthInfo is the stuff that signers may or may not all need to sign. For instance, Fee only really needs to be signed by the first signer. It is not supported currently, but a future sign mode such as SIGN_MODE_AUX may allow secondary signers to not sign Fee (because they don’t pay it) but they would always all to sign TxBody.

I’m concerned about the multiple sign modes. Is the idea that a single chain would support multiple modes simultaneously? What about a single tx? Currently it appears the answer to both is yes, though I’m sure most folks would prefer the answer being no, though I understand the potential need to support say 2 modes on a chain from a upgrade-path perspective … but at least it seems crazy for a single tx to potentially support multiple modes, so maybe the mode should be part of AuthInfo ?

Yes, the SDK currently supports both SIGN_MODE_LEGACY_AMINO_JSON and SIGN_MODE_DIRECT by default. Other modes may be added in the future. Concern over multiple modes has come up many types, but when examined the only substantive security arguments I have seen against it is when the mode can be manipulated by the attacker (see https://github.com/cosmos/cosmos-sdk/issues/6078#issuecomment-619610788). However, this particular type of exploit would require some pretty bad design choices which we’re trying to avoid, and even the impact of bad design choices can be mitigated by an overly defensive default sign mode like SIGN_MODE_DIRECT.

In the discussions, it seemed like in spite of the initial resistance to multiple signing modes, examination of use cases revealed that multiple signing modes for different security and legacy compatibility reasons is actually beneficial to clients who generally only need to implement one mode that works for them. For instance, without supporting legacy amino JSON, we will break wallets and exchanges plus the ledger app overnight. The alternatives are either not having a protobuf signing mode at all or allowing multiple sign modes. We chose the latter and erred on the side of caution with how it’s supported. Also it should be noted that most of the future signing support (such as SIGN_MODE_TEXTUAL) extends on the security guarantees of SIGN_MODE_DIRECT but adds additional security guarantees for users of hardware wallets.

Sign modes are inherently scopes to the signer not the transaction. I’m not sure why we would want to scope it to the transaction as opposed to the signer. That alternative didn’t come up in discussions but it seems like it would make multi-signature UX more complicated than it currently is. I’m not sure what the benefit would be other than simplicity for the ante handler implementation, but that complexity is already dealt with in the code.

not clear why we need to account for multisig directly in the ModeInfo - shouldn’t this somehow be covered by the public key (like before)? and related, does the bit array really need to be signed over? I don’t think it was before …

The amino MultiSignature struct doesn’t support sign modes. ModeInfo describes which sign mode each signer used in the BitArray. Without it, it’s hard to specify different sign modes for different signers which would create an undesirable UX. The BitArray maybe doesn’t need to be signed over but SIGN_MODE_DIRECT starts by being overly defensive. SIGN_MODE_DIRECT_AUX would relax that requirement and is maybe desirable for multi signatures. This is maybe a fruitful area for discussion as currently we are still doing multi-signatures in the CLI with Amino JSON because of this.

I’m also worried about this extension options thing - are we making our lives more difficult/dangerous by trying to support unknown fields in a consensus object? shouldn’t this be handled by explicitly breaking upgrades? at least we should provide more context on how this is expected to occur.

This didn’t actually get a ton of discussion, but my rationale for including it is that if we specify this as the “standardized” transaction format for Cosmos chains (which I believe is the intention) then we want to avoid scenarios where different zones feel that they need to fork because the upstream maintainers are slow to or don’t merge some functionality they need. I am holding the perspective that wallets, block explorers, etc. will want to support multiple chains and will want the easiest path to do so - which is to re-use formats that are commonly used. The whole motivation for switching to Any was to support this sort of thing in the ecosystem. So extension_options provides an escape hatch for chains to support the “standardized” transaction format while adding non-standard things that they need. There is a backlog issue for the ante handlers to reject transactions that contain unrecognized data there. That should effectively make use of those fields in chains that don’t support those options unsupported.

its unclear to me from the discussion if SIGN_MODE_DIRECT is really the ultimate preferred method - doesn’t this leave the sign bytes in a non human readable form? If we ultimately want a human readable form, do we need to also support this mode? Maybe we just stick with the SIGN_MODE_LEGACY_AMINO until we figure out SIGN_MODE_TEXTUAL and then we will have at most two sign modes at a time?

SIGN_MODE_DIRECT seems to be the ultimate method for 1) avoiding malleability loopholes and 2) easy client implementations. The easy client implementation part alone is in my mind a big win. Almost zero custom code other than a working protobuf implementation is required for clients to start signing transactions.

Just sticking with Amino JSON keeps us where we were before the protobuf migration but makes things even more complicated. Clients would need to understand the details of Amino JSON + Protobuf. I don’t sense anyone in our discussions had the appetite for that.

Also the fact that SIGN_MODE_TEXTUAL delegates malleability concerns to SIGN_MODE_DIRECT allows us to have multiple textual formats that don’t need to audited for malleability concerns or even cover 100% of the transaction.

starting the account sequence at 1 seems like it might break the semantics in a larger way. not sure if there’s a concrete problem here, but just flagging that it might cause issues elsewhere … might be good to avoid if possible (can we require default values not be omitted?)

This is maybe worth a larger discussion. I argued that clients can easily write a custom encoder for SignDoc to deal with any deviations from the specification.

can we summarize how/why JSON falls short of the ideal for human readable signing?

https://github.com/cosmos/cosmos-sdk/issues/6078#issuecomment-620171632 https://github.com/cosmos/cosmos-sdk/issues/6078#issuecomment-620206001 https://github.com/cosmos/cosmos-sdk/issues/6078#issuecomment-620932197

Basically JSON is only human readable for programmers. Therefore it is not human readable by most people on its own.

for SIGN_MODE_TEXTUAL, if the raw SignDoc is appended, how is the human supposed to interpret that?

Just base64 encode it and put it after the human readable content:

— BEGIN BASE64_DATA —
gkh368xgg89h2624sgh...

are there other ways to mitigate the malleability problems?

Of course, but the one that is easiest to reason about and probably make proofs about is simply signing the raw bytes.

If I understand correctly, its just the timestamp that opens malleability?

Omitting/not omitting default values also creates malleability surface area. It wouldn’t be recommended but if a developer were to use proto2 semantics where null and default/0 are different, then this could be an issue. Signing canonical protobuf binary requires default values to be omitted, but then they could be encoded with the default values included and apps could choose to handle that differently.

ignoring fields seems a bit terrifying. might be ok but probably warrants more consideration. ideally we could just ban it altogether? I would expect adding fields here to be a breaking upgrade to a chain, right?

The rationale for non-critical extension fields comes from experiences with TLS as explained by @tarcieri in https://github.com/cosmos/cosmos-sdk/issues/6078#issuecomment-623791421 and https://github.com/cosmos/cosmos-sdk/issues/6078#issuecomment-624400188. The non-critical fields are all fields with in the range of 1024-2047. A developer would need to make a special effort to create such a field so by default all fields are critical and thus breaking. Because of the rationale provided by @tarcieri which makes a convincing argument that allowing future developers an escape hatch to make something non-breaking addition, we included it in the spec.

I’m confused about the use case for SIGN_MODE_DIRECT_AUX … unless I’m mistaken, for multisig in the current SDK, signers don’t have to know which threshold of signers is going to sign. you can just take eg. whatever 3 of 5 signatures you get first and make a valid tx out of them. is this considered a malleability issue?

I believe this is addressed above. It is definitely a malleability surface area in the current SDK. Whether or not it’s a problem is another matter. It was argued quite extensively that this is a form of malleability that we can live with to support easier signature aggregation.

In the end, the participants in that discussion more or less agreed that really good multi-signature aggregation can only be supported through on-chain aggregation through something like the group module.

Nevertheless, SIGN_MODE_DIRECT does make the current multi-signature aggregation a bit more difficult by closing the malleability loop-hole where any 3 out of the 5 signers can get bundled into MultiSignature without them needing to know who else signed until the transaction gets broadcast. Maybe that is too cautious. The only issue that I think could occur is that gas estimation could be off and the transaction would unexpectedly fail because of that. So maybe something that allows a little bit of safe malleability like SIGN_MODE_DIRECT_AUX would be good to support for multi-signatures.

We should probably add more points to the “Negative” consequences section of the ADR. There’s a lot more complexity here that should be called out explicitly, especially the risks around multiple sign modes, ignored fields, etc.

Maybe… although I don’t actually feel like we added risks by including those things… maybe but the risks I think were addressed quite a bit in discussion. Maybe not. If anything I felt like the biggest sticking point was SIGN_MODE_DIRECT being too safe and risk averse regarding multi-signatures.

robert-zaremba commented 2 years ago

With new tasks and text mode working group, is this issue still relevant? What is the action point ?

tac0turtle commented 11 months ago

closing this, we will be evaluating which sign modes to remove with textual landing 0.47