cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:

https://cosmos.network/

Apache License 2.0

6.18k stars 3.57k forks source link

Protobuf tx UX is bad for multi-chain wallets/block explorers #6030

Closed aaronc closed 4 years ago

aaronc commented 4 years ago

Problem Definition

When I first ran a testnet for Regen Network, my team and I were able to customize the Lunie wallet and Big Dipper block explorer to support our testnet by just modifying a few parameters in configuration files. In spite of the UX challenges of the amino encoding, this was a fairly good experience.

The current protobuf tx design would make this much harder. Every app will define its own Transaction, Message, and SignDoc types and client applications will need to run code generation for every new chain they intend to support and write custom code to:

marshal module Msgs into the app Message
sign transactions using the app-specific SignDoc
marshal interface types into app-specific Msgs like MsgSubmitProposal

After having spent just a bit of time actually dogfooding our work, I find these tradeoffs unacceptable.

Proposal

I propose we leverage google.protobuf.Any to solve these issues. This can be done in a way that improves client developer UX without increasing encoding payload size, by separating signing messages from encoding messages.

google.protobuf.Any provides essentially the same UX as amino JSON for these use cases:

// amino json
{"type": "cosmos-sdk/Account", "value": { "address": "cosmos...", ... }}

// google.protobuf.Any
{"@type": "/cosmos_sdk.x.account.v1.Account", "value": { "address": "cosmos...", ... }}

One of the biggest concerns around using Any initially was that it increases stored message size by encoding the long URL. If, however, we just use Any for signing and a simplified client UX and still use oneofs for encoding, we don't have to accept a poor compromise between developer UX and performance.

A more detailed proposal will be submitted in the form of a PR.

For Admin Use

[ ] Not duplicate issue
[ ] Appropriate labels applied
[ ] Appropriate contributors tagged
[ ] Contributor assigned/self-assigned

alexanderbez commented 4 years ago

The proposal seems reasonable to me! I'm looking forward to seeing the revised ADR 👍

I'd like to clarify here two things:

This is solely for client-side UX and encoding (for signing)
The context stated in Problem Definition was already known before any of this work began.

aaronc commented 4 years ago

I see three different potential approaches with different tradeoffs. I think they're pretty simply classified based on their use of oneof vs Any for different things. Basically: 1) oneof everywhere, 2) oneof for encoding, Any for signing, 3) Any everywhere.

I've tried to list out the pros and cons as I see them.

1) `oneof` everywhere

The current approach in cosmos-sdk master which uses oneof's to encode all instances of interface types

Pros:

small message side (no overhead of URLs in Any)
clear communication of which interfaces are actually supported via the oneof
same message for signing as for tx encoding

Cons:

clients need to run code generation for every new app that they want to support and write adapter functions to handle:
- transaction generation
- signing
- any types that use interfaces (i.e. gov.MsgSubmitProposal will be slightly different for every chain)

I was happy with this approach until I took some time to explore the client-side proto experience and realized that the con above would create a worse developer experience than amino for wallets and block explorers that want to and currently do support multiple chains

2) `oneof` for encoding, `Any` for signing

The approach described in #6031 where Any is used for tx signing and also for tx broadcasting via an RPC service

Pros:

client apps with a set of module files can support any chains that support those modules without need for code generation or custom adapter functions (same as now with amino)
encoded message size is still small (because encoded tx's use oneofs instead of Any)
app-level proto files can still be used to figure out the set of interfaces supporting

Cons:

different types for signing and encoding so clients still need the app-level proto files if they want to
- do merkle state proofs
- calculate transaction IDs
- broadcast transactions to Tendermint (less dependencies, more entry points to the network)
- decode transactions from Tendermint (e.g. tx search and subscriptions)

3) `Any` everywhere

Use Any instead of oneofs for both signing and encoding. This is the approach @Liamsi and I advocate in https://github.com/tendermint/go-amino/issues/267 but it was discarded in favor of 1), to reduce encoded message size and also to more clearly communicate what interfaces are supported. We haven't been considering it recently, but since we're having this discussion, I think it deserves a mention.

Pros:

client apps with a set of module files can support any chains that support those modules without need for code generation or custom adapter functions (same as now with amino)
same message for signing as for tx encoding

Cons:

out of band information is needed to know which interfaces are supported by a given chain (same as now with amino). Note that this could be discoverable via some RPC service
larger encoded tx size, although this could be offset by rocksdb compression
tight coupling of state encoding and client API - chains need to break support of the client API if it doesn't work for their app, unlike 2)

6031 (approach 2) is my attempt to balance the our current approach 1) with what seems like a better client-side UX, but I could be wrong. Would appreciate the input of client side developers on this. @webmaster128 would love to hear your thoughts.

fedekunze commented 4 years ago

cc: @jbibla @faboweb

okwme commented 4 years ago

As a js & eth developer with only minimal experience on the go side and no experience with protobuf maybe it's possible to break down the 3 scenarios @aaronc outlined a little more?

Is it correct that the out-of-band information that is references are something like .proto files that describe the types included in custom Messages for each chain? This would be similar to the ABI file that describes the contract interface of EVM contracts, right?

It's annoying with Ethereum to find the ABIs in order to interact with different contracts, but it's not a deal breaker. There are some repositories of ABIs, and it's possible to confirm that an ABI does in fact satisfy the application you're trying to interact with once you have it. I saw mention that these out-of-bound files could be accessible by an RPC interface, this sounds like the proposal outlined here: https://github.com/cosmos/cosmos-sdk/issues/4322. That issue is maybe a good outline of what I'd like to have from the perspective of a client side developer (my user wants to interact with a new chain, i should be able to auto-generate an interface for that user with a minimal amount of contextual info on that chain)

aaronc commented 4 years ago

Is it correct that the out-of-band information that is references are something like .proto files that describe the types included in custom Messages for each chain? This would be similar to the ABI file that describes the contract interface of EVM contracts, right?

In all of these scenarios there will be .proto files that describe most of the ABI.

In approach 1), you will have almost all of the information in the .proto files.

In approach 3), you will need to know more about which chains support which types even though you'll have all the types sitting right in front of you. This could potentially be solved by a reflection interface that lets you query for the set of supporting types.

In approach 2), you can get the ease of 3) (in terms of being able to support multiple chains by just knowing part of their interface), but you can also look at the full app-level .proto files to get the benefits of 1).

Does that help @okwme?

iramiller commented 4 years ago

For approach 1 certain difficulties arise in that the Proto definitions are required to be compiled in and known to parties handling the message. A generic signature library becomes a problem as it must understand how to handle the Proto in order to process it.

I believe that approach 2 is suboptimal as a message should not be encoded in different ways for storage/signature calculation in my opinion. The signature process is used for integrity and repudiation. Maintaining the ability to verify a signature quickly without any further understanding of the message types used supports the point of using a signature in the first place. With this in mind translating a message into an alternate form prior to signing should be avoid if possible.

Using a hybrid approach to address short comings of oneof will persist those short comings in the remaining areas while adding on going developer friction and technical complexity.

Approach 3 appears to be the superior option. This approach allows the system components to evolve independently. Using the same solution everywhere streamlines development process and reduces the developer friction from working with the same data in multiple areas of the system.

aaronc commented 4 years ago

Thanks for your input @iramiller.

If we were going to consider approach 3, we'd need to asses a few things:

what is the real size/performance impact of Any vs oneof? I think we'd want benchmarks using rocksdb compression
is tight-coupling a concern? As I mentioned in the cons of 3), if a chain were going to opt out of the standard Any-based Tx/SignDoc types this would require them to have custom signing & tx client libraries anyway. That may or may not be an issue...

Would you agree that 2) is at least better than 1) if we choose not to rework our current encoding approach?

okwme commented 4 years ago

Just wrote out a scenario to try to help me wrap my head around the question. After doing so and reading this for more background, I realize this is much more of a question for client library authors and not client library users. I don't feel qualified on that front to weigh in since a client library user should not actually participate in these steps and that's the only experience I can maybe add perspective on. I've included the text that I began in case parts of it are useful in the conversation but I don't expect any response.

Thanks @aaronc this is very helpful! When you say most of the information, what type of info would be missing?

As a client developer trying to support a user who wants to query something on a new chain and then also let them execute a new Msg type on that new chain I'd need the following:

An accessible Tendermint node for that chain in order to query state and submit the signed transaction (or in order for me to create a light client and join the network itself). It would be expected that this comes directly from the user, or something like a service that keeps track of chains and their accessible tendermint endpoints.
Now I'd also need to have a .proto file for the Query the user wants to make and a .proto file for the Msg that they want to execute? or does a single .proto file encompass all of that for a single chain? I might request the .proto file from the user or I might be able to query that previously mentioned service. Or I might be able to query the original tendermint endpoint for the .proto file directly or in the form of a swagger file?
With the relevant .proto file I can see the required types of the data to be included in the query and generate a form for my user to input the relevant query data. I can create the query as a UTF8 string for the GET query with the relevant data encoded as the relevant types based on the .proto file. Now I can submit the query to the tendermint abci_query endpoint and will be able to convert the bytes I receive back into types that I can display to my user. (I may even request proof info to verify that response was in fact part of the state of the application.)
Now I want my user to be able to craft that custom Msg and sign it. Thanks to the .proto file I can generate a form with all the relevant types to collect the info from the user. They fill out the form and I serialize the types a a byte stream. I prompt my user to sign the bytes with their private key and they do so with their integrated system of choice, returning to me a signature and a public key. I can now submit this data as a JSON object to the tx endpoint of the tendermint node.

Is that an approximation of the process we're trying to solve for? This doesn't include access to a REST server, we're trying to avoid that scenario right?

The difference between Any and oneof is that the .proto file can be more flexible with the types and their order? Can someone give an example of a specific scenario of a Msg or Query where the client would treat these differently?

aaronc commented 4 years ago

Thanks @aaronc this is very helpful! When you say most of the information, what type of info would be missing?

Information about which concrete interface implementations are supported by a given chain. Ex. supported gov proposal types.

As a client developer trying to support a user who wants to query something on a new chain and then also let them execute a new Msg type on that new chain I'd need the following:

An accessible Tendermint node for that chain in order to query state and submit the signed transaction (or in order for me to create a light client and join the network itself). It would be expected that this comes directly from the user, or something like a service that keeps track of chains and their accessible tendermint endpoints.

Now I'd also need to have a .proto file for the Query the user wants to make and a .proto file for the Msg that they want to execute? or does a single .proto file encompass all of that for a single chain? I might request the .proto file from the user or I might be able to query that previously mentioned service. Or I might be able to query the original tendermint endpoint for the .proto file directly or in the form of a swagger file?

You would generally get these .proto files from a github repository for the chain/modules you want to query. With approaches 2) and 3), a single module .proto file + the base sdk .proto files would encapsulate all the info needed to query and make transactions against a given module on any chain supporting that module.

With the relevant .proto file I can see the required types of the data to be included in the query and generate a form for my user to input the relevant query data. I can create the query as a UTF8 string for the GET query with the relevant data encoded as the relevant types based on the .proto file. Now I can submit the query to the tendermint abci_query endpoint and will be able to convert the bytes I receive back into types that I can display to my user. (I may even request proof info to verify that response was in fact part of the state of the application.)

Reflection on the structure of the state store in order to request merkle proofs is not covered under any of the current work. It is something I do have in mind for later though. You could use the abci_query endpoint to make custom (non-merkle) queries using just the info in the module-level .proto file.

Now I want my user to be able to craft that custom Msg and sign it. Thanks to the .proto file I can generate a form with all the relevant types to collect the info from the user. They fill out the form and I serialize the types a a byte stream. I prompt my user to sign the bytes with their private key and they do so with their integrated system of choice, returning to me a signature and a public key. I can now submit this data as a JSON object to the tx endpoint of the tendermint node.

You wouldn't send JSON to the tendermint node, you would send protobuf or you would send JSON to an RPC server that then encodes it and broadcasts it to the tendermint node. Signing code be done just with JSON with the proposed approach.

Is that an approximation of the process we're trying to solve for? This doesn't include access to a REST server, we're trying to avoid that scenario right?

No, we aren't trying to avoid an RPC &/or REST server. Some of the approaches described above are very much enhanced by an RPC/REST server although much less dependent on one than the previous amino approach.

The difference between Any and oneof is that the .proto file can be more flexible with the types and their order? Can someone give an example of a specific scenario of a Msg or Query where the client would treat these differently?

The gov proposal types are a good concrete proposal types. With oneof you need an app-specific .proto file to create a gov proposal. With the Any approach, you just need the module-level .proto files and you need to know the type URL of the proposal type. We have both app-level and module-level .proto files in the current design. Module-level .proto files are shared by all apps using that module, but each app would implement its own app-level .proto files to define the concrete oneofs (unless we adopted approach 3).

webmaster128 commented 4 years ago

Thank you @aaronc for this very good summary.

(1) is what is done in weave-based blockchains, which works nicely if you target one chain. I see the drawbacks in this context.

(2) With two different encodings, I think it should be worked out in details, which mapping happens where and what information are required. E.g. the Any -> oneof mapping requires the App-level Message .proto, since each blockchain can assign different numbers to the supported message types. The other mapping oneof -> Any is required for signature verification. The Con point that you mentioned should be extended to

if they want to calculate transaction IDs
if they want to broadcast transactions to Tendermint (less dependencies, more entry points to the network)
if they want to decode transactions from Tendermint (e.g. tx search and subscriptions)

If I understand correctly, the con from (3) "out of band information is needed" applies for the Anys here as well, since a client does not know what makes sense to sign (which UI to present) without knowing the supported message types. But for some reason it is lised as a pro in this case.

One aspect that was not covered so far is that the Any type is a custom wrapper, not native to protobuf, and implies a two step encoding. This two step encoding or two step decoding needs to happen in both mappings Any <-> oneof. I'm not saying this is an expensive operation, but one that needs to happen for every signature verification.

(3) This is my preferred solution for a system with the given flexibility requirements, primarily due to the cons I see in 2. The simplicity seems much more resilient against transaction malleability bugs and is more egonomic for those needing both signing and encoding representation. It uses a bit more storage in the transaction history, but I did not yet see any evidence that the overhead is relevant. If someone comes up with long URLs, they just hurt their own chain in terms of storage.

iramiller commented 4 years ago

@aaronc - Would you agree that 2) is at least better than 1) if we choose not to rework our current encoding approach?

Option 2 is an absolute improvement for those working on the client side. Alignment of tx signing and also for tx broadcasting feels like a step in the right direction for signatures and security.

aaronc commented 4 years ago

(2) With two different encodings, I think it should be worked out in details, which mapping happens where and what information are required. E.g. the Any -> oneof mapping requires the App-level Message .proto, since each blockchain can assign different numbers to the supported message types. The other mapping oneof -> Any is required for signature verification. The Con point that you mentioned should be extended to

if they want to calculate transaction IDs

if they want to broadcast transactions to Tendermint (less dependencies, more entry points to the network)

if they want to decode transactions from Tendermint (e.g. tx search and subscriptions)

Add these to the cons.

If I understand correctly, the con from (3) "out of band information is needed" applies for the Anys here as well, since a client does not know what makes sense to sign (which UI to present) without knowing the supported message types. But for some reason it is lised as a pro in this case.

Well with 2) you could a) choose to use the app-level proto files or b) at least look at them. So I think it's at least mildly a pro. Although maybe a reflection service for 3) could accomplish something similar.

One aspect that was not covered so far is that the Any type is a custom wrapper, not native to protobuf, and implies a two step encoding. This two step encoding or two step decoding needs to happen in both mappings Any <-> oneof. I'm not saying this is an expensive operation, but one that needs to happen for every signature verification.

We're doing JSON encoding already for signature verification so it's expensive. But AFAIK we cache the result in CheckTx to avoid the performance hit.

(3) This is my preferred solution for a system with the given flexibility requirements, primarily due to the cons I see in 2. The simplicity seems much more resilient against transaction malleability bugs and is more egonomic for those needing both signing and encoding representation. It uses a bit more storage in the transaction history, but I did not yet see any evidence that the overhead is relevant. If someone comes up with long URLs, they just hurt their own chain in terms of storage.

Just to note that by default the URLs will be something like type.googleapis.com/cosmos_sdk.x.bank.v1.MsgSend. We can eliminate the type.googleapis.com part, but it would still be longish. I have a theory though that Rocksdb LZ4 dictionary compression would remove most of that overhead, but it's untested. Would love to see a benchmark or we'll do one at some point if there's enough community support for 3).

aaronc commented 4 years ago

Overall what I'm thinking at this point is that we should proceed with #6031 (approach 2) and at that point, option 3) is also enabled for chains if they choose it. Basically with #6031 the SDK will have the infrastructure to allow either 2) or 3) based on configuration. Then maybe it should be a decision for the hub - possibly via governance - whether users want 2) or 3). Other chains would be free to make a different choice either way.

ethanfrey commented 4 years ago

I just got to catch up on this thread. Here are my thoughts:

First, the framing of the issue is missing a bit. There are a few issues mixed together

(a) Compatibility of JSON-to-Protobuf encodings AND (b) (Re-)Use of Protobuf types

I will focus just on (b). I think we also need a solid cross-platform JSON-to-Protobuf solution, or decide that we do not need to sign this custom "canonical JSON" representation. But that is another issue I think.

First off, amino doesn't "just work". If my change uses a different Account type, all wallets and explorers will fail. If I use the same concrete types for the interfaces, then the binary representations are compatible. That is true. Assuming people develop wallets and explorer for the gaia hub, we get compatibility when:

(a) for wallets (sending tx) - if the target chain tx/msg types are a superset of the types in the hub (b) for explorers (parsing tx) - if the target chain tx/msg types are a subset of the types in the hub

They only work perfect when you have the same type definitions as hub. You know the issue that the custom regen governance votes don't show up in the explorers (as they are not a subset of the gaia/hub types.

This means, the two chains have slightly different type definitions for tx/msg (the oneof/interface types), but since there is significant overlap, much tooling (targeting that common ground) just works.

GIven the above, we get the exact same condition with protobuf oneof. It just takes a little thinking. This is how we did it in weave. In fact, most demos of weave had two chains - one was bnsd, which had the name service, governance, and some more modules; the other was bcpd, which had a much limited set of actions (bank, multisig, atomic swap more or less). We wrote client code and wallets that worked identically for both of them (you switch the url and the code just works). The reason was that we chose standard numbering for the oneof fields, such that we use the same numbers for all standard types, and thus if I encode a msg that falls in the intersection of the two tx types of the two chains, it is guaranteed to have the same representation with both codecs.

Please take a look at the comment here: https://github.com/iov-one/weave/blob/master/cmd/bnsd/app/codec.proto#L25-L35 We have since removed bcpd, but I know @alpe wrote his own weave app and it was compatible with the existing frontend tooling for all common msg types and queries.

Back in the day, we had to register go-wire types with a type byte, which is like this protobuf field. Now, this type prefix is autogenerated from a hash of the descriptor, so this overlap happens as long as you register them under the same name. This is fundamentally the same as using the same field numbers. If instead of cdc.RegisterConcrete(MsgSend{}, "cosmos-sdk/MsgSend", nil), I use cdc.RegisterConcrete(MsgSend{}, "regen/MsgSend", nil), then my amino types will also be incompatible.

I think this whole issue can be resolved very easily via (1) as long as we set a standard naming for all types exposed from the standard modules (cosmos-sdk and other common modules, like group account work). Custom chains can freely use fields from 250 or so to make sure there is no overlap, and there is some discussion to claim a field number below that (when it is mean to be reused between many apps and "standardized"), kind of like you need root to bind a port below 1024.

It just takes a bit of discipline and a canonical registry of types here. We will need the same discipline (or more) to maintain any kind of backwards-compatibility guarantees as well.

ethanfrey commented 4 years ago

Another point, tied more to the "compiling protobuf" for clients ux, is that the main js implementation (protobuf-js) works great dynamically generating types. Assume there is a JS lib that handles everything for you, and wants to be extensible.

The Dev building on it can just provide the chain-specific .proto file as well as a properly formatted JSON type. The lib can generate the protobuf classes dynamically and then instantiate the proper class from the JSON representation and send it. This means that a generic library can exist with full signing/sending/querying functionality, and the apps building on it just need to know the proper proto files and formats.

Anyway, this is another topic, but since "client side" is often another word for "JS", I want to say that dynamic languages can provide some very neat approaches to using protobuf.

aaronc commented 4 years ago

Thanks for your detailed comments @ethanfrey and for sharing another option for addressing this!

A schema registry could work and as you know is something I've advocated in the past. Having a blockchain registry of names to field numbers would be an interesting thing to explore, but I also think this could be started with a simple git repository with a master oneof file.

I wonder what others who advocated for 3) think about this option? @webmaster128 @iramiller ?

The lib can generate the protobuf classes dynamically and then instantiate the proper class from the JSON representation and send it

I'm not sure quite what you mean here @ethanfrey. Are you saying that protobuf-js can work extend protobuf functionality at runtime? Or just that a code generation step isn't needed?

There are a few issues mixed together

(a) Compatibility of JSON-to-Protobuf encodings AND

Since using JSON at all for signing has been questioned, it would be good to vet that as well. Either in this thread or another thread. @webmaster128 did have an interesting comment on discord that having the JSON signing is advantageous for pure-JSON users that don't want to depend on proto files at all.

I do want to note a couple other disadvantages to 3) that occur to me:

even if compression makes increased message side irrelevant, there still is an increased gas cost for all users. Now maybe gas prices will adjust accordingly, just noting
the way Any currently works in go is with a global type registry. This introduces an attack surface that one could argue also exists with amino but to a lesser extent because amino as its used in the SDK has explicit type registration. Basically with Any, some malicious library could registry a type that matches a known interface but that does malicious things. With 2) at least the oneofs at the encoding level provide a whitelist of acceptable concrete types. 3) could be made to avoid this problem with a custom Any type that uses controlled type registration, just noting that would be needed for security

anilcse commented 4 years ago

A schema registry could work and as you know is something I've advocated in the past. Having a blockchain registry of names to field numbers would be an interesting thing to explore, but I also think this could be started with a simple git repository with a master oneof file.

Yes, I think having a git repository with the filed numbers would be a good choice. I understand it's a bit of work for developers to follow these rules but having them would make the things easier for clients.

clevinson commented 4 years ago

Great to see the comments here. We just had a long conversation about this during Regen's Cosmos SDK architecture review call, with a number of folks from this thread on the call.

Links to the hackmd here.

Rough alignment from that call (as noted with individual votes from participants in the bottom of the doc), was to move forward with option 2, and possibly revisit the state encoding at a later point after we are able to run some benchmarking tests.

As I understood from the call, one of the biggest reason folks were not in favor of @ethanfrey's registry proposal had to do with the governance overhead of having a registry of message types for a fully decentralized ecosystem.

I'd like to give this thread still some opportunity to hear back from advocates for @ethanfrey's proposal, if there are clear client-UX reasons why this still should be heavily considered over Option 2.

aaronc commented 4 years ago

Thanks @clevinson. I just want to note that as reflected in your notes, there was almost a super-majority of people on the call (6/9) favoring or leaning towards option 3 (Any for state encoding), which surprised me. I do think benchmarking does need to be done to vet this for sure. I also do want to hear if there are more counter-opinions that would strongly advocate for option 1 with the oneof registry over 2 or 3.

ethanfrey commented 4 years ago

I advocate against 2. It seems way to complex.

Although I think 1 is the most efficient and organized version, and the idea of a registry is made to be much bigger than it is (just refer to the app protobuf definition in gaia as a basis), I have no desire to try to convince anyone. The other ones also need canonical names for the "any" variants, it just becomes less visible that a registry is useful until later on.

2 adds even more complexity - JSON plus 2 protobuf files. :exploding_head:

With 3 we end up with self-describing protobuf, which may be larger than (1) but smaller than the JSON representation, which is also self-describing. In such a case, I would see no need at all for using JSON for signing. (I do see the argument that you want a self-describing format that can be presented to the user if we encode it in approach 1). I see 3 as meaningful if developers:

a. want a similar ux to amino (dynamically registering types rather than define them in one file) and b. want a self-describing binary format

zmanian commented 4 years ago

I'm broadly in favor of a custom type -> JSON encoder that we sign and AMINO JSON is very much good enough. I've implemented this in rust, javascript, JAVA and more and it's fine.

Also many many more people are impacted by changes to Cosmos signing than are affected currently by bytestream changes and I would really hesitate to break this.

ethanfrey commented 4 years ago

I am pretty sure the JSON format has already changed @zmanian But you are right, this has a huge impact... like the ledger app :fearful:

I think they are using a canonicalized json representation of protobuf objects, not amino-json. But @alexanderbez or @aaronc can answer that better (just my understanding from a few conversations) The "amino JSON" encoder requires the custom type strings from go-amino, and unless they did some magic, those are likely gone in the protobuf world.

aaronc commented 4 years ago

I'm broadly in favor of a custom type -> JSON encoder that we sign and AMINO JSON is very much good enough. I've implemented this in rust, javascript, JAVA and more and it's fine.

Also many many more people are impacted by changes to Cosmos signing than are affected currently by bytestream changes and I would really hesitate to break this.

Just noting that a custom JSON encoder is not something under discussion here although I guess it could be. A canonical encoding of protobuf to JSON is what we are currently planning to use for signing, but it is avoiding being "custom" as much as possible and is not compatible with amino JSON. That said, approaches 2) and 3) would result in a JSON format that is much more similar to amino JSON than approach 1). It would vary from amino JSON in the subtle way described in the first comment in this thread (https://github.com/cosmos/cosmos-sdk/issues/6030#issue-603333931).

webmaster128 commented 4 years ago

I think we should move the JSON representation discussion to a different place and get clarity about what is signed first before it is discussed which representation is signed. (spoiler: I wonder to what degree the current proposal makes sense)

I wonder what others who advocated for 3) think about this option? @webmaster128 @iramiller ?

I was not aware of the solution that Ethan proposed for the multi-chain support. Be he is right and I support both (1) and to a slightly lesser degree (3).

My primary convern with (2) is that it is assumes the client does not need and does not get full control of the bytes exchanged with Tendermint. It optimizes for a script kiddie style client development and assumes a fully powered external instance will take care of the rest. If the client wants to get all the features and not a minimal subset, they need to re-implement a massive document manipulation process that is hard to get right even for experienced low level developers.

Instead, I think we should optimize for a client lib / client app developer split as Billy mentioned before. It is enough that there is one properly maintained core library per programming language. This is easy to get if the overhead in the spec is minimal. This also ensures we don't end up with a standard defined by messy Go internals (again). As a client lib developer, I want to be able to implement the full standard to interact with Tendermint (read/write) without knowing any Go.

My secondary concern is that when the bytes sent and the bytes signed differ, you open the door for transaction malleability bugs. I found one of those in Lisk, which I could have used to destroy 300 million market cap at that time. This is probably more dangerous for JSON/protobuf mappings than protobuf/protobuf mappings, but still. And even if it was secure, you still need to convince a lot of people that it is. Every audit of every implementation gets much harder and more expensive when there are multiple representations of a document.

Given those two concerns, I really would not mind a few percent of storage, if it ensures that the resuling chain is (a) used and (b) secure. Without the later, storage optimizations don't matter.

one of the biggest reason folks were not in favor of @ethanfrey's registry proposal had to do with the governance overhead of having a registry of message types for a fully decentralized ecosystem.

The are about 530 million protobuf field numbers. I don't see any significant governance in defining something like: the first 1 million are reserves for SDK core modules in this repo, the first 10 million are open for commonly used external community supported modules (e.g. CosmWasm), and everything above can be used by chains as they want

A chain can always fool a multi-chain client by using a different implementation for a given a common instruction that is binary compatible (e.g. replace send tokens with burn tokens). But then this is a shitty chain you should avoid, just like a chain that uses reserved field numbers for non-standard purposes.

I'm not sure quite what you mean here @ethanfrey. Are you saying that protobuf-js can work extend protobuf functionality at runtime? Or just that a code generation step isn't needed?

You can inject custom .protos into protobuf-js at runtime and get dynamically generated classes with encoders and decoders in JavaScript. In contrast to .protos known at library build time, you don't have the code generator that gives you TypeScript interfaces. But still pretty neat. This could be wrapped in a Cosmos specific lib that makes the API a bit nicer for the context.

webmaster128 commented 4 years ago

For (1): Given the 29 bit field numbers, you can also map the amino style {module_name}/{type} by using binary concatenated {module_id} | {lookup_m(type)} where module_id is a 21 bit module ID and lookup_m is a module specific lookup for the types in module m that produces a 8 bit output. This reduces the coordination effort to a per-module basis. Once there is agreement on module ID's, the module can use their 2^8=256 possible types for whatever they want.

aaronc commented 4 years ago

My primary convern with (2) is that it is assumes the client does not need and does not get full control of the bytes exchanged with Tendermint. It optimizes for a script kiddie style client development and assumes a fully powered external instance will take care of the rest. If the client wants to get all the features and not a minimal subset, they need to re-implement a massive document manipulation process that is hard to get right even for experienced low level developers.

This is a fair ask and likely something we should consider more seriously.

My secondary concern is that when the bytes sent and the bytes signed differ, you open the door for transaction malleability bugs. I found one of those in Lisk, which I could have used to destroy 300 million market cap at that time. This is probably more dangerous for JSON/protobuf mappings than protobuf/protobuf mappings, but still. And even if it was secure, you still need to convince a lot of people that it is. Every audit of every implementation gets much harder and more expensive when there are multiple representations of a document.

I tend to think of protobuf to JSON as more secure that just signing with protobuf as the JSON includes more information. Nobody ever proposed mapping JSON to protobuf for signing, but that seems like similar to what is actually happened with the Lisk bug. A high resolution format was reduced to a lower resolution format for signing. With protobuf to JSON it is generally the opposite. The JSON representation includes more information than that protobuf representation - specifically field and sometimes type names - and I think this is the general argument for doing JSON signing. Now, theoretically there could be some types of data which are higher resolution in protobuf than their JSON representation, but it should be pretty straightforward to figure out which ones those are. Taking a quick look, it does appear that Timestamp.nanos does contain more resolution (int32 - max 10 digits) in protobuf than JSON (max 9 digits). I don't know if that malleability could ever be exploited, but I'll take your point as proven there. It can be mitigated, of course, we can audit the JSON and tweak the spec to not allow that malleability for Timestamp, but you are right that it needs to be audited and proven.

A chain can always fool a multi-chain client by using a different implementation for a given a common instruction that is binary compatible (e.g. replace send tokens with burn tokens). But then this is a shitty chain you should avoid, just like a chain that uses reserved field numbers for non-standard purposes.

This is the scenario I'm actually more concerned about. And not that a chain would intentionally fool clients but that simple user error could cause this to happen. Consider we have two messages MsgMint and MsgBurn that have identical fields (say just signer and coins). Now say this is what's in the big master oneof registry:

oneof sum {
   ...
   MsgMint mint = 10315;
   MsgBurn burn = 10316;
}

Now say a developer is copying this to the chain proto file (or alternatively to a client app proto file) and they write:

oneof sum {
   ...
    MsgBurn burn = 10315;
    MsgMint mint = 10316;
}

Maybe there's no malicious intent just user error and then the chain is mostly compatible with standard wallets with one important exception! Just signing the proto binary with approach 1) would not prevent this, but signing the JSON would (because burn and mint would show up in the sign JSON).

However, JSON wouldn't fix this sort of screw up:

oneof sum {
   ...
    MsgBurn mint = 10315;
    MsgMint burn = 10316;
}

But, approach 3) would because the actual type name MsgBurn or MsgMint would end up in both the proto binary and JSON serialization.

So, if security is our number 1 concern, so far approach 3) seems the most secure - the least amount of manual user coordination with the highest resolution information.

zmanian commented 4 years ago

The reason we sign JSON encodings of transactions rather than the bytes sent over the wire is that we prioritize the ability of humans to inspect what they are signing over the ease of confusion attacks.

The strongly typed nature of AMINO JSON by reflecting over the paths has been a pretty robust defense against type confusion/ malleability attacks.

As long as we retain this property in what we sign we should be fine.

aaronc commented 4 years ago

The strongly typed nature of AMINO JSON by reflecting over the paths has been a pretty robust defense against type confusion/ malleability attacks.

Approach 3) would provide identical or almost identical guarantees to amino JSON

Approach 1) (our current approach) + JSON, would provide almost the same but I think slightly weaker guarantees, as per my comment above about messing up field ordering/naming.

iramiller commented 4 years ago

A streamlined implementation focused on signing over the bytes of the message as constructed by the client precisely and the routing to the appropriate handler is the ideal goal for the SDK.

Approach 3 aligns very will with the separation of concerns outlined in the system architecture where the SDK is described as a router that takes messages and directs them to the module registered as capable of processing them. The type identifier in the any payload being in a wrapper as part of a two stage encoding process supports the SDK in fulfilling this narrow responsibility. The SDK can determine the message is complete and correct as certified by the sender through the signature process and route to the destination with no further constraints on the payload itself.

The use of JSON encoding as a projection/interpretation of the message contents is not an action the SDK should concern itself with. Indeed this extra interpretation step is an attack surface with grave consequences in terms of transaction malleability as @webmaster128 points out.

The client is responsible for determining the best approach for conveying the contents of the message to the end user in such a way that they can certify their intent with a signature. The SDK should not attempt to deem what is correct or incorrect in this process. At the highest levels JSON is never presented to non-technical users for review when they make decisions. A typical application provides a user friendly representation via graphical interface and subsequently transforms this into some form of optimized sequence of bytes for programs to operate over.

I believe that option 3 + raw signing alone is aligned with the pure intent of the SDK. Option one and two both cause compromises and constraints to be internalized into the SDK itself. It is better to leave the type registration and mapping to users of the SDK (vs option 1) and avoid bloat and duplication of implementation (technical debt) as option 2 advocates.

aaronc commented 4 years ago

Does anyone have concerns about approach 3 besides message size/performance?

I have run some contrived benchmarks to assess the impact of Any on both performance and disk usage. In my scenarios with random data, the type URL for Any takes up ~47% of the total message size which relatively "bad" scenario. The performance overhead for round trip encoding is ~13%. Disk usage with leveldb is only ~20% more, so even leveldb's basic compression does something. In a real scenario I would expect the type URL to take up max maybe 15-20% of a tx (consider the size of signatures and pubkeys). So we should be seeing a much smaller hit (maybe 5-6% performance, 5-10% disk usage??). Furthermore as @iramiller mentioned on Discord, since we know exactly what type of data we're trying to compress, adding a custom compressor in the stack is totally feasible.

So all that's to say that if performance is the main concern, a) I'm not too alarmed by my preliminary benchmarks and b) there are optimizations we can introduce later to address size concerns.

With all of that, I'm going to agree with @iramiller and @jackzampolin in advocating approach 3).

I do also want to note that even if @ethanfrey and @webmaster128's approaches for assigning field numbers are feasible, there is a non-trivial developer impact in having to deal with master proto files that all have slightly different combinations of supported msg's and interface oneof numbers. How will this work for creating generic signing libraries in statically typed languages? Which master proto file will they reference for their tx? I'm not saying it's impossible using things like traits, interfaces, etc. But something to consider.

webmaster128 commented 4 years ago

Nobody ever proposed mapping JSON to protobuf for signing, but that seems like similar to what is actually happened with the Lisk bug. A high resolution format was reduced to a lower resolution format for signing.

I don't want to say we risk need to fear a similar issue (which was specific to their address format). The only point is: as soon as there are multiple representation, you make this class of problems more likely. This is not black or white: 2 representaions are safer than 3. And 1 representation safer than 2. It also matters how close the representations are.

I'm glad the 3 is off the table now and only 2 (proto/JSON) and 1 (proto) is left. As @iramiller explained very well, this should be open for debate now.

but that simple user error could cause this to happen.

Very good point. I thought about this exact binary compatible signature for a different action as well, but then put into the shitty chains class. But you are right, this is easy to get wrong accidentally.

I support both (1) and (3) and am happy to see the signaling towards (3) from many different people.

Both (1) and (3) make it simple to generate the real stuff for Tendermint locally, using an existing protobuf lib plus a few documented normalizations. This means every client can submit the protobuf serialization bit by bit and a JSON->proto decoding is never needed.

Next up: how to sign? I think this question deserves a new ticket.

aaronc commented 4 years ago

As a result of this discussion, the SDK team is proceeding with approach (3) (using Any for transaction encoding).

As requested, here is an issue for discussing transaction signing alternatives: #6078

aaronc commented 4 years ago

@alexanderbez just double-checking, are we in agreement on migrating state encoding to Any as well? I think it's nice that we have the Codec interfaces to allow SDK users the choice (they could choose amino for instance) and I think we should keep that option open. But choosing the same default of Any for state and tx seems to make sense. What do you think?

alexanderbez commented 4 years ago

Yes, I'm in favor of keeping the codec interfaces where we opt for any instead, assuming, to reiterate @ethanfrey's point, it just works "out of the box" and clients don't have to tinker or tweak things to get client-side compatibility. Otherwise, we end up back with an Amino-like UX.

I'd also remove all the types and other unnecessary boilerplate that we needed due to oneof.

aaronc commented 4 years ago

Okay, my init PR will have both state and msg types for x/evidence (#6076). Also planning to provide updates to ADR's 019 and 020 in #6031.

clevinson commented 4 years ago

@alexanderbez just double-checking, are we in agreement on migrating state encoding to Any as well? I think it's nice that we have the Codec interfaces to allow SDK users the choice (they could choose amino for instance) and I think we should keep that option open. But choosing the same default of Any for state and tx seems to make sense. What do you think?

This is how I think we should proceed as well. It seemed like on our call last week most folks were in favor for using Any in both state & transactions encoding, but saw it as a stretch goal. In my understanding, the conversations in this thread since that call have shown hesitation with having state & transaction encoding diverge. So aligning both on Any seems to be the favored strategy.

aaronc commented 4 years ago

Would appreciate a review of #6081, which describes our proposed usage of Any, from anyone that participated in this discussion. @ethanfrey @iramiller @webmaster128 @clevinson @zmanian @okwme @anilCSE

aaronc commented 4 years ago

Addressed in #6081 and #6111

cosmos / cosmos-sdk

Protobuf tx UX is bad for multi-chain wallets/block explorers #6030

Problem Definition

Proposal

For Admin Use

1) oneof everywhere

2) oneof for encoding, Any for signing

3) Any everywhere

6031 (approach 2) is my attempt to balance the our current approach 1) with what seems like a better client-side UX, but I could be wrong. Would appreciate the input of client side developers on this. @webmaster128 would love to hear your thoughts.

1) `oneof` everywhere

2) `oneof` for encoding, `Any` for signing

3) `Any` everywhere