Specify SIGN_MODE_TEXTUAL

aaronc commented 4 years ago

As documented in ADR 020 and discussed in #6078. Let's slate this for the 0.40 release and start a thorough process shortly after 0.39 is wrapped up.

/cc @zmanian @jleni @tarcieri

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

aaronc commented 3 years ago

Let's get this discussion started. There are two general approaches that I'm thinking of.

JSON/YAML

Basically we take the proto3 JSON serialization (possibly with some additional constraints) and make sure it's serialized canonically with http://gibson042.github.io/canonicaljson-spec/.

If we go with this approach, we should consider using a canonical YAML serialization instead as that would be somewhat more human readable which is the goal here.

String formatting

Taking inspiration from ICU MessageFormat, we could embed format strings in .proto files like:

message MsgSend {
  option (cosmos.format_string) = "Send {amount, coins} from {from, address} to {to, address}";
}

And then clients would sign formatted text that looks like:

Pay a 0.001 atom fee with 100,000 gas
Send 10 atoms from cosmossdn248hsdgsdg to cosmossdgheg8hsdgoet3

The text for each message could be localized to different languages using standard translation bundles.

Neither of these approaches is mutually exclusive, i.e. we could support both.

Also I would note that we could deal with malleability as a secondary concern by appending these messages with base64 proto bytes from SIGN_MODE_DIRECT.

I want to personally express a preference for the string formatting approach. Having studied the ledger app source code a bit, it’s going through a fair bit of effort to extract fields out of JSON structures. Message names and fields are generally hard-coded, and as more and more message types get added the ledger app will be unable to keep up and just default to showing raw JSON for lots of message types which is not ideal. With the string formatting approach, the protocol would define a human readable format and the ledger app has a much simpler task of parsing human readable lines of text. New message types and localized translations can be added without needing any changes to the ledger app. It’s actually hard to imagine how we would scale the ledger app to support all that without this approach.

So I really think string formatting is the future of SIGN_MODE_TEXTUAL . JSON is easier to do and if it’s needed as a stop gap measure, we could add support. But it really feels like human readable strings at the protocol layer are the future if we want first class support for hardware signing. I just don’t see a scalable way to push this responsibility into small embedded firmware written in C.

Thoughts @zmanian @jleni ?

zmanian commented 3 years ago

I think we should be deprecating JSON.

I support the string formatting proposal.

We should have a list of primitive types that allowed to be referenced in the formatting strings.

There should be the option of supporting multiple formatting strings per message. I expect this to be used to internationalization.

There should be a character limit on formatting strings.

zmanian commented 3 years ago

This seems like a great direction and we should figure what we need to do support this across out ecosystem.

jleni commented 3 years ago

Deprecating JSON/YAML is good, those are mostly "web" formats that allow nesting, are complicated without dynamic memory and are hard to define canonically.

With respect to hardcoding, the only aspect that has been hardcoded are some lookup tables used to convert long confusing keys into human readable text descriptions (e.g. "msgs/value/validator_src_address" => "Validator Source"). As cosmos evolved, these less readable names started to creep in and we had to find a workaround. I see how this can happen even with the good intentions behind the string format... once people start using "dev-speak" in the strings, they may get hard to read anyway.

Overall, I am not sure yet about this string formatting concept. It may not be easy to parse at all and unless it is very well defined. This requires working on a spec for something new.. and to be fair, there are plenty of good standard encoding options. Given only a string like the example, unless we just take it AS IS and paginate it... then parsing will be hard and it will require plenty of hardcoded "chunks", etc. These plain strings create lots of opportunities for mistakes (or even exploits) once other developers start extending the protocol beyond Cosmos.

If the redesign is focused on small and/or embedded devices, then these long strings descriptions (as per the example) are actually not useful or convenient. Screens are not big enough so information needs to be split and navigated. A long string being paginated could be automatically cut off at inconvenient points, etc.

If we are thinking of a redesign, personally, I would prefer a well defined flat array of key/value pairs where keys are human readable and values comes pre-formatted. How this array is provided is not really critical, however, something like CBOR works well and is easy to handle.

zmanian commented 3 years ago

The intention of SIGN_MODE_TEXTUAL is that the hardware device will not need to parse the string provided at all.

The hardware device just needs to display the string, sign the string + additional data.

The correctness of the string as a representation of the additional data will be enforced by the chain.

aaronc commented 3 years ago

I think we need to discuss this on an architecture call with both of you present (and other key stakeholders) with some hypothetical examples worked out. @clevinson can you put it on the schedule sometime soon?

aaronc commented 3 years ago

We just discussed this on our architecture call with @jleni and @gagbo, notes here: https://hackmd.io/CIusfoNYS2u99zBMm-8LNA

It seems like it may be valuable to start with adding a protobuf JSON sign mode following https://github.com/Zondax/ledger-cosmos/blob/master/docs/TXSPEC.md as closely as possible, and this doesn't prevent us from exploring a string formatting approach later down the line. If people are aligned with the JSON approach as a starting point, I think it's relatively straightforward and I can propose something more concrete. @zmanian @alessio @alexanderbez

One important consideration is that the ledger screen only supports ASCII characters 32-127... so localized text strings in other languages will be cumbersome.

aaronc commented 3 years ago

I created a proto JSON SignDocTextual PR here: https://github.com/cosmos/cosmos-sdk/pull/8476. Would be great if everyone here could review and let me know your thoughts. Even though this approach may not be the ultimate ideal approach, it may be the simplest for now and we can probably get this into the SDK relatively quickly.

zmanian commented 3 years ago

A design with similar goal from Eth land

https://github.com/ethereum/EIPs/blob/b92a65e0473f4c1e9118c74ed069c1040721bc4e/EIPS/eip-3224.md

aaronc commented 3 years ago

It appears that we are going the JSON route as indicated in #8476. This will require us canonicalizing the JSON as specified in https://github.com/regen-network/canonical-proto3/blob/master/README.md#json.

The biggest gotcha we need to be aware of is I believe the non-determinism introduced by how proto3 serializes Timestamp and Duration with 0, 3, 6 or 9 digits which is implementation specific. This small gotcha may require a custom reflection-based JSON serializer for any implementations. That's a fairly unfortunate headache for such a small part of the spec, but we do need to be aware of it.

alexanderbez commented 3 years ago

Can we not just enforce all timestamps to be unix nano representations, i.e. int64?

aaronc commented 3 years ago

Can we not just enforce all timestamps to be unix nano representations, i.e. int64?

The downside of this is that they aren't really human readable. In my personal experience this is pretty stressful when I'm reviewing genesis.json files. And if I'm signing a transaction on a Ledger I would much rather see something in RFC3339 than just an int64.

How about this as an alternative to a custom protobuf JSON serializer: when we are canonicalizing the JSON, whenever there is a string value that matches the regex for Timestamp or Duration, we re-encode the string to have 9 fractional digits if it doesn't already. Clients will need to have the JSON canonicalization layer anyway, so we're just saying that we add an extra two rules. This seems much simpler than a reflection-based JSON serializer just to handle this. The downside is that there could be some weird edge cases where the regex matches a value which isn't a Timestamp or Duration protobuf type and incorrectly re-encodes it. I'm not sure if this is enough of a concern to block this approach.

Let's discuss on the next architecture call.

alessio commented 3 years ago

The downside is that there could be some weird edge cases where the regex matches a value which isn't a Timestamp or Duration protobuf type and incorrectly re-encodes it. I'm not sure if this is enough of a concern to block this approach.

Well, it sounds like a blocker to me IMvHO

aaronc commented 3 years ago

The downside is that there could be some weird edge cases where the regex matches a value which isn't a Timestamp or Duration protobuf type and incorrectly re-encodes it. I'm not sure if this is enough of a concern to block this approach.

Well, it sounds like a blocker to me IMvHO

Well it was worth a try at least lol 🤷 !

If we do prohibit Duration and Timestamp how do we actually do that? Just by documentation like how we tell people not to use maps? Ideally there should be some static analysis that prevents this as @ebuchman hinted at early on.

Also, if we are avoiding Duration and Timestamp, my preference is actually to use ISO-8601 strings as opposed to cryptic int64s. This is in line with our decision to use string over bytes for addresses - prioritizing readability and foolproof-ness for users over performance.

I still think a custom protobuf JSON layer is not be out of the question. Forcing that upon clients is obviously not my favorite solution... but let's weight the pros and cons.

aaronc commented 3 years ago

In the last architecture call, we agreed that custom deterministic proto JSON serializer is essentially inevitable. It's fine if it works off of reflection.

Just as a note we do have a custom reflection based protobuf deserializer for rejecting unknown fields and it isn't horribly complex. Also, clients could use an RPC endpoint if they don't have deterministic JSON as a stop gap measure. This should be reasonably secure because SignDocTextual will include the protobuf binary and signers should be verifying signing text by hand.

Closing this issue and opening another issue for implementation.

aaronc commented 3 years ago

I want to re-open this discussion.

Previously @zmanian and I had been in alignment that the textual sign mode should be something that is actually human readable, i.e. format strings in the user's language. Also, we are not the only ones to propose such a thing: https://github.com/ethereum/EIPs/blob/b92a65e0473f4c1e9118c74ed069c1040721bc4e/EIPS/eip-3224.md.

When we discussed with @jleni and the ledger team, we ended up deciding to just use protobuf JSON because a) that seemed simplier both on the ledger and SDK side and b) concerns about developers not writing good human readable format strings.

Regarding a), it does not actually look like this is simple for the SDK, CosmJs or the ledger app. For the SDK and CosmJs (cc @webmaster128), we need to write proper canonical proto JSON serializers which adhere to a number of rules beyond the official proto3 spec. For the ledger app, 3 JSON sign modes will need to be supported with slightly different structures (legacy amino JSON, proto JSON, & proto JSON aux for #9406).

With that, we will not get much better UX than we currently have. We will actually need to show users longer Msg names (cosmos.bank.v1beta1.MsgSend vs cosmos-sdk/MsgSend) and coins other than atom will still show up as long integers in an unexpected denom with no commas (ex. 10000000000uregen for 10,000 regen). The idea with sign mode textual was to actually provide a better user experience for ledger signers. With things like DenomMetadata which we now have on-chain, we know what the rules are for displaying uatom and uregen tokens are. The are a bunch of other things we can likely do longer term, for instance better display of IBC denoms.

We can always do proto JSON signing first and then add textual signing later, but given that proto JSON signing is actually a lot of work for three teams, I want to invite us to consider whether we're maybe better off scraping proto JSON and just doing sign mode textual like we originally intended. My fear is that proto JSON will have felt like such an effort for so little pay off that nobody will have much enthusiasm to do the work for actual textual signing.

I know @jleni has concerns that format strings will be poorly implemented by developers and this is a realistic concern, but also something that can be solved by proper QA on the part of the SDK team and teams building on the SDK. We can probably deal with most concerns with proper guidelines and automated linting in CI. For instance, we could enforce rules such as the following:

maximum line length for individual output lines
proper character set for ledger output
messages must be bijective (i.e. I can at least theoretically parse back all the data from the textual representation so it is lossless)
spelling and grammar, something like this could even be run in CI with https://github.com/languagetool-org/languagetool

So far we have started working on the proto JSON signing, but I think we are not too far along to change course. I obviously want to get this done soon so the specification would need to happen quickly, but happy to lean in there to get something formalized ASAP. My gut feeling looking at all this is we're going to feel better about what we end up with if we spend 2-3 months on the textual signing than proto JSON, but would love to hear what others think.

aaronc commented 3 years ago

@hxrts as I mentioned today, I also see this relevant to gov UX in the sense that if we had textual descriptions of all messages as part of the protocol, that would automatically give us a human readable way to describe what a user is voting on.

zmanian commented 3 years ago

I don't think it's a requirement for the messages to be bijective.

I think sign mode textual is a fantastic vision and would be a massive improvement over the status quo of custody which is terrible.

I wasn't willing to fight a desire to just do a new protobuf but I am willing to support a new protobuf work

webmaster128 commented 3 years ago

Excellent analysis, @aaronc and I couln't agree more.

One more note: we all need to be prepared to support Amino JSON much longer than originally anticipated last year. We already do the heavy lifiting in CosmJS now. So I don't see the point of rushing for something new we are not fully comfortable with.

fdymylja commented 3 years ago

I agree with @aaronc concerns. Specifically regarding sdk proto JSON representation. I don't think it's that much user friendly, surely for a developer is easy to understand a JSON message, but even if we assume it's going to be outputted as a single line, it will be hard to read and understand.

(Oops, I didn't see aaron proposed a similar approach!)

I'm more in support of implementing sign mode textual, even if initially it will not be perfect I think it will give better user experience than jsonpb. Also since we use protobuf for everything and it's the way we share chain API specifications, we could use them to represent how sign_mode_textual signers must represent the message by having a custom cosmos-sdk protobuf option:

message MsgSendCoins {
  option (cosmos.sign_mode_textual) = {
  format: "sending coins from account (1) to account (2), total: (3)",
 };

  string from_address = 1;
  string to_address = 2;
  repeated Coin amount = 3;
}

message Coin {
  option (cosmos.sign_mode_textual) = {
  format: "(1)(2)"
  };

  string amount = 1;
  string denom = 2;
}

If we used protobuf files, the format can be shared and be canonical because defined in pb files and easily implemented by different languages (especially those who have protoreflection, it should be super straightforward).

There would still be a problem with for example Coin which needs to convert uatom to atom and give better integer representation. In this case maybe the first solution that would come to my mind is to have cosmos known/types as done in protobuf which have custom textual representation which is an exception to the cosmos.sign_mode_textual format option. Another solution would be to have some way to represent programatic stringification formats as protobuf options but it would be complex and maybe could be thought about later.

fdymylja commented 3 years ago

The downside is that there could be some weird edge cases where the regex matches a value which isn't a Timestamp or Duration protobuf type and incorrectly re-encodes it. I'm not sure if this is enough of a concern to block this approach.

Well, it sounds like a blocker to me IMvHO

Well it was worth a try at least lol 🤷 !

If we do prohibit Duration and Timestamp how do we actually do that? Just by documentation like how we tell people not to use maps? Ideally there should be some static analysis that prevents this as @ebuchman hinted at early on.

Also, if we are avoiding Duration and Timestamp, my preference is actually to use ISO-8601 strings as opposed to cryptic int64s. This is in line with our decision to use string over bytes for addresses - prioritizing readability and foolproof-ness for users over performance.

I still think a custom protobuf JSON layer is not be out of the question. Forcing that upon clients is obviously not my favorite solution... but let's weight the pros and cons.

Assuming this gets implemented in a protov2 environment, we can easily check at chain level which messages contain known types that need to be parsed in a different way (ex: Duration, Timestamp, cosmos "wellknowns")

aaronc commented 3 years ago

Thanks for commenting @zmanian and @webmaster128! And thanks for writing up those ideas @fdymylja 🙏

I think there a few different design areas to think about. I'll break them up here:

Format String Storage

So my initial thought was to use a protobuf option for the textual format string. The downside I think might be if we want to have different versions of the format strings independent of the proto files. In that case, maybe having the format strings in separate files makes sense. Then we could have sub-versions, i.e. tx.v1.textual.yaml, etc. I think ultimately what's needed for Javascript clients is a map from type URL -> format string, so however they're represented on disk, we can bundle up all the format strings for a given version in a single JSON file that gets used on the client side.

Template Language

I've been thinking that something like mustache, might be a good fit. I spent a bit of time brainstorming what the strings might look like for different Msg types based on a mustache model: https://hackmd.io/fsZAO-TfT0CKmLDtfMcKeA?both.

This approach assumes that we have some well-known types that are pre-formatted like you're suggesting @fdymylja. I'm thinking we'd start with cosmos scalars (#9694) and proto well know types, but also cover special cases like repeated Coin.

We could also take an approach where formatting needs to be explicit and use a slightly more complex templating language like https://handlebarsjs.com. Ex:

cosmos.bank.v1beta1.MsgSend:
Send {{coins amount}} from {{from}} to {{to}}

where coins is a custom helper.

I think the advantage of pre-formatted types is that it makes templates easier to write. The advantage of explicit helpers is that it maybe allows for more advanced versioning of helper methods. i.e. say we had a v2 format string that uses a more advanced coins format (say that makes IBC coins look nicer), v1 could still use {{coins amount}} which is needed for older chains and newer chains could use {{coins2 amount}} and the javascript library would do the right thing for the right version depending on the helper function.

Also, I would note that I don't think there's any particular reason to prefer a minimalistic template language like mustache over something more complex like handlebars as long as they're reasonable well defined and platform independent.

aaronc commented 3 years ago

So I've fleshed out some ideas in https://hackmd.io/fsZAO-TfT0CKmLDtfMcKeA a bit more.

Here's my rough proposal on the syntax:

use mustache with { and } delimiters as default instead of the more verbose {{ and }}
pre-render values based on specific rules (proposed here)

In terms of versioning:

version a set of messages + its rendering rules as a "template set" which can be described by a single JSON file - a chain may support one or more template sets for its messages, some of these could be localized
the reference to a template set gets hashed on-chain and referenced as part of ModeInfo.Single in AuthInfo

Would love feedback on whether this direction makes sense. If so, I believe we need to present it to Ledger somewhat formally to see if we can get their approval. At least that's my understanding from chats with @jleni.

fdymylja commented 3 years ago

So I've fleshed out some ideas in https://hackmd.io/fsZAO-TfT0CKmLDtfMcKeA a bit more.

Here's my rough proposal on the syntax:

use mustache with { and } delimiters as default instead of the more verbose {{ and }}

pre-render values based on specific rules (proposed here)

In terms of versioning:

version a set of messages + its rendering rules as a "template set" which can be described by a single JSON file - a chain may support one or more template sets for its messages, some of these could be localized

the reference to a template set gets hashed on-chain and referenced as part of ModeInfo.Single in AuthInfo

Would love feedback on whether this direction makes sense. If so, I believe we need to present it to Ledger somewhat formally to see if we can get their approval. At least that's my understanding from chats with @jleni.

To me everything sounds good, I have one question: do we plan to use an existing library/framework to parse format strings or do we plan to create our own? I'd vote to use something that exists, because creating our own template language would require some time and a lot of care and audits most likely.

The downside I think might be if we want to have different versions of the format strings independent of the proto files. In that case, maybe having the format strings in separate files makes sense

I don't want to be too annoying regarding this, but I'd really like for us to explore a way in which we can make this fit inside protobuf files instead of having some other file/format to share this information with consumers. Reasons being the following:

For me sign_mode_textual is just a way to encode messages, protobuf files represent this: message specification and how to encode it.
It would keep protobuf file the single source of truth for chain specifications
Information defined in PB files is automatically part of the code, because types hold descriptors information (and hence sign_mode_textual formats). If we were to use for example json files/yamls etc, we would need to find a way to bundle them into node code too.
We already have the reflection service, and consumers can use the reflection service to create chain agnostic and dynamic sign_mode_textual signers/verifiers.

Regarding the concern of format versioning vs protobuf versioning it can be handled in the following way:

// NOTE this definition is part of cosmos.proto
extend google.protobuf.MessageOptions {
  repeated SignModeTextual sign_mode_textual_formats = 1110000;
}
message SignModeTextual {
  string version = 1;
  string parser = 2;
  string format = 3;
  // blablabla
}

// This is part of bank.proto
message MsgSendCoins {
  option (sign_mode_textual_formats) = {
    version: "v1",
    parser: "mustache",
    format: "Send {amount} from {from_address} to {to_address}",
  };

  option (sign_mode_textual_formats)  = {
    version: "v2",
    parser: "custom",
    format: "You will send {amount} to {to_address} from your account {from_address} "
  };

  string from_address = 1;
  string to_address = 2;
  repeated Coin amount = 3;
}

This changes nothing from the proposal you've made above, aside from putting this information into protobuf files, which are something that devs already understand, know how to share and handle.

The only downside I can see from this: if we add formats we would most likely need to bump pb files versions too (or maybe not since message spec is not being broken?).

Although I prefer to deal with versioning one thing than two, but this is just my personal preference.

aaronc commented 3 years ago

To me everything sounds good, I have one question: do we plan to use an existing library/framework to parse format strings or do we plan to create our own? I'd vote to use something that exists, because creating our own template language would require some time and a lot of care and audits most likely.

I think we can start with existing mustache libraries. We just need to instruct it to switch delimiters with a preamble I think ({{= {} =}})

Regarding versioning of value rendering, that actually should be one version per chain I think so maybe it isn't even tied to an individual message set. I think the big issue is how to coordinate between client and server versions as well specifying the version in ModeInfo.

I don't really have a strong preference as to whether format strings are in proto files or some other file (I was thinking using go embed with a dir of .yaml or .mustache files). Possibly both could be supported? One question I have is how does protobuf deal with multi-line strings in options?

In general though, if we can align on how format strings and value renderers work, that's a good start and files/versioning/etc. is sort of a tricky but solvable problem.

ValarDragon commented 3 years ago

I'm in huge support of this proposal! I think this is an amazing idea, and going to be a huge improvement for signing UX/auditability across the entire blockchain space.

I like mustache. If there is some concern around bugs w/ templating injections (which I don't have for mustache, but exists for more complex templating engines ala Jinja, Python fmt strings, C++ fmt strings, etc.), we can lint to block 'complex logic' in templates.

Pretty minor point around "does this encoding need to be bijective encoding", that I think changes ~nothing: The only reason I see for ensuring 1-1 decodability would be a tx malleability concern. However, avoiding tx malleability doesn't require decodability, but just requires collision resistance on the encoding.

aaronc commented 3 years ago

Pretty minor point around "does this encoding need to be bijective encoding", that I think changes ~nothing: The only reason I see for ensuring 1-1 decodability would be a tx malleability concern. However, avoiding tx malleability doesn't require decodability, but just requires collision resistance on the encoding.

Bijectivity may be a bit of a strong requirement, but if we had bijectivity it would ensure both non-malleablility and losslessness. I think having a lossless encoding is actually a pretty big assurance that no important information is either intentionally or non-intentionally obfuscated from the user. I would argue that anytime there lossiness there's a chance for malleability. Anyway, bijectivity while not a requirement in itself just seems like the easiest way to check for these things in tests.

aaronc commented 3 years ago

From today's architecture call, three variant proposals (based on https://github.com/cosmos/cosmos-sdk/issues/6513#issuecomment-888564228) came up:

everything in SignDoc gets encoded textually using templates
just Msgs get encoded textually, everything else gets encoded as JSON
everything gets encoded in JSON, but: a. we apply value renderers so that all values are pre-rendered strings or nested objects (ex. 1000 -> 1,000, 10000000uatom -> 10 atom), and b. msg type URLs get human readable names (ex. cosmos.bank.v1beta1.MsgSend -> bank send)

Possibly 2 or 3 might provide more of a balanced approach to Ledger so that there's some confidence in the structure of the outer packets, with free-form human readable content on inner values.

Obviously there are more variations that might be worth considering where we could provide 2 + 3 or full canonical JSON + 1 in the same SignDoc.

amaury1093 commented 2 years ago

For implementation, follow #11970

cosmos / cosmos-sdk