ChainAgnostic / varsig

The cryptographic signature multiformat

First Version #6

expede closed this 1 year ago

expede commented 1 year ago


TODOs Post-PR

expede commented 1 year ago

This went through way more iterations than I expected going in. I should have known better: cryptography is super subtle!

oed commented 1 year ago

Let me try to throw out a few quick examples:

SIWE

Let's start by assuming we have landed on an IPLD format (e.g. CACAO / ucan-ipld) for SIWE.

Signature params:

The eip4361-eip191 codec takes an IPLD object and tries to convert it to a SIWE string prefixed using eip191.

DagJWS

OK, so there is already infrastructure built to produce DagJWS-compliant signatures. Here's how we could make them compatible with Varsig:

Signature params:

Plain JWS

Maybe this is easier?

Signature params:

Note that the protected header is automatically generated. This means that we can't sign a JWT with this alg because it needs to include a "typ": "JWT" field in the header.

expede commented 1 year ago

we would need to register a new eip4361-eip191 codec [...] protected header is automatically generated.

I would actually like to do this; EIP-191 and EIP-712 feel like they belong in the table (to me at least).

You're right that we need another segment, though. There's "how was the data serialized?" (e.g. DAG-CBOR), but also "did the signature method do something extra to the payload?" (e.g. JWT, EIP191, FIDO2 envelope). I was trying to avoid allowing infinitely nested layers, but I think this is an increasingly common pattern.

Lemme noodle on it a bit though.

oed commented 1 year ago

@expede to be clear, I think the current approach is quite good. For the reason described above, I don't think we should add new fields here unless they're needed.

Let's think about what it would look like if we were defining a new signature suite:

dag-cbor ed25519 signature

Signature params:

Really like the simplicity of this!

expede commented 1 year ago

@oed I made some changes. Here are updated versions of the tables (largely for my own benefit, to make sure it still ticks all the boxes):

SIWE

Let's start by assuming we have landed on an IPLD format (e.g. CACAO / ucan-ipld) for SIWE.

Signature params:

* `contentEnc` - we would need to register a new `eip4361-eip191` codec

Yep, I think we should propose it. It's certainly not DAG-CBOR 😛

The eip4361-eip191 codec takes an IPLD object and tries to convert it to a SIWE string prefixed using eip191.

service.invalid wants you to sign in with your Ethereum account:
0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2

I accept the ServiceOrg Terms of Service: https://service.invalid/tos

URI: https://service.invalid/login
Version: 1
Chain ID: 1
Nonce: 32891756
Issued At: 2021-09-30T16:25:24Z
Resources:
- ipfs://bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq/
- https://example.com/my-web2-claim.json
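For concreteness, a minimal Python sketch of just the EIP-191 half of the proposed `eip4361-eip191` codec, assuming the SIWE text above has already been rendered. EIP-191's `personal_sign` format (version byte `0x45`) prepends `\x19Ethereum Signed Message:\n` and the decimal byte length of the message. The helper name is hypothetical, and the Keccak-256 hash that Ethereum wallets apply afterwards is left out, since Python's `hashlib.sha3_256` is standard SHA3 rather than Keccak-256.

```python
def eip191_personal_message(message: str) -> bytes:
    r"""Wrap a UTF-8 message in the EIP-191 version 0x45 ("personal_sign") envelope.

    Hypothetical helper; the layout is:
    0x19 || "Ethereum Signed Message:\n" || decimal byte length || message bytes
    (the "E" of "Ethereum" is the 0x45 version byte).
    """
    body = message.encode("utf-8")
    header = b"\x19Ethereum Signed Message:\n" + str(len(body)).encode("ascii")
    return header + body


# The first two lines of the SIWE message above, as an example input.
siwe_text = (
    "service.invalid wants you to sign in with your Ethereum account:\n"
    "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2\n"
)
signing_input = eip191_personal_message(siwe_text)
```

Note that the length prefix counts UTF-8 bytes, not characters.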

It looks a lot like RFC822, which Bacalhau is also exploring using 🤔 I wonder if we could use a generic serializer for this kind of data, regardless of the fields. Could the contained data be represented in IPLD as the following DAG-JSON, which would then have DAG-RFC822:

[
  "service.invalid wants you to sign in with your Ethereum account:\n0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2\n\nI accept the ServiceOrg Terms of Service: https://service.invalid/tos",
  {
    "URI": "https://service.invalid/login",
    "Version": 1,
    "Chain ID": 1,
    "Nonce": 32891756,
    "Issued At": "2021-09-30T16:25:24Z",
    "Resources": [
      "ipfs://bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq/",
      "https://example.com/my-web2-claim.json"
    ]
  }
]
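To make the DAG-RFC822 idea a bit more concrete, here is a minimal Python sketch that renders the `[preamble, fields]` shape above back into SIWE-style text. `render_rfc822_like` is a hypothetical name, and it only handles scalars and lists of scalars. One caveat worth flagging: IPLD's canonical codecs (DAG-JSON, DAG-CBOR) sort map keys when encoding, so the field order SIWE depends on would not survive an IPLD round trip. A real codec would need its own ordering rule (e.g. from a schema); this sketch simply relies on Python's insertion-ordered dicts.

```python
def render_rfc822_like(doc: list) -> str:
    """Render a hypothetical DAG-RFC822 value [preamble, {field: value}] as text.

    Scalars become "Key: value" lines; lists become "Key:" followed by "- item"
    lines. Fields are emitted in the dict's iteration (insertion) order.
    """
    preamble, fields = doc
    lines = [preamble, ""]  # a blank line separates the preamble from the fields
    for key, value in fields.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)
```

Feeding it the JSON array above (parsed into Python) reproduces the SIWE message shown earlier, line for line.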

DagJWS

OK, so there is already infrastructure built to produce DagJWS-compliant signatures. Here's how we could make them compatible with Varsig:

Signature params:

New codec, dag-cbor-sha2-256-dag-jose

Hmm, I see. This is the whole nested presentation thing 🤔 Is it possible to "just" represent this as DAG-JWS-CBOR and get the sha2-256 bit from the multihash?

  * Encode a JWS that uses the CID as payload
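A minimal Python sketch of "the CID as the JWS payload", assuming (as dag-jose does) that the payload segment is the unpadded base64url encoding of the binary CIDv1. It also shows where the sha2-256 part lives: inside the multihash, as code `0x12` followed by the 32-byte digest length and the digest. The input is assumed to be already-encoded DAG-CBOR bytes, and the helper names are hypothetical.

```python
import base64
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256
DAG_CBOR = 0x71  # multicodec code for dag-cbor
CID_V1 = 0x01


def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cid_v1_sha256(encoded_block: bytes, codec: int = DAG_CBOR) -> bytes:
    """Binary CIDv1: <version><codec><multihash>, multihash = <0x12><len><digest>."""
    digest = hashlib.sha256(encoded_block).digest()
    multihash = uvarint(SHA2_256) + uvarint(len(digest)) + digest
    return uvarint(CID_V1) + uvarint(codec) + multihash


def jws_payload_segment(encoded_block: bytes) -> str:
    """Hypothetical helper: unpadded base64url of the CID bytes, used as JWS payload."""
    cid = cid_v1_sha256(encoded_block)
    return base64.urlsafe_b64encode(cid).rstrip(b"=").decode("ascii")
```

Because the prefix bytes are fixed (`0x01 0x71 0x12 0x20`), every such payload starts with `AXESI`, which is a handy way to spot a DAG-CBOR/sha2-256 CID in a JWS.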

Arguably your payload here is the CID, so another interpretation could be:

Plain JWS

Signature params:

  * This codec takes the IPLD object
  * Encodes it as dag-json and converts it to `<protected-header>.<payload>`

This is doable I think, though yes, it requires a new multicodec. Could it take any array, base64url each element, and intersperse `.`s?

[
  {"a": 1, "b": 2},
  {"c": 3, "d": 4},
  {"e": 5, "f": 6},
  {"g": 7, "h": 8}
]

// Not actually their base64, just pseudocode
"7dsahk-eq_9.3-d2jdsajkdd.9201jd_iwqas.210iu_-wji5ask"
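A rough Python sketch of that "base64url each element and join with `.`" codec. Assumptions: `json.dumps` with sorted keys and compact separators stands in for DAG-JSON (close for simple ASCII values like these, but not a full DAG-JSON encoder), and `dot_join` is a hypothetical name. For a two-element `[header, payload]` array the output has the shape of a JWS signing input (`<protected-header>.<payload>`), so the caller could put `"typ": "JWT"` in the header element itself.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding JWS uses for each segment."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dot_join(segments: list) -> str:
    """Hypothetical codec: encode each array element, base64url it, join with '.'.

    json.dumps with sorted keys and no whitespace approximates DAG-JSON
    for simple ASCII values; it is not a full DAG-JSON encoder.
    """
    encoded = (
        json.dumps(seg, sort_keys=True, separators=(",", ":")).encode("utf-8")
        for seg in segments
    )
    return ".".join(b64url(chunk) for chunk in encoded)


# A [header, payload] pair produces a JWS-style signing input.
signing_input = dot_join([{"alg": "EdDSA", "typ": "JWT"}, {"iss": "did:example:alice"}])
```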

Note that the protected header is automatically generated. This means that we can't sign a JWT with this alg because it needs to include a "typ": "JWT" field in the header.

Yeah, per @Gozala's earlier comment, JWTs may require using 0xd000 Nonstandard, which says "the application knows enough semantically how to do the rest".

It's a bit confusing where exactly the line is, because we can deterministically encode a JWT, but also we're not going to put every possible standard header in here 🤔

expede commented 1 year ago

This is doable I think, though yes, it requires a new multicodec. Could it take any array, base64url each element, and intersperse `.`s?

Actually, this could possibly be doable 🤔 You'd have to wrap the payload in an array (or something), but given that it needs to follow a schema (JWT), it may be acceptable to leave schema concerns out of the encoding layer.

Gozala commented 1 year ago

It's annoying, but GitHub tends to bury some comments. Here are a couple:

https://github.com/ChainAgnostic/varsig/pull/6#discussion_r1038804929 https://github.com/ChainAgnostic/varsig/pull/6#discussion_r1038803227

oed commented 1 year ago

It looks a lot like RFC822, which Bacalhau is also exploring using 🤔 I wonder if we could use a generic serializer for this kind of data, regardless of the fields. Could the contained data be represented in IPLD as the following DAG-JSON, which would then have DAG-RFC822:

But this wouldn't be compatible with the CACAO / ucan-ipld schema 🤔

Yeah, per @Gozala's earlier comment, JWTs may require using 0xd000 Nonstandard, which says "the application knows enough semantically how to do the rest".

I made a comment above (https://github.com/ChainAgnostic/varsig/pull/6#discussion_r1039321853) for a better way of encoding all of the JWS standard. (btw, I'm starting to think the current way dag-jose works is a lost cause).

Gozala commented 1 year ago

I seem to have caused some confusion with my comments, so I’d like to clarify some of my points here, where they won’t get buried under a line comment.

  1. I agree that varsig needs to describe how to go from the data model to the bytes that will then be signed.
    • That is what “Content Multicodec Prefix” is supposed to describe.
    • In https://github.com/ucan-wg/ucan-ipld#25-signature it was implied that data was formatted into JWT first and then encoded in UTF-8.
      • I think it makes sense to signal that explicitly here, because we’ll have more than JWT formatting
  2. Getting from the data model to bytes may involve multiple transformations. In some cases it may make sense to have a code that describes the whole pipeline, and in others it would make more sense to decompose it.
    • That is why I suggested that this segment be a multiformat not a varint, precisely so it could have a bunch of tags or just one.
  3. There are some well-known signature types like RS256 which imply a combination of tags in varsig. However, there will be combinations of tags that don’t have a standard name. That is why Nonstandard Signatures had a last segment allowing it to embed the name it wanted to go by. This would allow rendering a name for such signatures without having to register that name and get everyone to recognize it.
    • It is not meant to replace the set of tags describing the signature
    • It is mostly to allow translating data from one format to the other without doing any kind of signature verification (e.g. I could base58btc encode just the digest from the varsig and describe it via an embedded label)
    • Labels are not necessary for standard signatures because their tag combination will be well known
  4. I’m OK with an umbrella varsig code, but there are some tradeoffs and I want to make sure they are considered. It might also be worth asking why multihash did not go a similar route.
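Since every tag in these layouts, including the first code of a nested multiformat, is an unsigned varint, here is a minimal Python sketch of the LEB128-style encoding described by the multiformats `unsigned-varint` spec. It leaves out that spec's extra rules (minimal encoding, 9-byte maximum).

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative int as an unsigned LEB128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint starting at buf[offset]."""
    value = shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7
```

Returning the next offset is what makes it possible to read several tags back to back, or to hand the rest of the buffer to a nested format.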

oed commented 1 year ago

That is why I suggested that this segment be a multiformat not a varint, precisely so it could have bunch of tags or just one.

What does it mean for a segment to be a multiformat? How would you encode multiple codecs as a multiformat?

Gozala commented 1 year ago

What does it mean for a segment to be a multiformat? How would you encode multiple codecs as a multiformat?

It means you can nest them, e.g. you could embed a multidid, because the first varint tells you how to read the following data. More relevant here, though, some serialization formats could define their own multiformat for encoding relevant parameters without having to update varsig itself.
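A minimal Python sketch of that nesting idea: the first varint selects a parser, and that parser decides how many of the following bytes belong to it, so new formats can define their own parameters without changing the outer reader. The codes `0x01` and `0x02` and both parsers are hypothetical, purely for illustration.

```python
from typing import Callable, Dict, List, Tuple


def read_uvarint(buf: bytes, offset: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint; return (value, offset just past it)."""
    value = shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def parse_single(buf: bytes, offset: int) -> Tuple[dict, int]:
    """Hypothetical format 0x01: exactly one codec varint follows."""
    codec, offset = read_uvarint(buf, offset)
    return {"codec": codec}, offset


def parse_pipeline(buf: bytes, offset: int) -> Tuple[dict, int]:
    """Hypothetical format 0x02: a count, then that many codec varints."""
    count, offset = read_uvarint(buf, offset)
    steps: List[int] = []
    for _ in range(count):
        step, offset = read_uvarint(buf, offset)
        steps.append(step)
    return {"pipeline": steps}, offset


# Illustrative codes only; these are NOT registered multicodec values.
PARSERS: Dict[int, Callable[[bytes, int], Tuple[dict, int]]] = {
    0x01: parse_single,
    0x02: parse_pipeline,
}


def read_multiformat(buf: bytes, offset: int = 0) -> Tuple[dict, int]:
    """The first varint says which parser owns the bytes that follow."""
    code, offset = read_uvarint(buf, offset)
    parser = PARSERS.get(code)
    if parser is None:
        raise ValueError(f"unknown format code {code:#x}")
    params, offset = parser(buf, offset)
    return {"code": code, **params}, offset
```

The returned offset marks where the nested multiformat ends, so the outer varsig reader can continue with whatever segment comes next.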

expede commented 1 year ago

@Gozala I feel like ~2 concrete examples of the version of this in your head would be really helpful for me. Let me make a first attempt at the kind of thing I'm interpreting (but I'm not 100% sure).

General Format

<varint varsig><varint content_multicodec><varint multihash><varint key_multicodec><varint raw_signature>

UCAN Example

<varsig-ucan-ipld><dag-cbor><rsa-pkcs-1><sig_bytes>

SIWE Example

<varsig-siwe><0x55 raw_bytes><secp256k1><sig_bytes>
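A minimal Python sketch of encoding and decoding the General Format above. Assumptions: `VARSIG_TAG` is a placeholder rather than a confirmed registration; `raw_signature` is treated as the trailing bytes rather than a varint, since it is an opaque byte string; and the remaining codes come from the multicodec table (`0x71` dag-cbor, `0x12` sha2-256, `0xe7` secp256k1-pub).

```python
from typing import Tuple

VARSIG_TAG = 0x34     # placeholder prefix for this sketch, not a confirmed registration
DAG_CBOR = 0x71       # multicodec: dag-cbor
SHA2_256 = 0x12       # multicodec: sha2-256
SECP256K1_PUB = 0xE7  # multicodec: secp256k1-pub


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int) -> Tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_varsig(content: int, hash_code: int, key: int, signature: bytes) -> bytes:
    """<varsig><content_multicodec><hash><key_multicodec>, then the raw signature bytes."""
    header = b"".join(encode_uvarint(c) for c in (VARSIG_TAG, content, hash_code, key))
    return header + signature


def decode_varsig(buf: bytes) -> dict:
    codes = []
    offset = 0
    for _ in range(4):
        code, offset = decode_uvarint(buf, offset)
        codes.append(code)
    tag, content, hash_code, key = codes
    if tag != VARSIG_TAG:
        raise ValueError(f"not a varsig: {tag:#x}")
    # Everything after the four varints is treated as the signature itself.
    return {"content": content, "hash": hash_code, "key": key, "signature": buf[offset:]}
```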

Previous Format

You had mentioned earlier that you liked the previous version. I'm just stashing that here to make it easy to find.

oed commented 1 year ago

A few examples would be helpful for me as well! It's not super intuitive how this would work if every step is an IPLD codec.

Gozala commented 1 year ago

Sorry it took so long to respond to this thread; I had a busy end of the year and this fell off the radar.

I'm assuming that

I feel like a ~2 concrete examples of the version of this in your head would be really helpful for me.

was referring to this comment: https://github.com/ChainAgnostic/varsig/pull/6#issuecomment-1338491847. Let me start by saying that I was mostly relaying feedback from @mikeal, who may have actual examples in mind. While I don't have a concrete example, I'll try to clarify nevertheless. E.g. DKIM (https://www.rfc-editor.org/rfc/rfc6376.html#section-3.4) needs to communicate:

  1. How to canonicalize email headers (if at all)
  2. Which headers and in which order are signed (see Signed header Page 21 of the spec)

If you were to try to represent all of that in varsig, just a DKIM varint won't cut it. However, you could have a DKIM multiformat, something along the lines of:

<varint dkim_code><varint dkim_canonicalization><varint params_length><bytes ordered_header>

All of that can be nested into varsig inside content_multiformat, and the DKIM multiformat itself could be its own spec.
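A minimal Python sketch of such a DKIM multiformat. Everything here is hypothetical: `DKIM_CODE` is not a registered code, the numbering of the `simple`/`relaxed` header canonicalizations (RFC 6376, section 3.4) is made up, and the ordered header list is stored the way DKIM's `h=` tag writes it, as colon-separated field names. The decoder returns the offset where the nested multiformat ends, which is what would let an outer varsig reader carry on to the next segment.

```python
from typing import List, Tuple

DKIM_CODE = 0x1D1  # hypothetical code for a DKIM parameter multiformat (not registered)
# Hypothetical numbering for the RFC 6376 header canonicalization algorithms.
CANONICALIZATION = {"simple": 0, "relaxed": 1}
CANON_BY_ID = {v: k for k, v in CANONICALIZATION.items()}


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int) -> Tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_dkim_params(canon: str, signed_headers: List[str]) -> bytes:
    """<dkim_code><dkim_canonicalization><params_length><ordered_header>"""
    ordered = ":".join(signed_headers).encode("ascii")  # same shape as DKIM's h= tag
    return (
        encode_uvarint(DKIM_CODE)
        + encode_uvarint(CANONICALIZATION[canon])
        + encode_uvarint(len(ordered))
        + ordered
    )


def decode_dkim_params(buf: bytes, offset: int = 0) -> Tuple[dict, int]:
    code, offset = decode_uvarint(buf, offset)
    if code != DKIM_CODE:
        raise ValueError(f"unexpected code {code:#x}")
    canon_id, offset = decode_uvarint(buf, offset)
    length, offset = decode_uvarint(buf, offset)
    end = offset + length
    if end > len(buf):
        raise ValueError("truncated header list")
    headers = buf[offset:end].decode("ascii").split(":")
    # Return `end` so an outer varsig reader can continue with the next segment.
    return {"canonicalization": CANON_BY_ID[canon_id], "signed_headers": headers}, end
```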

You can imagine something along those lines being useful in the context of UCANs as well, e.g. to signal whether a JWT was canonicalized before signing.

You can think of those as configuration for the IPLD codec, so it knows how to go about it. It is worth pointing out that most folks from core IPLD are likely going to suggest putting those details into a block and signing that instead. I agree with that too; however, it would make it impossible to represent existing signatures in the wild in varsig, which is why I think this is a reasonable compromise.

Does this help?

expede commented 1 year ago

Multiformat

E.g. DKIM (https://www.rfc-editor.org/rfc/rfc6376.html#section-3.4) needs to communicate

Yeah this totally makes sense 💯 I like that it's flexible in letting us define how each type is serialized. There is overhead to describing noncanonicalized orderings, at which point we probably want to recommend that they be encoded as raw bytes.

Taking a step back, the type of what we're doing is something like this:

serialize :: IPLD -> MultiformatConfig -> RawBytes
verify :: RawBytes -> PublicKey -> HashAlgo -> Bool
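A minimal Python sketch of those two signatures, mostly to show the asymmetry: `serialize` cannot run without understanding the config, while `verify` only needs bytes, a key, and a hash algorithm. `MultiformatConfig` with a single `canonical` flag is a hypothetical stand-in, `verify` takes the signature and a pluggable `check` function as extra arguments, and HMAC-SHA256 is swapped in as `check` only so the example runs on the standard library. HMAC is a symmetric MAC, not a public-key signature.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MultiformatConfig:
    """Hypothetical stand-in for the "multiformat config" (how IPLD becomes bytes)."""
    canonical: bool  # sort keys and strip whitespace before signing?


def serialize(ipld: dict, config: MultiformatConfig) -> bytes:
    # Only code that understands `config` can reproduce the exact signed bytes.
    if config.canonical:
        text = json.dumps(ipld, sort_keys=True, separators=(",", ":"))
    else:
        text = json.dumps(ipld)  # insertion order, default ", " / ": " separators
    return text.encode("utf-8")


# (digest, key, signature) -> ok; a real scheme would take a public key here.
Check = Callable[[bytes, bytes, bytes], bool]


def verify(raw: bytes, key: bytes, hash_algo: str, signature: bytes, check: Check) -> bool:
    # Verification needs no knowledge of the config: just bytes, key, and hash.
    digest = hashlib.new(hash_algo, raw).digest()
    return check(digest, key, signature)


def hmac_check(digest: bytes, key: bytes, signature: bytes) -> bool:
    """Stand-in only: HMAC is symmetric, NOT a public-key signature scheme."""
    expected = hmac.new(key, digest, "sha256").digest()
    return hmac.compare_digest(expected, signature)
```

A signature made over the canonical bytes fails to verify against the non-canonical serialization of the same data, which is exactly why the config has to travel with the signature.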

What I'm (temporarily) calling "multiformat config" above is all the stuff you'd need to describe the serialization — it's taken from your DKIM example. The downside of this approach is that you have to understand the multiformat in order to serialize it: not just anyone can validate the signature.

[Aside: something like Autocodec may alleviate this, as you could just pass a reference to some deterministic Wasm, but that's a whole other project.]

Thinking this through out loud: in the UCAN example, a noncanonical UCAN would probably have to get moved around as a string (or bytes), but also it won't be able to take full advantage of Varsig since it's not in IPLD. UCAN-IPLD would be able to get some nice properties with Varsig, because we could have a compact multiformat to describe that the rest of the IPLD in the payload gets serialized as a canonical JWT before signing (which is relatively trivial).

Varsig Format

Does this help?

The context is helpful, yes! Joel and I were asking for some concrete examples for the Varsig format itself, but let me take a crack at your above example, and you can tell me where it's wrong:

Option A: Varsig Multiformat Prefix

<!-- General -->
<multicodec varsig_tag><multiformat content_config><varint hash_multicodec><varint key_multicodec><varint raw_signature>

<!-- Nested -->
<varsig_tag><dkim_code><dkim_canonicalization><params_length><ordered_header><hash_multicodec><key_multicodec><raw_signature>

Option B: Signature Type Directly

<!-- General -->
<multicodec signature_type><varint hash_multicodec><multiformat content_config><varint raw_signature>

<!-- Nested -->
<signature_type><hash_multicodec><dkim_code><dkim_canonicalization><params_length><ordered_header><raw_signature>
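To compare the two options byte for byte, a minimal Python sketch that lays out the same pieces both ways. The tag values `0x34` and `0x40` are placeholders, `content_config` is assumed to be an already-encoded nested multiformat (such as the DKIM one), and `0x12`/`0xed` are the multicodec codes for sha2-256 and ed25519-pub. In Option B the key type is presumably implied by `signature_type`, so there is no key field.

```python
def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def option_a(varsig_tag: int, content_config: bytes, hash_code: int,
             key_code: int, signature: bytes) -> bytes:
    """<varsig_tag><content_config><hash_multicodec><key_multicodec><raw_signature>"""
    return (encode_uvarint(varsig_tag) + content_config
            + encode_uvarint(hash_code) + encode_uvarint(key_code) + signature)


def option_b(signature_type: int, hash_code: int, content_config: bytes,
             signature: bytes) -> bytes:
    """<signature_type><hash_multicodec><content_config><raw_signature>"""
    return encode_uvarint(signature_type) + encode_uvarint(hash_code) + content_config + signature
```

One thing the layout makes visible: in Option A a reader has to parse `content_config` before it can reach the hash and key codes, while in Option B the signature type and hash come first, and the nested config only has to be understood to find where the signature starts.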

expede commented 1 year ago

Oh I should also tag you @Gozala ☝️

expede commented 1 year ago

@Gozala updated to reflect our conversation!

  1. Removed the content encoding field
  2. Added some examples in section 4

Once you give PR approval, I'll merge :)

(Perhaps worth noting that it would be nice in UCAN Invocation to know which encoding format was used in the Authorization array, because it's not a single string, but we can maybe work around that by defining a multiple-payload signature varsig?)

bumblefudge commented 1 year ago

Should I override this pending request?

expede commented 1 year ago

@bumblefudge evidently I also have the power 💪

We had chatted with Irakli about the final changes at some length, so I'm sure it's fine (and it keeps stuff moving forward). If not, we can cut a 0.2 😉