Gozala closed this issue 1 year ago.
Pulling @expede and @oed into this
<varint sig_alg_code><varint payload_encoding><varint sig_size><bytes sig_output>
Agreed! Let's do it 💪
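A minimal sketch of the layout above, assuming the varints are unsigned LEB128 (as in the multiformats unsigned-varint spec). The function names are hypothetical. In the usage line, 0x71 is the dag-cbor multicodec, while 0xed is only a placeholder for whatever sig_alg_code would actually be registered.

```typescript
// Hypothetical encoder/decoder for
//   <varint sig_alg_code><varint payload_encoding><varint sig_size><bytes sig_output>
// Assumes unsigned LEB128 varints (multiformats unsigned-varint).

/** Encode a non-negative safe integer as an unsigned LEB128 varint. */
function encodeVarint(value: number): number[] {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError(`not a non-negative safe integer: ${value}`);
  }
  const out: number[] = [];
  while (value >= 0x80) {
    out.push((value % 0x80) | 0x80); // low 7 bits, continuation bit set
    value = Math.floor(value / 0x80);
  }
  out.push(value); // last byte, continuation bit clear
  return out;
}

/** Decode a varint at `offset`; returns the value and the offset after it. */
function decodeVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let scale = 1;
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * scale;
    if ((bytes[i] & 0x80) === 0) return [value, i + 1];
    scale *= 0x80;
  }
  throw new RangeError("truncated varint");
}

interface Varsig {
  sigAlgCode: number;
  payloadEncoding: number;
  sigOutput: Uint8Array;
}

function encodeVarsig({ sigAlgCode, payloadEncoding, sigOutput }: Varsig): Uint8Array {
  const header = [
    ...encodeVarint(sigAlgCode),
    ...encodeVarint(payloadEncoding),
    ...encodeVarint(sigOutput.length), // sig_size
  ];
  const out = new Uint8Array(header.length + sigOutput.length);
  out.set(header, 0);
  out.set(sigOutput, header.length);
  return out;
}

function decodeVarsig(bytes: Uint8Array): Varsig {
  const [sigAlgCode, afterAlg] = decodeVarint(bytes, 0);
  const [payloadEncoding, afterEnc] = decodeVarint(bytes, afterAlg);
  const [sigSize, sigStart] = decodeVarint(bytes, afterEnc);
  if (sigStart + sigSize !== bytes.length) {
    throw new RangeError("sig_size does not match the remaining bytes");
  }
  return { sigAlgCode, payloadEncoding, sigOutput: bytes.slice(sigStart) };
}

// Usage: a 64-byte placeholder signature over a dag-cbor (0x71) payload.
const encoded = encodeVarsig({ sigAlgCode: 0xed, payloadEncoding: 0x71, sigOutput: new Uint8Array(64) });
console.log(encoded.length); // 68
```

Here the header is 4 bytes (0xed needs two varint bytes, 0x71 and the size 64 need one each), followed by the 64 signature bytes.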
Okay, so I recognize that I've been a champion for the above previously, but I'm going to be annoying and give the devil's advocate view:
RS256 is RSA + SHA256. ECDSA usually uses SHA256, but doesn't have to. We could split these into separate fields...
<varint sig_alg_code><varint sig_hash><varint payload_encoding><varint sig_size><bytes sig_output>
                     ^^^^^^^^^^^^^^^^^
...which is yet one more field / a byte or two extra.
I started writing text here, but convinced myself that including the multicodec of the payload does make sense here. If you're signing e.g. a non-canonicalized JWT, you just signal it as 0x00 raw bytes in the signature.
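To put a number on the "a byte or two" cost of a separate sig_hash field, here is a small sketch that measures the header with and without it, assuming unsigned LEB128 varints. 0x12 (sha2-256) and 0xb220 (blake2b-256) are multicodec hash codes; the sig_alg code is a placeholder.

```typescript
// Byte cost of the extra <varint sig_hash> field, assuming unsigned LEB128 varints.

/** How many bytes an unsigned LEB128 varint needs for `value`. */
function varintLength(value: number): number {
  let length = 1;
  while (value >= 0x80) {
    value = Math.floor(value / 0x80);
    length++;
  }
  return length;
}

/** Total length of a header made of the given varint fields. */
function headerLength(fields: number[]): number {
  return fields.reduce((sum, field) => sum + varintLength(field), 0);
}

const SIG_ALG = 0xed; // placeholder sig_alg_code
const DAG_CBOR = 0x71; // payload_encoding
const SIG_SIZE = 64; // e.g. a 64-byte signature
const SHA2_256 = 0x12; // below 0x80, so one varint byte
const BLAKE2B_256 = 0xb220; // needs three varint bytes

const base = headerLength([SIG_ALG, DAG_CBOR, SIG_SIZE]);
const withSha = headerLength([SIG_ALG, SHA2_256, DAG_CBOR, SIG_SIZE]);
const withBlake = headerLength([SIG_ALG, BLAKE2B_256, DAG_CBOR, SIG_SIZE]);
console.log(base, withSha, withBlake); // 4 5 7
```

So the extra field costs one byte for hash codes below 0x80 and more for larger codes.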
Should it include the hash function?
Do all signature algorithms have hashing functions? If some don't, the question arises what to do with those. Perhaps the “identity” code would do the trick there.
I’m warming up to this idea. In fact, we could simply reuse multihash and have a format like:
<varint sig_alg><varint payload_encoding><multihash>
> If you're signing e.g. a non-canonicalized JWT, you just signal it as 0x00 raw bytes in the signature.
I would argue that we need a JWT multicodec code for that, because raw usually implies something else.
P.S.: 0x00 is the identity multihash code; 0x55 is the raw binary code.
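A sketch of the multihash-based layout, assuming a multihash is <varint hash_code><varint digest_size><digest> with unsigned LEB128 varints. With the identity code 0x00 the "digest" is simply the input bytes, so the signature output would sit in the digest slot unchanged. The function names are hypothetical and the sig_alg code is a placeholder.

```typescript
// Hypothetical layout: <varint sig_alg><varint payload_encoding><multihash>
// where multihash = <varint hash_code><varint digest_size><digest bytes>.
const IDENTITY = 0x00; // identity multihash code: digest == input bytes

function encodeVarint(value: number): number[] {
  const out: number[] = [];
  while (value >= 0x80) {
    out.push((value % 0x80) | 0x80); // 7 data bits + continuation bit
    value = Math.floor(value / 0x80);
  }
  out.push(value);
  return out;
}

function decodeVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let scale = 1;
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * scale;
    if ((bytes[i] & 0x80) === 0) return [value, i + 1];
    scale *= 0x80;
  }
  throw new RangeError("truncated varint");
}

function encodeWithMultihash(sigAlg: number, payloadEncoding: number, hashCode: number, digest: Uint8Array): Uint8Array {
  const header = [
    ...encodeVarint(sigAlg),
    ...encodeVarint(payloadEncoding),
    ...encodeVarint(hashCode),
    ...encodeVarint(digest.length),
  ];
  const out = new Uint8Array(header.length + digest.length);
  out.set(header, 0);
  out.set(digest, header.length);
  return out;
}

function decodeWithMultihash(bytes: Uint8Array) {
  const [sigAlg, a] = decodeVarint(bytes, 0);
  const [payloadEncoding, b] = decodeVarint(bytes, a);
  const [hashCode, c] = decodeVarint(bytes, b);
  const [digestSize, d] = decodeVarint(bytes, c);
  if (d + digestSize !== bytes.length) throw new RangeError("digest size mismatch");
  return { sigAlg, payloadEncoding, hashCode, digest: bytes.slice(d) };
}

// With the identity code, the raw signature bytes are carried as the "digest".
const signature = new Uint8Array([1, 2, 3, 4]);
const bytes = encodeWithMultihash(0xed, 0x71, IDENTITY, signature);
console.log(Array.from(bytes).join(" ")); // 237 1 113 0 4 1 2 3 4
```

Byte-wise, the identity case only adds a 0x00 in front of the size; whether a signature should be labeled as a hash at all is a separate, semantic question.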
> I’m warming up to this idea. In fact, we could simply reuse multihash and have a format like:
I think this is a bit backwards. The hashing function is what is applied to the canonicalized payload. The signature itself is not a hash, so I don't think we can use multihash here.
> I started writing text here, but convinced myself that including the multicodec of the payload does make sense here. If you're signing e.g. a non-canonicalized JWT, you just signal it as 0x00 raw bytes in the signature.
I assume we need a canonicalization alg that describes how you take the payload and encode it as a JWT? If you just have the bytes of a raw JWT string, that also needs to be signaled somehow? I guess it depends on what the data structure looks like where you get the JWT string and the signature?
btw, I'd prefer if we call it payload_canonicalization rather than payload_encoding.
> I assume we need a canonicalization alg that describes how you take the payload and encode it as a JWT? If you just have the bytes of a raw JWT string, that also needs to be signaled somehow? I guess it depends on what the data structure looks like where you get the JWT string and the signature?
I mean this goes from the data model (of a certain schema) to bytes, which is why I call it encoding: it is the same code as in the CID of the data.
> btw, I'd prefer if we call it payload_canonicalization rather than payload_encoding.
But I want to use e.g. dag-cbor or dag-json depending on how you’ve encoded the model to bytes before signing. Perhaps you’re saying canonicalization is yet another param?
@mikeal suggested that instead of <varint payload_encoding> we use <multiformat payload_encoding>. In common cases it could be just a single varint, but it also provides a way to include other canonicalization details in specific instances.
> I mean this goes from the data model (of a certain schema) to bytes, which is why I call it encoding: it is the same code as in the CID of the data.
No, this is not at all what I mean. Why would you need to include which IPLD encoding you are using? I assume you get this from the CID when you load and interpret the IPLD block?
We need a varint that represents how to go from IPLD object -> serialized data to sign. For example:
- IPLD object -> JWT protected header + payload
- IPLD object -> SIWE message

Basically, we need to know how to go from IPLD data to the bytestring used to verify the signature.
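One way to picture this missing piece is a table, keyed by a varint code, of functions that turn an IPLD object into the exact bytes that were signed. This is a hypothetical sketch: the codes 0x1001/0x1002, the field names, and both functions are placeholders, and the SIWE-like message only mimics the opening lines of EIP-4361 rather than implementing it. The JWS entry follows RFC 7515, where the signing input is `<protected header>.<payload>`.

```typescript
// Hypothetical canonicalization registry: code -> (IPLD object -> signing input bytes).
type IpldObject = Record<string, unknown>;

interface Canonicalization {
  code: number; // placeholder codes, not registered multicodecs
  name: string;
  toSigningInput(obj: IpldObject): Uint8Array;
}

const utf8 = new TextEncoder();

// JWS: signing input is "<protected header>.<payload>" (both base64url), per RFC 7515.
const jws: Canonicalization = {
  code: 0x1001,
  name: "jws",
  toSigningInput(obj) {
    const { protected: header, payload } = obj as { protected: string; payload: string };
    return utf8.encode(`${header}.${payload}`);
  },
};

// SIWE-like: rebuild a human-readable message from object fields (illustrative only).
const siweLike: Canonicalization = {
  code: 0x1002,
  name: "siwe-like",
  toSigningInput(obj) {
    const { domain, address, statement } = obj as { domain: string; address: string; statement: string };
    return utf8.encode(`${domain} wants you to sign in with your Ethereum account:\n${address}\n\n${statement}`);
  },
};

const registry = new Map<number, Canonicalization>(
  [jws, siweLike].map((c): [number, Canonicalization] => [c.code, c]),
);

/** Look up the canonicalization named by `code` and produce the bytes to verify. */
function signingInput(code: number, obj: IpldObject): Uint8Array {
  const canonicalization = registry.get(code);
  if (!canonicalization) throw new Error(`unknown canonicalization code 0x${code.toString(16)}`);
  return canonicalization.toSigningInput(obj);
}

const bytes = signingInput(0x1001, { protected: "eyJhbGciOiJFZERTQSJ9", payload: "e30" });
console.log(new TextDecoder().decode(bytes)); // eyJhbGciOiJFZERTQSJ9.e30
```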
> But I want to use e.g. dag-cbor or dag-json depending on how you’ve encoded the model to bytes before signing. Perhaps you’re saying canonicalization is yet another param?
I don't see why this wouldn't just be part of the canonicalization alg?
@oed we mean the same thing, we just use different terms. An IPLD codec literally takes data and turns it into bytes.
@Gozala but we don't sign over IPLD encoded data. We sign over JWT data or SIWE messages.
You can think of both as IPLD encoders, and this came up in another context, where it literally is either dag-cbor or dag-json.
p.s.: I don’t care what we call it
@Gozala I don't really follow. In the case of SIWE we have a bunch of data in various fields of the IPLD object. These are the steps I'm thinking about:
1. CID -> bytes (from blockstore or network)
2. bytes -> IPLD object (using the IPLD codec)
3. IPLD object -> SIWE message and signature (this step is what I call canonicalization)
4. Verify that signature is correct over SIWE message bytes

The other way around:
1. Take the SIWE message and sign it (signature)
2. SIWE message and signature -> IPLD object (canonicalization)
3. IPLD object -> bytes (IPLD codec)
4. bytes -> CID
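The two directions above can be sketched end to end. This is illustrative only and swaps in stand-ins: JSON plays the IPLD codec, a hex SHA-256 digest plays the CID, Ed25519 from `node:crypto` plays the signature scheme (SIWE itself uses Ethereum secp256k1 signatures), and `toMessage` is a hypothetical canonicalization, not the EIP-4361 format.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface Fields {
  domain: string;
  address: string;
  statement: string;
}
interface SignedObject extends Fields {
  signature: string; // base64-encoded signature bytes
}

const utf8 = new TextEncoder();
const blockstore = new Map<string, Uint8Array>(); // "CID" -> block bytes

/** Hypothetical canonicalization: IPLD object fields -> message bytes that get signed. */
function toMessage({ domain, address, statement }: Fields): Uint8Array {
  return utf8.encode(`${domain} wants you to sign in:\n${address}\n\n${statement}`);
}

/** The other way around: sign the message, build the object, encode it, derive a "CID". */
function put(fields: Fields, privateKey: KeyObject): string {
  const signature = sign(null, toMessage(fields), privateKey).toString("base64");
  const object: SignedObject = { ...fields, signature };
  const bytes = utf8.encode(JSON.stringify(object)); // IPLD codec stand-in
  const cid = createHash("sha256").update(bytes).digest("hex"); // CID stand-in
  blockstore.set(cid, bytes);
  return cid;
}

/** Verification direction, following the four steps. */
function check(cid: string, publicKey: KeyObject): boolean {
  const bytes = blockstore.get(cid); // 1. CID -> bytes
  if (!bytes) throw new Error(`unknown block ${cid}`);
  // 2. bytes -> IPLD object (JSON stand-in), splitting off the signature field
  const { signature, ...fields } = JSON.parse(new TextDecoder().decode(bytes)) as SignedObject;
  const message = toMessage(fields); // 3. canonicalization
  return verify(null, message, publicKey, Buffer.from(signature, "base64")); // 4. verify
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const cid = put({ domain: "example.com", address: "0xabc", statement: "Sign in" }, privateKey);
console.log(check(cid, publicKey)); // true
```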
> 1. CID -> bytes (from blockstore or network)
> 2. bytes -> IPLD object (using the IPLD codec)
> 3. IPLD object -> SIWE message and signature (this step is what I call canonicalization)
> 4. Verify that signature is correct over SIWE message bytes
These are the definitions of the IPLD encoder / decoder:
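// ByteView<T> (from multiformats) is a Uint8Array phantom-typed with the
// data it encodes; a simplified local stand-in so the snippet compiles on its own:
type ByteView<T> = Uint8Array & { readonly __data?: T }

/**
 * IPLD encoder part of the codec.
 */
export interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode: (data: T) => ByteView<T>
}

/**
 * IPLD decoder part of the codec.
 */
export interface BlockDecoder<Code extends number, T> {
  code: Code
  decode: (bytes: ByteView<T>) => T
}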
So your steps are:
data = Codec.decode(bytes)
payload = SIWECodec.encode(data)
PubKey.verify(payload)
You are serializing some data into bytes in some format, which is exactly what an IPLD encoder does.
Ok I see what you are saying now @Gozala. Thanks for clarifying!
Interestingly, your example above is super clear for a JWT, where the signature is part of the encoded message. For SIWE this is not the case: we will have the signed string separately from the signature bytes, and there is no official way to encode these two together.
we could do something like this though:
decoded = Codec.decode(bytes)
siweStr = SIWECodec.encode(decoded)
PubKey.verify(siweStr, decoded.signature)
@oed oh yeah, sorry, I forgot to pass the actual signature to verify; they're separate in the JWT case as well.
@Gozala In JWTs they are not separate?
A JWT should be a string like this:
<base64url-protected-header>.<base64url-payload>.<base64url-signature>
> @Gozala In JWTs they are not separate? A JWT should be a string like this:
> <base64url-protected-header>.<base64url-payload>.<base64url-signature>
I mean it is, but you still pass the first two segments as the payload and the third as the signature.
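For a compact JWS (RFC 7515), the split described here looks roughly like this: the signing input is the ASCII of `<header>.<payload>`, and the third segment base64url-decodes to the signature bytes. A minimal sketch; `splitCompactJws` is a hypothetical helper, and `Buffer`'s `"base64url"` encoding assumes Node.js 16 or newer.

```typescript
interface JwsParts {
  signingInput: Uint8Array; // bytes the signature is computed over
  signature: Uint8Array; // raw signature bytes
}

/** Split a compact JWS into the signing input and the decoded signature. */
function splitCompactJws(jws: string): JwsParts {
  const segments = jws.split(".");
  if (segments.length !== 3) {
    throw new Error("expected <header>.<payload>.<signature>");
  }
  const [header, payload, signature] = segments;
  return {
    // The first two segments are signed as-is (still base64url-encoded).
    signingInput: new TextEncoder().encode(`${header}.${payload}`),
    signature: new Uint8Array(Buffer.from(signature, "base64url")),
  };
}

const { signingInput, signature } = splitCompactJws("eyJhbGciOiJFZERTQSJ9.e30.AQID");
console.log(new TextDecoder().decode(signingInput), Array.from(signature).join(","));
// eyJhbGciOiJFZERTQSJ9.e30 1,2,3
```

Libraries that accept the whole JWT string presumably perform this split internally, so both views end up verifying the same bytes.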
Pretty sure it differs per implementation. In most that I've seen, you just pass the entire JWT string.
True for both of these:
Trying to figure out how to represent a DagJOSE (JWS) as a varsig.
We have:
<varint sig_alg_code><varint payload_encoding><varint sig_size><bytes sig_output>
A naive approach would be:
sig_alg_code: 0xd0ed
payload_encoding: 0x85
However, this doesn't really cut it since dag-jose only says how to go from a JWS-string to dag-jose bytes, not some arbitrary structure to bytes.
So it seems like we will need to register a new payload_encoding for every possible payload we have?
For example, we would need to define:
This means that we also need a specific codec for invocations?
Maybe I'm missing something here?
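For reference, the naive mapping above would produce the following header bytes, assuming unsigned LEB128 varints; 0x85 is the dag-jose multicodec code and 0xd0ed is taken as given from the comment. This only shows the byte layout; it does not settle which payload the signature actually covers.

```typescript
/** Encode a non-negative integer as an unsigned LEB128 varint. */
function encodeVarint(value: number): number[] {
  const out: number[] = [];
  while (value >= 0x80) {
    out.push((value % 0x80) | 0x80); // 7 data bits + continuation bit
    value = Math.floor(value / 0x80);
  }
  out.push(value);
  return out;
}

const toHex = (bytes: number[]): string =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

const SIG_ALG_CODE = 0xd0ed;
const DAG_JOSE = 0x85;

const header = [...encodeVarint(SIG_ALG_CODE), ...encodeVarint(DAG_JOSE)];
console.log(toHex(header)); // ed a1 03 85 01
```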
Originally this came up here https://github.com/ucan-wg/ucan-cacao/issues/2#issuecomment-1324795275, but it's probably best to continue the discussion here. Here is a short summary:
The current version of varsig is specified as follows:
However, since CACAO signs a CBOR payload as opposed to a JWT payload, we somehow need to communicate the encoding of the payload in the signature.
One option was to simply expand the list of sig_alg_codes to accommodate more payload formats. However, that would imply allocating signature codes for each signature algorithm per encoding. It also implies that if I create a new codec, not only do I have to get a new multiformat code for the IPLD encoding, I also need to get a set of codes for signature algorithms, which is not great. For this reason I think we should change the format to the following instead: