Clarification on scope - GitHub Issues

Hey Tony,

As discussed on Twitter, I am interested in using cryptouri for a cryptographic library. I am not yet entirely sure if cryptouri fits what I expect from it, so here is what I would want to do with it. Please tell me if that is possible and what you think about using cryptouri for this.

So, this library I'm building is kind of a framework library where you can plug together the algorithms you want to use and it combines them correctly. Before doing any operations, the library checks that all needed components are available in order to fulfill C-I-A. It can do single operations on data or be used for a communication channel (w/ double ratchet and key rotation). The library is almost finished in its first version and will be professionally audited in October or November.

Currently, all these plugins have a unique name, which I am seeking to replace with a cryptouri equivalent:

Symmetric Algorithms:

SALSA20
XSALSA20
AES256-CTR

Asymmetric Algorithms:

ECDH-X25519
ECDH-P224
ECDH-P256
ECDH-P384
ECDH-P521

MACs:

POLY1305
HMAC (requires a hash alg as parameter)

Combined Algorithms:

CHACHA20POLY1305
AES256-GCM (soon)

Other:

HKDF (requires a hash alg as parameter)
PBKDF2 (requires a hash alg as parameter)

Hash Algorithms:

SHA2-224
SHA2-256
SHA2-384
SHA2-512
SHA2-512-224
SHA2-512-256
SHA3-224
SHA3-256
SHA3-384
SHA3-512
BLAKE2s-256
BLAKE2b-256
BLAKE2b-384
BLAKE2b-512

Secondly, these data types would be great to have a standardized cryptouri format for:

Public and private keys
Secret Key
Digest
Signature (what kind of data would be included?)

Also, on a side note, it would be interesting to use these formats in messages and headers, albeit in a more compressed format and without integrity checks.

Much of that should fit both into what I've already been thinking and the algorithms I immediately want to support. You can take a look at the Rust implementation:

Here's what it presently includes:

Public keys: (crypto:public:key or crypto-public-key)
Secret keys: (crypto:secret:key or crypto-secret-key)
Digests: (crypto:public:digest or crypto-public-digest)
Signatures: (crypto:public:signature or crypto-public-signature)

(Sidebar: perhaps public could be shortened to pub and secret to sec, and public dropped from digests/signatures. Signature could probably be abbreviated to sig, and digest to hash)

There are four algorithms presently supported:

Encryption

AES-128-GCM

crypto:secret:key:aes128gcm
crypto-secret-key-aes128gcm

AES-256-GCM

crypto:secret:key:aes256gcm
crypto-secret-key-aes256gcm

Signing

Ed25519

Secret Key

crypto:secret:key:ed25519
crypto-secret-key-ed25519

Public Key

crypto:public:key:ed25519
crypto-public-key-ed25519

Signature

crypto:public:signature:ed25519
crypto-public-signature-ed25519

Digest

SHA-256

crypto:public:digest:sha256
crypto-public-digest-sha256
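
A minimal sketch of how the two spellings listed above could be normalized into the same fields. The names `parse_prefix` and `CryptoUriPrefix` are hypothetical, and the accepted kinds are taken only from the list above; this is not the API of the Rust implementation.

```python
from typing import NamedTuple

# Kinds taken from the list above; anything else is rejected.
KNOWN_KINDS = {
    ("secret", "key"),
    ("public", "key"),
    ("public", "digest"),
    ("public", "signature"),
}


class CryptoUriPrefix(NamedTuple):
    sensitivity: str  # "secret" or "public"
    kind: str         # "key", "digest" or "signature"
    algorithm: str    # e.g. "aes256gcm"


def parse_prefix(text: str) -> CryptoUriPrefix:
    """Parse 'crypto:secret:key:aes256gcm' or 'crypto-secret-key-aes256gcm'."""
    for separator in (":", "-"):
        parts = text.split(separator)
        if len(parts) == 4 and parts[0] == "crypto":
            _, sensitivity, kind, algorithm = parts
            if (sensitivity, kind) not in KNOWN_KINDS or not algorithm:
                raise ValueError(f"unknown CryptoURI kind: {text!r}")
            return CryptoUriPrefix(sensitivity, kind, algorithm)
    raise ValueError(f"not a CryptoURI prefix: {text!r}")
```

Both spellings of the same object parse to an equal tuple, which is the property a consumer would rely on.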

Will post a second comment with some point-by-point followup on specific algorithms.

To break down what you've requested:

(Unauthenticated) Stream Ciphers

SALSA20, XSALSA20, AES256-CTR

I have been mulling whether including an intended algorithm usage for a key is a good idea, and generally been leaning towards yes (see above for aes128gcm and aes256gcm). That is to say: Salsa20 and XSalsa20 take the same key type, and generally all AES-based algorithms take the same key type, but I see value in encoding the intended usage a key was generated for into the CryptoURI, and erroring on mismatch.

Prospectively these would be e.g.:

crypto:secret:key:salsa20
crypto:secret:key:xsalsa20
crypto:secret:key:aes256ctr
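
To make "erroring on mismatch" concrete, here is a small sketch. It assumes the algorithm label is followed by a `:`-separated data field, as in the other examples in this thread; `require_algorithm` and `UsageMismatch` are hypothetical names, and the payload is treated as opaque text.

```python
class UsageMismatch(Exception):
    """Raised when a key is labelled for a different algorithm than requested."""


def require_algorithm(key_uri: str, expected: str) -> str:
    """Return the data field of a secret-key URI if its label equals `expected`."""
    parts = key_uri.split(":", 4)
    if len(parts) != 5 or parts[:3] != ["crypto", "secret", "key"]:
        raise ValueError("not a crypto:secret:key URI")
    algorithm, data = parts[3], parts[4]
    if algorithm != expected:
        # Salsa20 and XSalsa20 keys have the same size, so only the label
        # can tell the caller that the key was generated for something else.
        raise UsageMismatch(f"key is labelled {algorithm!r}, not {expected!r}")
    return data
```

A caller about to run XSalsa20 would call `require_algorithm(uri, "xsalsa20")` and refuse to proceed on `UsageMismatch`, even though the raw key bytes would technically fit.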

(Sidebar: split symmetric encryption into a namespace? e.g. crypto:secret:key:enc:salsa20)

(Elliptic Curve) Diffie-Hellman / Key Exchange / Key Agreement

ECDH-X25519 ECDH-P224 ECDH-P256 ECDH-P384 ECDH-P521

My first thought is: static or ephemeral? For CryptoURI, I'm not sure anything but static keys make sense, and if we only have to worry about one, we don't need to encode the distinction into the key format itself. So something like:

crypto:secret:key:x25519
crypto:secret:key:ecdhp224
crypto:secret:key:ecdhp256
crypto:secret:key:ecdhp384
crypto:secret:key:ecdhp521

(Sidebar: should D-H be split into a namespace? e.g. crypto:secret:key:dh:x25519)

Message Authentication Codes

POLY1305

Isn't a MAC! 😉 It's a universal hash function and therefore useful as a one-time authenticator, however MACs generally operate over multiple messages. Poly1305-AES provides this property. Perhaps:

crypto:secret:key:poly1305aes
crypto:secret:key:poly1305+aes128

HMAC (requires a hash alg as parameter)

...is a more interesting one, and should use the digest/hash algorithm registry for identifiers. Some prospective syntax:

crypto:secret:key:hmac+sha256
crypto:secret:key:hmac-sha256
crypto:secret:key:hmac.sha256
crypto:secret:key:hmac:sha256
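
A sketch of what "use the digest registry for identifiers" could look like, assuming the `+` separator from the first candidate syntax. The `DIGESTS` table and function names are hypothetical; only Python's standard `hmac` and `hashlib` modules are used.

```python
import hashlib
import hmac

# Hypothetical digest registry, shared with the crypto:public:digest namespace.
DIGESTS = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def split_hmac_label(label: str, separator: str = "+") -> str:
    """Return the digest name from a composite label such as 'hmac+sha256'."""
    prefix, found, digest_name = label.partition(separator)
    if prefix != "hmac" or not found or digest_name not in DIGESTS:
        raise ValueError(f"unsupported MAC label: {label!r}")
    return digest_name


def hmac_tag(label: str, key: bytes, message: bytes) -> bytes:
    """Compute an HMAC tag with the digest named in the label."""
    digest = DIGESTS[split_hmac_label(label)]
    return hmac.new(key, message, digest).digest()
```

Switching to one of the other separators only changes the `separator` argument, which suggests the choice is mostly about readability and about avoiding clashes with the dasherized form.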

AEAD modes

CHACHA20POLY1305

Would suggest:

crypto:secret:key:chacha20poly1305

AES256-GCM (soon)

Already spec'd

HKDF (requires a hash alg as parameter)

This is a particularly fun one in terms of combining it with other constructions. I think an interesting one to look at is how we might represent keys for the Google Tink instantiation of an AES-GCM STREAM mode, which uses HKDF to derive a per-stream key.

Some ideas for the Tink STREAM case:

crypto:secret:key:aes256gcm+hmac-sha256+stream
crypto:secret:key:aes256gcm+hmac.sha256+stream

Password Hashing Functions

PBKDF2 (requires a hash alg as parameter)

These are particularly fun in terms of both how complex they are and how URI syntax might permit something a bit friendlier than the extended crypt (e.g. $n$...) format used for storing them.

The first question is what do you want to represent as a CryptoURI in regard to them? I think the main use case is a stored password digest, which would include a salt and the algorithm-specific parameters used to compute it.

Here's an example of a prospective PBKDF2 digest:

crypto:secret:pwhash:pbkdf2+sha256:a1b2c3...?i=10000&s=d4e5f6...
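
A rough sketch of producing and checking a URI of that shape. It assumes hex encoding for both the digest and the salt (the thread does not fix an encoding, and the real format may well use Bech32), and the function names are hypothetical. The PBKDF2 computation itself is `hashlib.pbkdf2_hmac` from the Python standard library.

```python
import hashlib
import hmac
from urllib.parse import parse_qs


def make_pwhash_uri(password: bytes, salt: bytes, iterations: int,
                    digest: str = "sha256") -> str:
    """Build a pwhash URI; hex encoding is an assumption of this sketch."""
    derived = hashlib.pbkdf2_hmac(digest, password, salt, iterations)
    return (f"crypto:secret:pwhash:pbkdf2+{digest}:{derived.hex()}"
            f"?i={iterations}&s={salt.hex()}")


def verify_pwhash_uri(uri: str, password: bytes) -> bool:
    """Recompute PBKDF2 from the URI's parameters and compare in constant time."""
    body, _, query = uri.partition("?")
    parts = body.split(":")
    if len(parts) != 5 or parts[:3] != ["crypto", "secret", "pwhash"]:
        raise ValueError("not a crypto:secret:pwhash URI")
    kdf, _, digest = parts[3].partition("+")
    if kdf != "pbkdf2" or not digest:
        raise ValueError(f"unsupported password hash: {parts[3]!r}")
    params = parse_qs(query)
    iterations = int(params["i"][0])
    salt = bytes.fromhex(params["s"][0])
    expected = bytes.fromhex(parts[4])
    actual = hashlib.pbkdf2_hmac(digest, password, salt, iterations)
    return hmac.compare_digest(actual, expected)
```

The query string carries everything needed to recompute the digest, which is the same information the crypt-style `$`-separated formats carry, only with named parameters.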

Digest Algorithms

These are straightforward enough. As mentioned earlier, this one is already implemented:

SHA2-256

I think it might be worth making GitHub issues relevant to each of these topics so we can discuss each of them in more depth.

Yes, let's split up topics into issues, but before we get going, I want to finish clarifying the basics:

Essentially, it boils down to: What are CryptoURIs, and what exactly do they represent?

As I understand it, there is typed cryptographic material, which can be a public or private key, a secret, a signature or a hash, where you have: crypto:<type>:<subtype>:<data>, e.g.:

crypto:hash:sha256:beef...
crypto:public:key:ed25519:beef...
crypto:private:key:ed25519:beef...
crypto:secret:key:beef... for symmetric algs?

But then I am confused when you start with something like this, as if you're trying to put a key and how to use it into a single cryptouri: crypto:secret:key:aes128gcm?

And then, go one step further, adding a protocol to cryptouri: crypto:secret:key:aes256gcm+hmac.sha256+stream?

(Sidebar: split symmetric encryption into a namespace? e.g. crypto:secret:key:enc:salsa20)

I guess this would resolve some of my confusion, though crypto:secret:key:enc is a lot of words for something so basic.

To repeat my question: What are CryptoURIs, and what exactly do they represent? Also: What purpose do CryptoURIs serve?

What is the "USP", so to speak?
Is the main goal to have high readability?
Do you expect people to type these manually a lot? (Bech32!)
Is it more important to make them shorter and faster to use?
Will there be a string and a binary representation to fit both scenarios?

(sorry for all the questions 😉)

Maybe, to put it another way: I am missing a list of "Why cryptouri is so awesome and everyone should use it". I mean, I can see the benefit of this, because I started to define algorithm names and data formats and stuff and it was hard, so joining efforts with others will give me a better result at creating something maintainable and non-confusing.

Maybe a good place to help me understand, is to think about usage scenarios. Are there any available already?

Quick new ideas for namespaces:

Cryptographic data:
- crypto:secret:<data>: Key material for symmetric ciphers
- crypto:public:<algorithm>:<data>: Public key
- crypto:private:<algorithm>:<data>: Private key
- crypto:signature:<algorithm>:<metadata>:<hash>: Signature
- crypto:hash:<algorithm>:<data>: Hashsum
- crypto:password:<data>: Cleartext password
- crypto:protected-password:<data>: Hashed password
- crypto:totp:<data>: TOTP Token
Ciphers
- crypto:cipher:<algorithm>: Symmetric cipher
- crypto:signer:<algorithm>: Signature algorithm
- crypto:exchange:<algorithm>: Key exchange algorithm
- crypto:hasher:<algorithm>: Hash function
- crypto:pwhasher:<algorithm>: Password hash function

Just a braindump, take what you like, leave the rest.

Last thing for this comment:

POLY1305 Isn't a MAC! 😉

I am by no means a real expert in this area, so I am extremely cautious how I use and combine cryptographic algorithms.

I use the Go extended stdlib here: https://godoc.org/golang.org/x/crypto/poly1305 Which states: "Package poly1305 implements Poly1305 one-time message authentication code as specified in https://cr.yp.to/mac/poly1305-20050329.pdf." And also: "Poly1305 was originally coupled with AES in order to make Poly1305-AES. AES was used with a fixed key in order to generate one-time keys from an nonce. However, in this package AES isn't used and the one-time key is specified directly." [sic]

DJB writes in his paper: "There is nothing special about AES here. One can replace AES with an arbitrary keyed function from an arbitrary set of nonces to 16-byte strings. This paper focuses on AES for concreteness."

How I use the poly1305 package: Every operation has its own unique nonce in addition to the main key material. They are both fed into a key derivation function which generates a key for every gear in the process of doing whatever cryptographic operation. Currently this would mean that I use something like Poly1305-HKDF, but I still treat Poly1305 itself as a MAC function, and I think this is correct.

I now carefully re-read your statement after writing all that. I probably wouldn't have, if I had read it that carefully the first time. So in your eyes, I'd be using Poly1305-HKDF, so we basically agree after all? 🤓
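
For readers following along, here is a sketch of the "Poly1305-HKDF" idea described above: a fresh one-time Poly1305 key is derived per operation from the main key and a unique nonce. This is not the code of the library under discussion. The choice of HKDF-SHA256, using the nonce as the HKDF salt, and the `info` label are all assumptions, and the pure-Python Poly1305 is for illustration only (it is not constant-time).

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with SHA-256."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def poly1305(one_time_key: bytes, message: bytes) -> bytes:
    """Poly1305 as in RFC 8439; the 32-byte key must never be reused."""
    r = int.from_bytes(one_time_key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    s = int.from_bytes(one_time_key[16:32], "little")
    p = (1 << 130) - 5
    acc = 0
    for i in range(0, len(message), 16):
        # Each block gets a 0x01 byte appended before being read little-endian.
        block = int.from_bytes(message[i:i + 16] + b"\x01", "little")
        acc = (acc + block) * r % p
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")


def one_time_tag(main_key: bytes, nonce: bytes, message: bytes) -> bytes:
    """Derive a per-nonce one-time key, then authenticate the message with it."""
    otk = hkdf_sha256(main_key, nonce, b"poly1305 one-time key", 32)
    return poly1305(otk, message)
```

The keyed function from nonces to one-time keys plays the role that AES plays in Poly1305-AES, so the security of the whole construction depends on nonces never repeating under the same main key.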

Essentially, it boils down to: What are CryptoURIs, and what exactly do they represent?

A namespace of cryptographic objects, formatted to be friendly to human interaction/consumption/transcription. These objects are all effectively numbers (typically very large numbers) or sets thereof. I'd personally consider them URN-like, in that they are a self-contained, location-independent representation of a unique identifier within a particular namespace.

Some prior art and inspiration can be found in RFC 6920: Naming Things with Hashes, which introduced a ni:/// URI for content addressable data:

 ni:///sha-256;UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q
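
As a quick illustration of the RFC 6920 form, the following sketch builds an `ni:///` URI with an empty authority from raw bytes: the SHA-256 digest is base64url-encoded with the padding removed. The function name `ni_uri` is just for this example.

```python
import base64
import hashlib


def ni_uri(data: bytes) -> str:
    """Return an RFC 6920 'ni' URI (sha-256, empty authority) naming `data`."""
    digest = hashlib.sha256(data).digest()
    # RFC 6920 uses base64url without trailing '=' padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{encoded}"
```

The algorithm label sits in front of the value, which is the same idea CryptoURI applies to keys and signatures.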

CryptoURI is an attempt to generalize this notion to commonly used cryptographic objects in a way that both conveys their sensitivity in a standard manner (crypto:sec(ret) means secret!) while also encoding algorithm identifiers.

The primary immediate focus is cryptographic key types, notably secret keys. In that regard, CryptoURI can be seen as an alternative to formats like PKCS#8, JOSE JWK, or numerous bespoke secret key encodings (e.g. SSH). Ideally it can provide a feature set equivalent to PKCS#8, including things like encrypted key storage and even password-based encrypted key storage using modern algorithms.

For digital signatures, it can be seen as an alternative to Cryptographic Message Syntax (CMS), various other ASN.1 DER-based encodings, or JOSE JWS.

RFC 6920 was interesting in its usage of the hierarchical features of URIs. Though I don't think I have any examples of this anywhere, a longer-term goal of CryptoURI is to support hierarchies of cryptographic objects, e.g. derivation paths. These are still URN-like in that they're location-independent: any bearer of a CryptoURI with hierarchical features can modify a derivation path, hand it to a conforming implementation, and get another CryptoURI for the derived object.

An example of such an algorithm is "HKD32", an extraction/simplification of BIP32 (which can be used as the core of a conforming BIP32 implementation). Given that, you can imagine representing a key derivation hierarchy for AES keys as follows.

Non-hierarchical:

crypto:secret:key:aes128gcm:a1b2c3...

Hierarchical:

crypto:secret:key:aes128gcm+hkd32://a1b2c3.../<usage>/<epoch>

or in practice something like

crypto:secret:key:aes128gcm+hkd32://a1b2c3.../params/42
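
To show the shape of "hand a hierarchical CryptoURI to an implementation and get back a CryptoURI for the derived object", here is a sketch. It does not implement HKD32: the per-segment step is a stand-in HMAC-SHA256 chain, the root key is assumed to be hex, and `derive_path` is a hypothetical name.

```python
import hashlib
import hmac


def derive_path(uri: str) -> str:
    """Walk '<prefix>:<alg>+hkd32://<root>/<seg>/...' and return a flat URI.

    The derivation step below is a placeholder, NOT the HKD32 algorithm.
    """
    head, found, tail = uri.partition("://")
    if not found:
        raise ValueError("not a hierarchical CryptoURI")
    prefix, _, label = head.rpartition(":")   # 'crypto:secret:key', 'aes128gcm+hkd32'
    algorithm = label.split("+", 1)[0]        # 'aes128gcm'
    root_hex, *segments = tail.split("/")
    key = bytes.fromhex(root_hex)             # hex root key is an assumption
    for segment in segments:
        # Each path component keys the next level; output keeps the key length.
        key = hmac.new(key, segment.encode("utf-8"), hashlib.sha256).digest()[:len(key)]
    return f"{prefix}:{algorithm}:{key.hex()}"
```

Because every step depends only on the parent key and the segment text, anyone holding the hierarchical URI can change the path and recompute a different child, which matches the location-independent behaviour described above.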

Hierarchical usage patterns are the main motivation for having a URI-like syntax in addition to the "dasherized" syntax: while the latter may be more convenient in a number of contexts, its more simplistic notation will also necessitate a more constrained set of usages, and expressing complicated things like derivation hierarchies seems like it will be difficult to keep unambiguous.

But then I am confused when you start with something like this, as if you're trying to put a key and how to use it into a single cryptouri: crypto:secret:key:aes128gcm?

That concern is literally what I opened with 😉

"I have been mulling whether including an intended algorithm usage for a key is a good idea, and generally been leaning towards yes (see above for aes128gcm and aes256gcm). That is to say: Salsa20 and XSalsa20 take the same key type, and generally all AES-based algorithms take the same key type, but I see value in encoding the intended usage a key was generated for into the CryptoURI, and erroring on mismatch."

So there is a debatable issue here: how much algorithm information should be encoded into each CryptoURI? Salsa20 and XSalsa20 use identically sized keys, so why should a CryptoURI encode salsa20 vs xsalsa20?

This gets back to the human interaction/consumption/transcription issue. By encoding more information about how keys should be used, it makes it easier for e.g. security teams making keys to specify intended usages in a way which software implementations can verify.

A higher degree of specificity also enables things like a 1:1 mapping of CryptoURI algorithm identifiers to their IANA-assigned OIDs, for example.

What is the "USP", so to speak?

It depends on if you want a short term goal or pie-in-the-sky.

Pie-in-the-sky:

a common, unobtrusive, human-friendly format for all cryptographic objects.

Short-term goals:

a modern, human-friendly format for public and secret keys (with emphasis on the latter) for a variety of algorithms (PKCS#8/JWK "killer")
support for encryption/key-wrapping with modern algorithms (including password-based encryption)
humane user experience
- make it as simple as possible for anyone to be able to tell, with minimal instruction, if what they're looking at is a sensitive cryptographic object (e.g. secret key)
- detect transcription errors
- compact, minimal, fully textual: commonly used objects fit on a single line with no whitespace

Is the main goal to have high readability?

Absolutely. But perhaps beyond readability, I think "comprehensibility" is important. As stated above, the format should make it easy for anyone, even people who don't know any of the algorithms they're looking at, to be able to understand they are looking at something which is particularly sensitive/secret (i.e. crypto:sec(ret)).

Do you expect people to type these manually a lot? (Bech32!)

As I sit here with manually transcribed secrets in front of me, yes it's something I very much want to support. For context to any third party readers out there: the current implementation of CryptoURI uses Bech32, an ASCII encoding for binary data (ala hex/base32/base64) but with an alphabet engineered to reduce transcription errors, and also an associated checksum (ala a Luhn check) to detect transcription errors. This adds a small amount of overhead to the encoded data, which does come at a cost.

The benefit is general resilience to transcription errors over lossy media (or at least, the ability to detect them), be that manually transcribing them between devices by typing, reading them out over the phone, or writing them down.
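
The checksum behaviour can be sketched with the Bech32 algorithm from BIP 173. The human-readable part `seckey` used in the usage note is a made-up placeholder; how CryptoURI actually maps its prefix onto Bech32 is not specified in this thread.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values):
    """BCH checksum over 5-bit values, as defined in BIP 173."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def hrp_expand(hrp):
    """Spread the human-readable part into 5-bit values for the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def to_5bit(data: bytes):
    """Regroup 8-bit bytes into 5-bit values, zero-padding the last group."""
    acc = bits = 0
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out


def encode(hrp: str, data: bytes) -> str:
    """Encode bytes as '<hrp>1<payload><6-char checksum>'."""
    values = to_5bit(data)
    pm = polymod(hrp_expand(hrp) + values + [0] * 6) ^ 1
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[v] for v in values + checksum)


def checksum_ok(text: str) -> bool:
    """Return True if the string's checksum verifies."""
    hrp, sep, payload = text.rpartition("1")
    if not sep or not hrp or len(payload) < 6 or any(c not in CHARSET for c in payload):
        return False
    values = [CHARSET.index(c) for c in payload]
    return polymod(hrp_expand(hrp) + values) == 1
```

For example, `encode("seckey", key_bytes)` yields a string that `checksum_ok` accepts, while changing any single character of the payload makes `checksum_ok` return `False`. The six checksum characters are the overhead mentioned above.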

Is it more important to make them shorter and faster to use?

I'd say succinctness is a secondary goal to human-friendliness, in that shorter strings are easier for humans to work with.

Will there be a string and a binary representation to fit both scenarios?

CryptoURI is very much intended to be a text format. I am a big fan of binary formats too, but I think the solution there is to have precise and well-understood mappings to existing binary formats. To that end an algorithm registry should maintain a set of 1:1 aliases to existing algorithm identifiers, to facilitate an uncomplicated mapping of e.g. crypto:sec:key -> PKCS#8.

Done correctly, that should just be looking an OID up in the CryptoURI algorithm identifier registry, and serializing the decoded Bech32 binary object.
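
A minimal sketch of such a registry lookup. The table structure and function names are hypothetical; the OIDs listed are the standard ones for these four algorithms (NIST for AES-GCM and SHA-256, RFC 8410 for Ed25519).

```python
# Hypothetical registry: CryptoURI algorithm label -> dotted OID.
ALGORITHM_OIDS = {
    "aes128gcm": "2.16.840.1.101.3.4.1.6",
    "aes256gcm": "2.16.840.1.101.3.4.1.46",
    "ed25519": "1.3.101.112",
    "sha256": "2.16.840.1.101.3.4.2.1",
}

# Reverse table; equal sizes mean the mapping is 1:1.
OID_ALGORITHMS = {oid: name for name, oid in ALGORITHM_OIDS.items()}
assert len(OID_ALGORITHMS) == len(ALGORITHM_OIDS)


def oid_for(uri_prefix: str) -> str:
    """Look up the OID for the algorithm label at the end of a CryptoURI prefix."""
    algorithm = uri_prefix.split(":")[-1]
    try:
        return ALGORITHM_OIDS[algorithm]
    except KeyError:
        raise ValueError(f"no OID registered for {algorithm!r}") from None


def label_for(oid: str) -> str:
    """Look up the CryptoURI algorithm label for a dotted OID."""
    try:
        return OID_ALGORITHMS[oid]
    except KeyError:
        raise ValueError(f"no CryptoURI label registered for {oid}") from None
```

A 1:1 table like this only works if labels are specific enough, which is one argument for `aes128gcm` and `aes256gcm` being distinct labels rather than a generic AES key type.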

re: the rest of your post, I think I can open a follow-up issue which can serve as a starting point.

As a hierarchical namespace, I think we can break this problem down hierarchically so we're not talking about a dozen things at the same time:

https://github.com/cryptouri/cryptouri-spec/issues/2

That said, I think it's still helpful to have a thread like this one to talk about the bigger picture.

Thanks for the thorough answers (and repeats), I think I understand what CryptoURI is trying to achieve much better now - and like it much more.

Also, I am starting to realize how big this is going to be and will move to gradually supporting CryptoURI instead of fully switching to it now. I think this approach will allow us to take time to carefully define the details, instead of hurrying through the process. This does not reduce my ambition to contribute in any way.

I will continue with issue #2, and further deepen my understanding until I start opening other issues.

cryptouri / cryptouri-spec

Clarification on scope #1