jedisct1 commented 4 years ago

Here is a proposal to unify all symmetric operations.

In traditional APIs, hashing, key derivation, encryption, authentication are clearly separated, because every algorithm was designed for one particular operation.

This doesn’t fit well with modern constructions. Recent hash functions are keyed, can absorb parameters such as salts and contexts, and can have a variable sized output, blurring the line between hash functions, MACs, PRFs, and KDFs.
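As an illustration of that blurring, here is a small sketch using Python's standard `hashlib`. Python is used only for illustration and is not part of the proposal, and the key, salt and context values are placeholders. One BLAKE2b call accepts a key, a salt and a personalization string, and SHAKE256 returns as many bytes as requested.

```python
import hashlib

key = b"k" * 32          # placeholder key
salt = b"abcd"           # BLAKE2b salt, at most 16 bytes
context = b"protocol"    # BLAKE2b personalization, at most 16 bytes

# Keyed, salted and personalized: the same primitive acts as hash, MAC or PRF.
mac = hashlib.blake2b(b"message", key=key, salt=salt, person=context,
                      digest_size=32).digest()

# The same message hashed without any parameters gives an unrelated digest.
plain = hashlib.blake2b(b"message", digest_size=32).digest()

# Variable-length output: a XOF can serve as a hash or as a KDF.
short_out = hashlib.shake_256(b"seed").digest(16)
long_out = hashlib.shake_256(b"seed").digest(64)
```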

Constructions based on large permutations allow a wide range of operations to be performed on a state updated incrementally, reducing all symmetric operations to one core function and a shared state.

A common attribute between many recent ciphers and hash functions is the fact that they can exploit parallelism. Assuming that a hash function only requires an input message and a static set of parameters is not enough to take advantage of these new algorithms.

As a result, current APIs are either very specific (one set of functions per algorithm, not per kind of operation), or too generic, making it difficult to fit modern constructions in.

From a different perspective, any symmetric cryptography operation can be seen as a subset of a session-based authenticated encryption system. If we can define a minimal API for the latter, we can use the same API for the former.

As a concrete proposal, a single “symmetric cryptography” object with the following methods seems to be enough to cover the needs of all symmetric operations.

open(key?, options?): return a new handle. The key is optional, as well as the options handle. options cannot be changed after the object has been instantiated. I couldn’t find any use case where one would need to change them, and allowing it can lead to vulnerabilities (ex: unintentionally resetting the nonce). Giving options as a parameter to open() is at odds with the signatures proposal, but that proposal can be updated accordingly.
absorb(data): absorb data, without encrypting it. This can be additional data for an AEAD, input data for hash functions and MACs, or info data for HKDF-expand. absorb(d1); absorb(d2) is equivalent to absorb(d1 || d2).
absorb_next(): separate data to be absorbed. Useful for constructions allowing multiple inputs. For example, the ABSORB() function of the CYCLIST mode will be implemented as absorb(data) followed by absorb_next().
encrypt(data): encrypt data using the current state (output buffer omitted for readability). As in ueno’s aead proposal, the output buffer is provided by the guest, and if the offset is the same as the input pointer, encryption will happen in-place. The IV/nonce can be explicitly provided, or will be automatically generated if that can safely be done for the current algorithm.
decrypt(ciphertext, tag): decrypt the ciphertext and verify the tag. Decryption can happen in-place by providing the same offset for the input and output buffers.
squeeze(len): extract len bytes using the current state.
squeeze_tag(): extract an authentication tag. Unlike squeeze(), the length is defined by the chosen algorithm, and a Tag object is returned instead of raw bytes.
squeeze_key(): extract a key for the current state and ratchet (ex: Xoodyak’s SQUEEZEKEY() function), returns a key handle.
ratchet(): make it impossible to compute the state value before the call to that function.
get(property) and get_u64(property): get properties for the current operation. For example, this can be used to retrieve the current block counter, or a nonce that was automatically generated.
key_generate(): return a secret key for the current algorithm.
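The semantics of absorb() and absorb_next() can be sketched with a toy object. This is only an illustration in Python: `ToySymmetricOp` is a hypothetical name, SHAKE256 stands in for whatever algorithm a real handle would use, and the length-prefix framing is an assumption of the sketch rather than something the proposal specifies.

```python
import hashlib


class ToySymmetricOp:
    """Toy stand-in for the proposed handle (hypothetical, not a real binding)."""

    def __init__(self, key: bytes = b"") -> None:
        self._key = key
        self._segments = [bytearray()]  # one entry per separated input

    def absorb(self, data: bytes) -> None:
        # Appends to the current input, so absorb(d1); absorb(d2)
        # behaves exactly like absorb(d1 || d2).
        self._segments[-1] += data

    def absorb_next(self) -> None:
        # Closes the current input and starts a new one.
        self._segments.append(bytearray())

    def squeeze(self, length: int) -> bytes:
        # Length-prefix the key and every input so that the tuple
        # ("ab", "c") cannot be confused with ("a", "bc").
        xof = hashlib.shake_256()
        for part in [self._key, *self._segments]:
            xof.update(len(part).to_bytes(8, "little"))
            xof.update(bytes(part))
        return xof.digest(length)


def hash_tuple(*inputs: bytes) -> bytes:
    op = ToySymmetricOp()
    for i, item in enumerate(inputs):
        if i:
            op.absorb_next()
        op.absorb(item)
    return op.squeeze(32)
```

With this sketch, `hash_tuple(b"ab", b"c")` and `hash_tuple(b"a", b"bc")` differ, while splitting a single input across several absorb() calls does not change the output.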

Options are provided as a handle. The relevant type has only 3 functions:

open()
set(property, value)
set_u64(property, value)
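A minimal sketch of how a binding could model this, assuming the handle takes a snapshot of the options when it is opened so that later changes to the options object have no effect on it. The class names and the snapshot approach are assumptions made for illustration.

```python
class Options:
    """Hypothetical options handle with the three functions listed above."""

    def __init__(self) -> None:
        self._values = {}

    def set(self, name: str, value) -> None:
        if not isinstance(value, (str, bytes)):
            raise TypeError("set() expects a string or bytes value")
        self._values[name] = value

    def set_u64(self, name: str, value: int) -> None:
        if not isinstance(value, int) or not 0 <= value < 2**64:
            raise ValueError("set_u64() expects an unsigned 64-bit integer")
        self._values[name] = value

    def snapshot(self) -> dict:
        return dict(self._values)


def options_open() -> Options:
    return Options()


class SymmetricHandle:
    """Hypothetical handle: options are frozen at open() time."""

    def __init__(self, algorithm: str, key=None, options=None) -> None:
        self.algorithm = algorithm
        self._key = key
        self._options = options.snapshot() if options is not None else {}

    def get(self, name: str):
        return self._options[name]
```

Changing `options` after the handle exists then affects only handles opened later, which rules out accidents such as resetting a nonce mid-operation.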

Usage examples

Hashing

options = options_open()
hash_op = symmetric_op_open("BLAKE3", key, options)
hash_op.absorb(data)
out = hash_op.squeeze(len)

MAC

options = options_open()
options.set("context", "protocol")
options.set("salt", "abcd")
hash_op = symmetric_op_open("BLAKE2b-512", key, options)
hash_op.absorb(data)
tag = hash_op.squeeze_tag()
…
tag.verify(expected_tag)

Hashing multiple inputs

hash_op = symmetric_op_open("TupleHashXOF256", key, {})
hash_op.absorb(a)
hash_op.absorb_next()
hash_op.absorb(b)
tag = hash_op.squeeze(len)

Key derivation

Argon2 example:

options = options_open()
options.set_u64("memlimit", 1 * 1024 * 1024 * 1024)
options.set_u64("opslimit", 5)
options.set_u64("parallelism", 8)
kdf_op = symmetric_op_open("Argon2id", key, options)
kdf_op.absorb(data)
k1 = kdf_op.squeeze(len)
pwhash_str = kdf_op.squeeze_tag() // exportable as a standard hashed password string

HKDF-extract:

kdf_op = symmetric_op_open("HKDF-EXTRACT/SHA-512", key, {})
kdf_op.absorb(seed)
prk = kdf_op.squeeze_key()

HKDF-expand:

kdf_op = symmetric_op_open("HKDF-EXPAND/SHA-512", prk, {})
kdf_op.absorb(info)
k1 = kdf_op.squeeze(len)
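For reference, the two HKDF steps as defined in RFC 5869 can be sketched with Python's standard `hmac` module. This shows what a host would compute behind the two algorithm identifiers. Which of key and absorbed data maps to the RFC's salt and input keying material is not settled in this thread, so the sketch keeps the RFC's parameter names.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha512") -> bytes:
    """HKDF-Extract: PRK = HMAC(salt, IKM)."""
    if not salt:
        # RFC 5869: an absent salt is replaced by a string of zero bytes.
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int,
                hash_name: str = "sha512") -> bytes:
    """HKDF-Expand: T(i) = HMAC(PRK, T(i-1) || info || i)."""
    digest_size = hashlib.new(hash_name).digest_size
    if length > 255 * digest_size:
        raise ValueError("requested output is too long for HKDF")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


prk = hkdf_extract(b"salt", b"input keying material")
k1 = hkdf_expand(prk, b"info", 32)
```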

AEAD

Encryption:

options = options_open()
options.set("context", "app context")
aead_op = symmetric_op_open("XCHACHA20_POLY1305", key, options)
nonce = aead_op.get("nonce")

aead_op.absorb(ad)
e = aead_op.encrypt(data)
tag = aead_op.squeeze_tag()

Decryption:

options = options_open()
options.set("context", "app context")
options.set("nonce", nonce)
aead_op = symmetric_op_open("XCHACHA20_POLY1305", key, options)
aead_op.absorb(ad)
p = aead_op.decrypt(data, tag)

Stateful Hash Object

options = options_open()
options.set("context", "Noise protocol")
sho_op = symmetric_op_open("SHO/SHAKE256", nil, options)
sho_op.absorb(a0)
sho_op.absorb(a1)
sho_op.absorb_next()
sho_op.absorb(b)
sho_op.encrypt(data)
sho_op.squeeze(len)
sho_op.ratchet()

Session-based encryption

options = options_open()
options.set("context", "Protocol")
session_op = symmetric_op_open("Xoofff-SANE", k0, options)
session_op.encrypt(data)
session_op.absorb(a)
session_op.absorb_next()
session_op.absorb(b)
let tag1 = session_op.squeeze_tag()
let k1 = session_op.squeeze_key()
session_op.absorb(c)
let tag2 = session_op.squeeze_tag()
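To show how a shared state ties these calls together, here is a toy session object in Python. It is NOT Xoofff-SANE and not a vetted construction: the `ToySession` name, the SHAKE256-based state update and the labels are all assumptions made for illustration, and tags are returned as raw bytes to keep the sketch short.

```python
import hashlib
import hmac


class ToySession:
    """Toy session: every call updates one shared state (illustration only)."""

    def __init__(self, key: bytes, context: bytes = b"") -> None:
        self._state = bytes(32)
        self._pending = b""
        self._mix(b"key", key)
        self._mix(b"context", context)

    def _mix(self, label: bytes, data: bytes = b"") -> None:
        # One-way update: the new state is a hash of the old one.
        self._state = hashlib.shake_256(
            self._state + label + b"\x00" + data).digest(32)

    def _flush(self) -> None:
        self._mix(b"absorb", self._pending)
        self._pending = b""

    def _output(self, label: bytes, length: int) -> bytes:
        return hashlib.shake_256(self._state + label + b"\x01").digest(length)

    def absorb(self, data: bytes) -> None:
        self._pending += data

    def absorb_next(self) -> None:
        self._flush()
        self._mix(b"next")

    def encrypt(self, plaintext: bytes) -> bytes:
        self._flush()
        stream = self._output(b"enc", len(plaintext))
        ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
        self._mix(b"enc", ciphertext)
        return ciphertext

    def squeeze_tag(self) -> bytes:
        self._flush()
        return self._output(b"tag", 16)

    def decrypt(self, ciphertext: bytes, tag: bytes) -> bytes:
        self._flush()
        stream = self._output(b"enc", len(ciphertext))
        plaintext = bytes(c ^ s for c, s in zip(ciphertext, stream))
        self._mix(b"enc", ciphertext)
        # Verify before releasing any plaintext to the caller.
        if not hmac.compare_digest(self.squeeze_tag(), tag):
            raise ValueError("authentication failed")
        return plaintext

    def ratchet(self) -> None:
        self._flush()
        self._mix(b"ratchet")


sender = ToySession(b"k0", b"Protocol")
receiver = ToySession(b"k0", b"Protocol")
sender.absorb(b"header")
ct = sender.encrypt(b"hello")
tag = sender.squeeze_tag()
receiver.absorb(b"header")
recovered = receiver.decrypt(ct, tag)
```

Both sides stay in sync as long as they perform the same sequence of calls, so the tag of a later message also authenticates everything that came before it in the session.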

jedisct1 commented 4 years ago

The actual WebAssembly interface is straightforward to derive from this.

Other operations can easily be defined on top of this, for example a DRBG just needs open() and squeeze().

Some functions will return an error. For example ratchet() is not going to always be available. So, we need to define a comprehensive set of errors that each of these functions can return.

Other things that absolutely need to be specified:

The list of required, recommended and optional algorithms along with their identifiers and parameters.
How all edge cases are handled (what happens if the internal counter of a block cipher wraps?)
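One possible answer to the counter question, sketched in Python: refuse to process data once the block counter would wrap. The numbers are those of IETF ChaCha20 (RFC 8439: a 32-bit block counter and 64-byte blocks). Failing with an error rather than wrapping is an assumption of this sketch, since the behavior still has to be specified, and the names are hypothetical.

```python
BLOCK_SIZE = 64      # bytes per ChaCha20 block
COUNTER_BITS = 32    # width of the IETF ChaCha20 block counter


class CounterOverflow(Exception):
    """Raised instead of silently reusing keystream blocks."""


def advance_counter(counter: int, data_len: int) -> int:
    """Return the index of the next unused block after processing data_len bytes."""
    blocks = -(-data_len // BLOCK_SIZE)  # ceiling division
    new_counter = counter + blocks
    if new_counter > 2**COUNTER_BITS:
        raise CounterOverflow("block counter would wrap; keystream would repeat")
    return new_counter
```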

jedisct1 commented 4 years ago

We had a discussion about whether symmetric keys and authentication tags should just be represented as raw bytes, or as objects.

The proposed API includes three ways to squeeze data:

squeeze(len): this returns raw bytes, and the length is variable. Needed for XOFs, KDFs and DRBGs.
squeeze_tag(): returns a tag object. The length is not specified. It is implied by the algorithm. Returning an object makes it difficult for applications to truncate tags and compare them in an unsafe way.
squeeze_key(): returns a key object. That handle can be immediately used for other operations. It is also handled as a secret key, e.g. automatically wiped from memory on close().

With a specialized API for every operation, using types instead of raw bytes implied quite a lot of functions and types that had to be exported. The API proposed here solves this problem. It is minimal, yet encourages bindings to use types instead of raw bytes when it can improve security.
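A binding could model the Tag object roughly as follows. This is a Python sketch with hypothetical names. Python cannot truly hide the underlying bytes, so it only shows the intent: comparison goes through a constant-time verify(), and a truncated tag never verifies.

```python
import hmac


class Tag:
    """Hypothetical tag object: verification only, no direct comparison."""

    __slots__ = ("_raw",)

    def __init__(self, raw: bytes) -> None:
        self._raw = bytes(raw)

    def verify(self, expected: bytes) -> None:
        # Constant-time comparison; inputs of different lengths never match.
        if not hmac.compare_digest(self._raw, expected):
            raise ValueError("tag verification failed")

    def __eq__(self, other):
        # Steer callers away from ==, which is not constant-time on bytes.
        raise TypeError("use Tag.verify() to compare tags")

    def __repr__(self) -> str:
        return "Tag(<hidden>)"
```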

jedisct1 commented 4 years ago

Maybe we need better names than squeeze and absorb, as these are usually only associated with sponge functions.

Suggestions welcome!

ueno commented 4 years ago

We had a discussion about whether symmetric keys and authentication tags should just be represented as raw bytes, or as objects.

The proposed API includes three ways to squeeze data:

Well, I'm sorry, but I suggest that we discuss the objectives a bit more before moving on to a specific solution. Having multiple ways to achieve the goal is certainly good, but I've already raised a point, on a merged PR, about how we could encourage users to choose the safer option.

Perhaps it's time to define a proposal process: for example, proposers should wait at least for a week to allow other people to comment before making the actual changes to the repo, and if someone makes a comment (not an approval) on the issue/PR, the proposer should take at least one iteration before merging. I've gone through the signature proposal and the implementation looks great, but I still have several disagreements at the API level.

jedisct1 commented 4 years ago

Absolutely, and this is a very important point. There are quite a lot of decisions to be made and tasks to be done that affect the entire API.

Examples include error handling, high-level goals as well as types and functions reused in different parts of the API. We will also need to settle on the set of documents we have to produce in order to build a conformant implementation, as well as guidelines for bindings implementers.

In order to keep the discussions scoped, it may be better to open dedicated issues for each of these. Relevant proposals for API parts can then all be updated according to the outcome of these broader discussions.

An issue has been opened for the point you raised. Thanks again!

alterstep commented 4 years ago

I like it, very very elegant!

Suggestion: add

"absorb_fd"
"encrypt_fd"
"decrypt_fd"

-> direct read and write to files and network socket!

ueno commented 4 years ago

This looks like a really interesting proposal overall; thanks for starting this @jedisct1!

One thing I'm concerned about is how to enforce the stateful nature of the crypto operations through type safety. For example, encrypt shouldn't be applied to the hkdf_op object, squeeze_key shouldn't be applied to aead_op, and squeeze_tag shouldn't be called before aead_op.encrypt. With this API, those errors can only be detected at run time, if I understand it right?

jedisct1 commented 4 years ago

Yes, the downside of trying to limit an explosion of the number of types and functions is that more checks will be done at runtime.

But that allows us to fully support what algorithms, not just kinds of operations, can do. BLAKE3 used as a KDF supports ratcheting (we don't need different code for when it is used as a KDF and when it is used as a XOF). HKDF doesn't. And sponge-based constructions can share the same API as traditional constructions instead of having massive redundancy.

There is nothing wrong with calling squeeze_tag before encrypt, to compute a tag for the additional data.
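A sketch of what such a runtime check could look like in a host implementation, in Python. The table is hypothetical: only the ratchet entries reflect what is said above (BLAKE3 supports it, HKDF does not), and the other entries are illustrative guesses.

```python
class UnsupportedOperation(Exception):
    """Raised when an algorithm does not support the requested operation."""


# Hypothetical capability table, for illustration only.
CAPABILITIES = {
    "BLAKE3": {"absorb", "squeeze", "squeeze_tag", "ratchet"},
    "HKDF-EXPAND/SHA-512": {"absorb", "squeeze"},
    "XCHACHA20_POLY1305": {"absorb", "encrypt", "decrypt", "squeeze_tag"},
}


def check(algorithm: str, operation: str) -> None:
    """Raise UnsupportedOperation unless the algorithm allows the operation."""
    supported = CAPABILITIES.get(algorithm)
    if supported is None:
        raise UnsupportedOperation(f"unknown algorithm: {algorithm}")
    if operation not in supported:
        raise UnsupportedOperation(f"{algorithm} does not support {operation}()")
```

A binding for a statically typed language could still wrap this in narrower types (for example a KDF type that exposes no encrypt()), so that most mistakes are caught at compile time while the host keeps a single set of functions.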

jedisct1 commented 4 years ago

Mmmm... I don't think there is value in adding functions accepting file descriptors.

Unless everything is handled by the kernel itself, data flowing through these descriptors has to be stored somewhere anyway.

The guest allocates the encryption and decryption buffers used by the host. So this is not zero copy, but there are no additional copies required.

A host implementation accepting descriptors would do exactly the same thing as the guest would, just using host-allocated memory instead of guest-allocated memory.

Fewer context switches for sure, but context switches are fast in WebAssembly.

Plus, every function accepting a file descriptor would have to deal with I/O errors as well.
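For comparison, this is roughly the loop a guest would write itself, sketched in Python with `hashlib` standing in for the handle (`update()` plays the role of absorb()). The function name and chunk size are arbitrary choices for the sketch. A single guest-allocated buffer is reused for every read, and I/O errors surface in the guest's own code rather than in the crypto API.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024


def absorb_stream(op, stream, chunk_size: int = CHUNK_SIZE) -> int:
    """Feed a readable binary stream into op, one reused buffer at a time."""
    buf = bytearray(chunk_size)  # allocated once by the guest
    view = memoryview(buf)
    total = 0
    while True:
        n = stream.readinto(buf)
        if not n:  # 0 at end of stream
            break
        op.update(view[:n])
        total += n
    return total


data = b"example payload " * 1000
op = hashlib.blake2b()
consumed = absorb_stream(op, io.BytesIO(data), chunk_size=1000)
```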

What do you think @ueno ?

jedisct1 commented 4 years ago

Here, squeeze_tag() returns a Tag object, but making it return bytes as you suggested would still be fine, provided that the encoding of the tag is part of the algorithm name.

scrypt and scrypt-str would be different algorithm identifiers, although the function is exactly the same and only the encoding changes.

ueno commented 4 years ago

Mmmm... I don't think there is value in adding functions accepting file descriptors.

I agree: this needs more elaboration on practical use-cases. For secure transport, I guess it would be more useful to have support backed by standardized protocols (TLS, SSH), where those primitives are used as a building block.

WebAssembly / wasi-crypto

Symmetric operations #19
