WebAssembly / wasi-crypto

WASI Cryptography API Proposal

Symmetric operations #19

Closed jedisct1 closed 4 years ago

jedisct1 commented 4 years ago

Here is a proposal to unify all symmetric operations.

In traditional APIs, hashing, key derivation, encryption, authentication are clearly separated, because every algorithm was designed for one particular operation.

This doesn’t fit well with modern constructions. Recent hash functions are keyed, can absorb parameters such as salts and contexts, and can have a variable sized output, blurring the line between hash functions, MACs, PRFs, and KDFs.

Constructions based on large permutations allow a wide range of operations to be performed on a state updated incrementally, reducing all symmetric operations to one core function and a shared state.

A common attribute between many recent ciphers and hash functions is the fact that they can exploit parallelism. Assuming that a hash function only requires an input message and a static set of parameters is not enough to take advantage of these new algorithms.

As a result, current APIs are either very specific (one set of functions per algorithm, not per kind of operation), or too generic, making it difficult to fit modern constructions in.

From a different perspective, any symmetric cryptography operation can be seen as a subset of a session-based authenticated encryption system. If we can define a minimal API for the latter, we can use the same API for the former.

As a concrete proposal, a single “symmetric cryptography” object with the following methods seems to be enough to cover the needs of all symmetric operations.

Options are provided as a handle. The relevant type has only 3 functions:

Usage examples

Hashing

options = options_open()
hash_op = symmetric_op_open("BLAKE3", key, options)
hash_op.absorb(data)
out = hash_op.squeeze(len)

MAC

options = options_open()
options.set("context", "protocol")
options.set("salt", "abcd")
hash_op = symmetric_op_open("BLAKE2b-512", key, options)
hash_op.absorb(data)
tag = hash_op.squeeze_tag()
…
tag.verify(expected_tag)
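
For readers who want to try the MAC flow outside the proposed API, here is a minimal Python sketch using the standard library's `hashlib.blake2b`. Mapping the proposal's "context" option to BLAKE2b's personalization parameter is an assumption made for illustration, and the helper name `blake2b_mac` is hypothetical; the proposal specifies neither.

```python
import hashlib
import hmac


def blake2b_mac(key: bytes, data: bytes, *, salt: bytes = b"", context: bytes = b"") -> bytes:
    """Keyed BLAKE2b-512 over `data`.

    Assumption: "context" is passed as BLAKE2b's personalization field.
    hashlib limits: key <= 64 bytes, salt <= 16 bytes, person <= 16 bytes.
    """
    h = hashlib.blake2b(key=key, salt=salt, person=context, digest_size=64)
    h.update(data)      # plays the role of absorb()
    return h.digest()   # plays the role of squeeze_tag()


key = b"k" * 32
tag = blake2b_mac(key, b"message", salt=b"abcd", context=b"protocol")
expected_tag = blake2b_mac(key, b"message", salt=b"abcd", context=b"protocol")

# Tags should be compared in constant time rather than with ==.
tag_is_valid = hmac.compare_digest(tag, expected_tag)
```

Changing the key, the salt, or the context yields an unrelated tag, which is what lets a single keyed hash stand in for a MAC with domain separation.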

Hashing multiple inputs

hash_op = symmetric_op_open("TupleHashXOF256", key, {})
hash_op.absorb(a)
hash_op.absorb_next()
hash_op.absorb(b)
tag = hash_op.squeeze(len)
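
To illustrate why a separator such as absorb_next() matters, the following Python sketch compares plain concatenation with a length-prefixed encoding. This is not TupleHash itself (which is defined on top of cSHAKE and is not in Python's standard library); the 8-byte length prefix and the function names are assumptions chosen only to show the ambiguity being avoided.

```python
import hashlib


def naive_hash(*parts: bytes, out_len: int = 32) -> bytes:
    """Hash the parts by plain concatenation; element boundaries are lost."""
    h = hashlib.shake_256()
    for part in parts:
        h.update(part)
    return h.digest(out_len)


def tuple_hash(*parts: bytes, out_len: int = 32) -> bytes:
    """Hash the parts with an 8-byte big-endian length before each element,
    so that different tuples never produce the same input stream."""
    h = hashlib.shake_256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest(out_len)


# ("ab", "c") and ("a", "bc") collide under plain concatenation...
collides = naive_hash(b"ab", b"c") == naive_hash(b"a", b"bc")
# ...but not once element boundaries are encoded.
distinct = tuple_hash(b"ab", b"c") != tuple_hash(b"a", b"bc")
```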

Key derivation

Argon2 example:

options = options_open()
options.set_u64("memlimit", 1 * 1024 * 1024 * 1024)
options.set_u64("opslimit", 5)
options.set_u64("parallelism", 8)
kdf_op = symmetric_op_open("Argon2id", key, options)
kdf_op.absorb(data)
k1 = kdf_op.squeeze(len)
pwhash_str = kdf_op.squeeze_tag() // exportable as a standard hashed password string

HKDF-extract:

kdf_op = symmetric_op_open("HKDF-EXTRACT/SHA-512", key, {})
kdf_op.absorb(seed)
prk = kdf_op.squeeze_key()

HKDF-expand:

kdf_op = symmetric_op_open("HKDF-EXPAND/SHA-512", prk, {})
kdf_op.absorb(info)
k1 = kdf_op.squeeze(len)
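
As a reference point for the two HKDF steps, here is a short Python sketch of extract and expand following RFC 5869, built on the standard `hmac` module. How the proposal's `key` and absorbed input map onto the RFC's salt and input keying material is not stated in this thread, so the sketch simply uses the RFC's own parameter names.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha512") -> bytes:
    """HKDF-Extract: PRK = HMAC(salt, IKM). An empty salt means HashLen zero bytes."""
    hash_len = hashlib.new(hash_name).digest_size
    if not salt:
        salt = bytes(hash_len)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha512") -> bytes:
    """HKDF-Expand: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated and truncated."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


prk = hkdf_extract(b"salt", b"input keying material")
k1 = hkdf_expand(prk, b"info", 32)
```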

AEAD

Encryption:

options = options_open()
options.set("context", "app context")
aead_op = symmetric_op_open("XCHACHA20_POLY1305", key, options)
nonce = aead_op.get("nonce")

aead_op.absorb(ad)
e = aead_op.encrypt(data)
tag = aead_op.squeeze_tag()

Decryption:

options = options_open()
options.set("context", "app context")
options.set("nonce", nonce)
aead_op = symmetric_op_open("XCHACHA20_POLY1305", key, options)
aead_op.absorb(ad)
p = aead_op.decrypt(data, tag)

Stateful Hash Object

options = options_open()
options.set("context", "Noise protocol")
sho_op = symmetric_op_open("SHO/SHAKE256", nil, options)
sho_op.absorb(a0)
sho_op.absorb(a1)
sho_op.absorb_next()
sho_op.absorb(b)
sho_op.encrypt(data)
sho_op.squeeze()
sho_op.ratchet()
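
To make the role of ratchet() more concrete, here is a deliberately simplified Python sketch. It is not the Noise stateful hash object and not a real sponge: the class name `ToySho`, the transcript-based state, and the domain-separation bytes are all assumptions made for illustration. The point it shows is that, before ratcheting, the object still holds everything it absorbed, while afterwards only a fixed-size digest remains.

```python
import hashlib


class ToySho:
    """Toy stateful hash object built on SHAKE256 (illustrative only)."""

    def __init__(self, context: bytes) -> None:
        self._transcript = bytearray()
        self.absorb(context)

    def absorb(self, data: bytes) -> None:
        # Length-prefix each input so element boundaries stay unambiguous.
        self._transcript += len(data).to_bytes(8, "big") + data

    def squeeze(self, length: int) -> bytes:
        # Output depends on everything absorbed so far.
        return hashlib.shake_256(bytes(self._transcript) + b"\x01").digest(length)

    def ratchet(self) -> None:
        # Replace the full transcript with a one-way digest of it, so the
        # earlier inputs can no longer be read back from the object.
        digest = hashlib.shake_256(bytes(self._transcript) + b"\x02").digest(64)
        self._transcript = bytearray(digest)


sho = ToySho(b"Noise protocol")
sho.absorb(b"a0")
before = sho.squeeze(32)
sho.ratchet()
after = sho.squeeze(32)
```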

Session-based encryption

options = options_open()
options.set("context", "Protocol")

session_op = symmetric_op_open("Xoofff-SANE", k0, options)
session_op.encrypt(data)
session_op.absorb(a)
session_op.absorb_next()
session_op.absorb(b)
let tag1 = session_op.squeeze_tag()
let k1 = session_op.squeeze_key()
session_op.absorb(c)
let tag2 = session_op.squeeze_tag()

jedisct1 commented 4 years ago

The actual WebAssembly interface is straightforward to derive from this.

Other operations can easily be defined on top of this, for example a DRBG just needs open() and squeeze().
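
As a rough illustration of the "open() and squeeze() only" idea, the Python sketch below derives a deterministic byte stream from a seed with SHAKE256. The class name `ToyDrbg` is hypothetical, and this is not a vetted DRBG design: it has no reseeding and recomputes the stream prefix on every call. It only shows that successive squeeze() calls can be modelled as consecutive slices of one XOF output.

```python
import hashlib


class ToyDrbg:
    """Deterministic stream from a seed: open with a seed, then squeeze."""

    def __init__(self, seed: bytes) -> None:
        self._xof = hashlib.shake_256(seed)
        self._offset = 0

    def squeeze(self, length: int) -> bytes:
        # hashlib's SHAKE objects return the first n bytes of the output on
        # every digest(n) call, so we track how much was already handed out.
        end = self._offset + length
        out = self._xof.digest(end)[self._offset:end]
        self._offset = end
        return out


drbg = ToyDrbg(b"seed material")
first = drbg.squeeze(16)
second = drbg.squeeze(16)
```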

Some functions will return an error. For example ratchet() is not going to always be available. So, we need to define a comprehensive set of errors that each of these functions can return.
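
One possible shape for such errors, sketched in Python: each algorithm declares which calls it accepts, and anything else fails with an explicit code. The error names, the `CAPABILITIES` table, and `check_supported` are all hypothetical; the real set of errors still has to be defined.

```python
from enum import Enum


class ErrorCode(Enum):
    # Hypothetical codes; the proposal has not defined the real set yet.
    UNSUPPORTED_ALGORITHM = 1
    UNSUPPORTED_OPERATION = 2


class SymmetricOpError(Exception):
    def __init__(self, code: ErrorCode, detail: str) -> None:
        super().__init__(f"{code.name}: {detail}")
        self.code = code


# Illustrative capability table: which calls each algorithm accepts.
CAPABILITIES = {
    "BLAKE3": {"absorb", "squeeze", "ratchet"},
    "HKDF-EXPAND/SHA-512": {"absorb", "squeeze"},
}


def check_supported(algorithm: str, operation: str) -> None:
    """Raise SymmetricOpError unless `algorithm` accepts `operation`."""
    ops = CAPABILITIES.get(algorithm)
    if ops is None:
        raise SymmetricOpError(ErrorCode.UNSUPPORTED_ALGORITHM, algorithm)
    if operation not in ops:
        raise SymmetricOpError(
            ErrorCode.UNSUPPORTED_OPERATION, f"{algorithm} has no {operation}()"
        )


check_supported("BLAKE3", "ratchet")  # accepted, returns None
```

A host implementation would run a check like this at the top of every call, which is where the runtime cost of a single generic object shows up.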

Other things that absolutely need to be specified:

jedisct1 commented 4 years ago

We had a discussion about whether symmetric keys and authentication tags should just be represented as raw bytes, or as objects.

The proposed API includes three ways to squeeze data:

With a specialized API for every operation, using types instead of raw bytes implied quite a lot of functions and types that had to be exported. The API proposed here solves this problem. It is minimal, yet encourages bindings to use types instead of raw bytes when doing so can improve security.
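
As an example of what a binding could gain from a type rather than raw bytes, here is a small Python sketch of a tag wrapper. The class and its method names are assumptions for illustration (only `verify` appears in the usage examples above); the idea is that the only comparison offered is constant-time and the raw value does not leak through debugging output.

```python
import hmac


class Tag:
    """Wraps an authentication tag so callers verify it instead of comparing bytes."""

    __slots__ = ("_raw",)

    def __init__(self, raw: bytes) -> None:
        self._raw = bytes(raw)

    def verify(self, expected: bytes) -> bool:
        # Constant-time comparison; avoids leaking where the first mismatch is.
        return hmac.compare_digest(self._raw, expected)

    def export(self) -> bytes:
        # Explicit opt-in to get the raw bytes, e.g. to send them on the wire.
        return self._raw

    def __repr__(self) -> str:
        return f"Tag(<{len(self._raw)} bytes hidden>)"


tag = Tag(bytes.fromhex("00112233445566778899aabbccddeeff"))
ok = tag.verify(bytes.fromhex("00112233445566778899aabbccddeeff"))
```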

jedisct1 commented 4 years ago

Maybe we need better names than squeeze and absorb, as these are usually only associated with sponge functions.

Suggestions welcome!

ueno commented 4 years ago

We had a discussion about whether symmetric keys and authentication tags should just be represented as raw bytes, or as objects.

The proposed API includes three ways to squeeze data:

Well, I'm sorry, but I suggest that we discuss the objectives a bit more before moving on to a specific solution. Having multiple ways to achieve the goal is certainly good, but I've already raised a point on a merged PR about how we could encourage users to choose the safer option.

Perhaps it's time to define a proposal process: for example, proposers should wait at least a week to allow other people to comment before making the actual changes to the repo, and if someone makes a comment (not an approval) on the issue/PR, the proposer should take at least one iteration before merging. I've gone through the signature proposal and the implementation looks great, but I still have several disagreements at the API level.

jedisct1 commented 4 years ago

Absolutely, and this is a very important point. There are quite a lot of decisions to be made and tasks to be done that affect the entire API.

Examples include error handling, high-level goals as well as types and functions reused in different parts of the API. We will also need to settle on the set of documents we have to produce in order to build a conformant implementation, as well as guidelines for bindings implementers.

In order to keep the discussions scoped, it may be better to open dedicated issues for each of these. Relevant proposals for API parts can then all be updated according to the outcome of these broader discussions.

An issue has been opened for the point you raised. Thanks again!

alterstep commented 4 years ago

I like it, very very elegant!

Suggestion: add

-> direct read and write to files and network socket!

ueno commented 4 years ago

This looks like a really interesting proposal overall; thanks for starting this @jedisct1!

One thing I'm concerned about is how to ensure the stateful nature of crypto operations that we are trying to enforce with type safety. For example, encrypt shouldn't be applied to the hkdf_op object, squeeze_key shouldn't be applied to aead_op, and squeeze_tag shouldn't be called before aead_op.encrypt. With this API, those errors can only be detected at run time, if I understand it right?

jedisct1 commented 4 years ago

Yes, the downside of trying to limit an explosion of the number of types and functions is that more checks will be done at runtime.

But that allows us to fully support what algorithms, not just kinds of operations, can do. BLAKE3 used as a KDF supports ratcheting (we don't need different code for when it is used as a KDF and when it is used as an XOF). HKDF doesn't. And sponge-based constructions can share the same API as traditional constructions instead of having massive redundancy.

There is nothing wrong with calling squeeze_tag before encrypt, to compute a tag for the additional data.

jedisct1 commented 4 years ago

Mmmm... I don't think there is value in adding functions accepting file descriptors.

Unless everything is handled by the kernel itself, data flowing through these descriptors has to be stored somewhere anyway.

The guest allocates the encryption and decryption buffers used by the host. So this is not zero copy, but there are no additional copies required.

A host implementation accepting descriptors would do exactly the same thing as the guest would, just using host-allocated memory instead of guest-allocated memory.

Fewer context switches for sure, but context switches are fast in WebAssembly.

Plus, every function accepting a file descriptor would have to deal with I/O errors as well.

What do you think @ueno ?

jedisct1 commented 4 years ago

Here, squeeze_tag() returns a Tag object, but making it return bytes as you suggested would still be fine, provided that the encoding of the tag is part of the algorithm name.

scrypt and scrypt-str would be different algorithm identifiers, although the function is exactly the same and only the encoding changes.

ueno commented 4 years ago

Mmmm... I don't think there is value in adding functions accepting file descriptors.

I agree: this needs more elaboration on practical use-cases. For secure transport, I guess it would be more useful to have support backed by standardized protocols (TLS, SSH), where those primitives are used as a building block.