gleam-lang / crypto

⛓️ Crypto functionality for Gleam applications
https://hexdocs.pm/gleam_crypto/
Apache License 2.0
36 stars 11 forks source link

Decouple hash algorithms from sign algorithms #12

Open NorbertSzydlik opened 1 month ago

NorbertSzydlik commented 1 month ago

Functions hash, new_hasher, hmac and sign_message all use HashAlgorithm to define which algorithm to use.

I tried adding new hash algorithms supported by Erlang and Node.js. I stumbled on problem with sign_message. It seems that this function implements creation of JSON Web Token (RFC-7519, https://datatracker.ietf.org/doc/html/rfc7519). It uses "alg" field to define which Message Authentication Code algorithm was used to sign the message. I could not find definition for SHA3-X algorithm family in docs (RFC-7518, https://datatracker.ietf.org/doc/html/rfc7518). Additionally, if someone wanted to implement non hash-based message authentication codes (HMAC) it would be impossible to do so with current state. It is because MAC algorithms like ES256 (ECDSA using P-256 and SHA-256) are not hash based, so algorithm definition for it would make no sense for functions like hash.

My proposal is to move hash functions, hmac functions and sign_message to separate submodules: hash, mac and jwt respectively. By doing so you will have nice separation of concerns. It will also prevent collisions between algorithm names. Sha256 can mean something different for the hash, module, as it refers to a cryptographic hash function, whereas in the mac module, it may represent the hashing algorithm used to generate a Message Authentication Code (MAC). By doing so you could also work on different domains of cryptography without affecting other ones.

My other, less involved proposal is to define separate types for MAC and JWT algorithms. It will also allow to work separately on different types of algorithms. However it will not provide separation of concerns found in Erlang and in Node.

Both my proposals will introduce breaking changes, so I wanted to first discuss them here.

lpil commented 1 month ago

There is no desire to support JWT with this package. I actually want to discourage their use wherever possible so it would be counter-productive to have any support for them in core libraries.

NorbertSzydlik commented 1 month ago

That's great and I agree with the direction. I wanted to add SHA3 to hash function and I don't know how can I proceed with current implementation. It would be great to share the implementation, how can I do it?

lpil commented 1 month ago

Sorry, I'm getting confused and not understanding the issue. Why wouldn't you implement SHA3 for both hash and hmac?

NorbertSzydlik commented 1 month ago

I think I wanted to do too many things at once. I will explain my train of thoughts. In summary, with current code it is impossible to add new hash algorithms without inventing JWT names for them.

At first I wanted to add SHA3 to the library. When I tried to do just that, I stumbled on signing_input function. It has case expression on HashAlgorithm to convert it to names recognized by JWT. So it is impossible to just add new variants, without defining how they convert to JWT names. My first thought was to just find how they are defined in RFCs. I could not find anything. Then I thought JWT algorithms should be decoupled from HashAlgorithm. This would introduce breaking changes. That's why I looked at all places where HashAlgorithm is used. I found that it is used in three places: in hash (and recently in hash_init), hmac and in sign_message.

I came to conclusion that it would be beneficial to decouple all the places where HashAlgorithm is used, if breaking changes are needed anyway. It would create three types: HashAlgorithm, MacAlgorithm and JWTAlgorithm for hash, hmac and sign_message respectively.

This decoupling would allow to do three things:

Alternatively one can just remove sign_message, so there wouldn't be a problem with defining JWT names for all variants of HashAlgorithm. Then adding new hash algorithms would be a matter of defining new variants.

lpil commented 1 month ago

Sorry, I don't know what you mean about the JWT considerations. There will be no changes made to this library in service of JWT so non-JWT related motivations are required.

NorbertSzydlik commented 1 month ago

By JWT I mean this part of the code. I call it JWT, because names like HS224, HS256, etc. are related to it. They are defined in RFC-7518 JSON Web Algorithms.

// Based off of https://github.com/elixir-plug/plug_crypto/blob/v1.2.1/lib/plug/crypto/message_verifier.ex#L1
//
/// Sign a message which can later be verified using the `verify_signed_message`
/// function to detect if the message has been tampered with.
///
/// A web application could use this verifier to sign HTTP cookies. The data can
/// be read by the user, but cannot be tampered with.
///
pub fn sign_message(
  message: BitArray,
  secret: BitArray,
  digest_type: HashAlgorithm,
) -> String {
  let input = signing_input(digest_type, message)
  let signature = hmac(<<input:utf8>>, digest_type, secret)

  string.concat([input, ".", bit_array.base64_url_encode(signature, False)])
}

fn signing_input(digest_type: HashAlgorithm, message: BitArray) -> String {
  let protected = case digest_type {
    Sha224 -> "HS224"
    Sha256 -> "HS256"
    Sha384 -> "HS384"
    Sha512 -> "HS512"
    Sha1 -> "HS1"
    Md5 -> "HMD5"
  }
  string.concat([
    bit_array.base64_url_encode(<<protected:utf8>>, False),
    ".",
    bit_array.base64_url_encode(message, False),
  ])
}

Specifically I have issue with this specific case:

  let protected = case digest_type {
    Sha224 -> "HS224"
    Sha256 -> "HS256"
    Sha384 -> "HS384"
    Sha512 -> "HS512"
    Sha1 -> "HS1"
    Md5 -> "HMD5"
  }

What should you do when you want to add new algorithm, for example: Sha3_256 -> "???" ? I could not find definition for it in any RFC. I proposed to use different type than HashAlgorithm for sign_message, because algorithms for sign_message and hash just intersect. With that sign_message could support more algorithms (for example RS256, which is not hash based). And hash function could support more hashing algorithms, if it wasn't restricted by sign_message.

lpil commented 1 month ago

Thank you. What is the motivation for adding this new algorithm?

NorbertSzydlik commented 1 month ago

I wanted to implement CUID2 for Gleam and it uses SHA3 for making id's look random. The other reason for adding SHA3 is completeness. If both Erlang and JS support it, then I don’t see reason for Gleam not to support it.

lpil commented 1 month ago

If you implement this in your package then there would be no need to disrupt all the applications and packages that depend on this package. That seems beneficial if there is no other motivation today for splitting the API.

NorbertSzydlik commented 1 month ago

The other motivation is to make crypto library from Gleam complete. Currently it does not support all algorithms for hashing and signing messages available in Erlang and Node.

lpil commented 1 month ago

That's not a goal of this package presently.