Closed robertjchen closed 7 months ago
High Level spec completed, opening for evaluation/discussion/implementation
Hey! We @margelo are excited to get our hands dirty with this!! 💪
Are we assuming synchronous functions for everything, or should there also be async options? E.g. KEMEncrypt(..) is blocking/synchronous, and KEMEncryptAsync(..) is asynchronous/Promise based?
Thanks!! I think async might actually be the "default" behavior for all of the functions on the JS side, but I'll leave it up to you all to decide what would work better with our current App design 🙏
I think we should add both options, but maybe async should be the default and the sync one should be suffixed with Sync? It depends on how fast those funcs are. If they're slow, people should not really use them synchronously. If they're really really fast, we can add sync options.
Makes sense! It would be interesting to benchmark things to compare- I'd imagine some of the functions would be much, much slower when compiled to asm.js (for Web) while a native module would be much faster 🤔
Just porting over these comments from slack ... it sounds like we've agreed that we should use WebAssembly for web instead of asm.js
So that means we'll use C/C++ and JSI on native and C/C++ and WebAssembly on web
Also, have we agreed on which Kyber implementation we'll use? Are we using the reference implementation?
Interestingly, there's a Rust implementation that says:
Compiles to WASM using wasm-bindgen and has a ready-to-use binary published on NPM
We might be able to use that w/ bindings from Rust -> C -> JSI (example), but it would probably be easier to compile another C/C++ based library for wasm
Also, have we agreed on which Kyber implementation we'll use?
The reference implementation works, but it's done in a way to allow for easy testing/NIST evaluation. There's also https://github.com/PQClean/PQClean , which organizes Kyber and other encryption algorithms to have a standardized interface (for testing, library making, etc.), so it would be good to refer to that implementation as well.
Oh yeah, we've definitely talked about PQClean before in previous POCs. This seems like something we should predesign in slack. @mrousavy @margelo do you want to do some research and lead that predesign?
@roryabraham Sure, lemme talk to my team! :)
@robertjchen @roryabraham i'm gonna start working on implementing the encryption library now!
Gonna keep u updated here and on Slack! 👍🚀
Awesome, please keep us posted on the details!
@robertjchen do we have a name for the library already? I started off by calling it react-native-encryptify, but i can change it to whatever you guys prefer.
Also, could you create a GH repo in Expensify with the desired name, so i can push my changes?
Thanks! :)
@robertjchen I'm gonna be on holiday until 20th of August, so i'm not gonna be working on the lib until then. Is it ok to continue then or does this have a higher priority? If so, i can hand it over to another Margelo dev. :)
That name would be great, it's along the same lines as the other react-native libraries/repos that we have: https://github.com/Expensify?q=react-native&type=all&language=&sort=
However, do we need a separate repo for the C++ part given that we'll be using it elsewhere in the codebase?
In any case, I have gone ahead and created one under Expensify/react-native-encryptify. Can you confirm if you have write access? Thanks!!
Awesome, enjoy the time off! I think it would be great to continue weekly progress on it, though- could you see if anyone's available to provide updates for the two weeks? 🙏
No updates here. To keep things moving, I'll schedule a quick call for all of us to sync up on where things are to make sure things are on track
Hey @robertjchen !
Unfortunately i'm still out sick since i came back from my vacation.
Going to be fit again in the coming days and am going to continue working on this then.
No worries, keep us posted- let me know when you're back 100% and we can all catch up 👍
@robertjchen just so we are on the same track here. We want to create a secret and encapsulate it with KEMEncrypt (Kyber and RSA) and then use this secret for symmetric encryption (AESEncrypt) and vice versa with decryption, right?
Asking because RSA4096 can only encrypt up to 512bytes (501 in fact) and also performance with a lot of asymmetric encryption would be bad.
@robertjchen we can do a quick catch-up meeting next week if you want, i can show you my progress and we can clear any misunderstandings if there are any.
My plan was to finish this project completely by the end of next week! 🚀
@chrispader can you just post an overview of the plan and any questions in a GitHub comment?
I understand that we'll use symmetric AES encryption for the general communication/messaging between users. For the initial public/private key setup, we want to use both Kyber1024 and RSA4096 in combination, in case any of those two algorithms get broken.
Kyber1024 (by design) yields both cipherText and sharedSecret by using the receiver's publicKey. The sharedSecret acts like an AES key and can be used to encrypt/decrypt data. The sharedSecret should therefore not be transmitted...
The cipherText will be transmitted to the receiver and can be decapsulated using the Kyber privateKey of the receiver.
In general, this would be sufficient for post-quantum encryption, as Kyber1024 should be more secure than RSA.
If we want to also use RSA4096 for the asymmetric encryption part, we'll want to additionally either encrypt the sharedSecret or cipherText, so that both encryption algorithms are used for the final encryption key.
Encrypting the Kyber sharedSecret makes Kyber itself redundant and insecure, because the sharedSecret should not be transmitted, but instead the cipherText can be used in combination with the Kyber privateKey to retrieve the sharedSecret on the receiver's side.
Encrypting the Kyber cipherText instead seems logical, but the problem here is that we can (without extra effort) only encrypt 501 bytes of data with RSA4096. (The limitation is 512 bytes minus 11 bytes of padding.) If we want to encrypt the Kyber cipherText, which consists of 1568 bytes, we'll need to split it up into 4 chunks (1568 bytes / 501 bytes = 3.13 => 4) and encrypt each part separately. This is technically no problem, but it increases processing time and reduces performance, as asymmetric encryption is far less efficient than symmetric encryption.
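The chunk arithmetic above can be checked with a short sketch. It assumes PKCS#1 v1.5 padding (which is where the 11-byte overhead comes from); the constant and function names are made up for illustration and are not part of the library.

```typescript
// Assumption: RSA-4096 with PKCS#1 v1.5 padding, so 512 bytes minus 11 bytes of padding.
const RSA4096_MAX_PLAINTEXT_BYTES = 512 - 11; // 501
const KYBER1024_CIPHERTEXT_BYTES = 1568;

// Hypothetical helper: split a buffer into pieces that each fit into one RSA operation.
function splitIntoChunks(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray clamps the end index, so the last chunk may be shorter
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// A zero-filled stand-in with the size of a Kyber1024 cipherText.
const kyberCipherText = new Uint8Array(KYBER1024_CIPHERTEXT_BYTES);
const chunks = splitIntoChunks(kyberCipherText, RSA4096_MAX_PLAINTEXT_BYTES);

console.log(chunks.length); // 4
console.log(chunks.map((chunk) => chunk.length)); // [ 501, 501, 501, 65 ]
```

Each chunk would then go through its own RSA operation, which is why this option costs four asymmetric encryptions per key exchange.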
So my main question regarding the implementation is, should we either...
1. Encrypt the Kyber cipherText with RSA and therefore split it up in 4 chunks
2. Encrypt the Kyber sharedSecret with RSA
3. ... cipherText size of 768 bytes)

cc @robertjchen @roryabraham
I've got most of the implementation for all of this already done, it's just a matter of how we use and arrange these algorithms... 👍
Thanks @chrispader ! 🙇 Appreciate the research and for laying out the details. Yes, I think the goal is to still incorporate RSA somehow.
I think your proposed 1st option would actually be the best, especially since we could potentially parallelize it 👍
Ok 👍 I'll go for this approach then, it's definitely the most secure one, and since it's only done once per chat/room, it should be fine.
KEMEncrypt(pubKeys, dataString) - encrypts a given string RSA4096_Encrypt(Kyber1024_Encrypt(dataString)) given the pubKey set (input string should be padded behind the scenes if necessary, etc.). The pubKeyHash should be a hash of the two public keys combined. The result is the raw encrypted string in base64 format (note that this is directly encrypted by RSA4096 + Kyber1024, not AES!)
<base64 data>
KEMDecrypt(privKeys, dataString) - decrypts a given string given the privKey set (input string should be padded behind the scenes if necessary, etc.)
<base64 data>
Another thing: KEMEncrypt and KEMDecrypt don't really encrypt/decrypt any data, but instead only encapsulate and "encrypt" the secret that is being used in symmetric AES encryption. Therefore, the parameter dataString is misleading imo.
The current implementation would be more like this:
KEMEncrypt(pubKeys) - encapsulates a secret and encrypts it with RSA: RSA4096_Encrypt(Kyber1024_Encapsulate(kyberPubKey)) given the pubKey set (input string should be padded behind the scenes if necessary, etc.). The pubKeyHash should be a hash of the two public keys combined. The result is an encrypted cipherText in base64 format, that can then be decrypted using KEMDecrypt and used for symmetric AES encryption (note that this is directly encrypted by RSA4096 + Kyber1024, not AES!)
<base64 encryptedCipherText>
KEMDecrypt(privKeys, encryptedCipherText) - decrypts the cipherText given the privKey set and returns the aes encryption key (input string should be padded behind the scenes if necessary, etc.)
<base64 aesEncryptionKey>
Should we change this in the issue description?
Also, is that still how you expect this library to work?
We can then use the aesEncryptionKey on both sides to encrypt data symmetrically using AES.
Quick update here:
I finished all of the KEM and AES encryption and decryption. (Only KEMSign and KEMVerify left)
Working on WebAssembly support right now. I'm planning on finishing this library by the end of this week. 👍
Basically this is the order of how we have to use these functions:
1. Call KEMGenKeys
2. Call KEMEncrypt by using the other side's public keypair. You'll get both a cipherText and a sharedSecret. The sharedSecret can be used to encrypt data on the sender's side directly. (Never transmit the sharedSecret!)
3. Send the cipherText to the recipient.
4. Call KEMDecrypt by using the recipient's private keypair. You'll receive a secret.

After both sides have stored the decrypted sharedSecret of the other side, we can now use this to encrypt data such as messages, images or documents. Both the sender and the recipient should now have two secrets of each side respectively.
5. Encrypt data with AESEncrypt.
6. Decrypt data with AESDecrypt.

P.S.: I'll update this flow to use KEMVerify and KEMSign once implemented.
cc @robertjchen @roryabraham
You can already try this out in the example project here
Should we change this in the issue description? Also, is that still how you expect this library to work?
So, basically we get ciphertext and aesEncryptionKey when we run Kyber1024_Encapsulation(receiverPubKey) on the sending side. We then run RSA4096_Encrypt(ciphertext) and send the result to the receiver.
On the receiving side, we call RSA4096_Decrypt(RSAEncryptedCiphertext) to yield the ciphertext, which is run through Kyber1024_Dencapsulation(receiverPrivKey, ciphertext), yielding the aesEncryptionKey.
Is my understanding correct?
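The order of operations described here can be sketched as follows. Kem and PublicKeyCipher are hypothetical interfaces standing in for Kyber1024 and RSA4096; their names and shapes are assumptions for illustration, not the library's API, and the RSA chunking discussed earlier is left out.

```typescript
// Hypothetical stand-in for a KEM such as Kyber1024.
interface Kem {
  encapsulate(publicKey: Uint8Array): { cipherText: Uint8Array; sharedSecret: Uint8Array };
  decapsulate(privateKey: Uint8Array, cipherText: Uint8Array): Uint8Array;
}

// Hypothetical stand-in for public-key encryption such as RSA4096.
interface PublicKeyCipher {
  encrypt(publicKey: Uint8Array, data: Uint8Array): Uint8Array;
  decrypt(privateKey: Uint8Array, data: Uint8Array): Uint8Array;
}

interface HybridKeys {
  kyber: Uint8Array;
  rsa: Uint8Array;
}

// Sender side: encapsulate with the receiver's Kyber public key,
// then wrap the resulting cipherText with the receiver's RSA public key.
function hybridEncapsulate(kem: Kem, rsa: PublicKeyCipher, receiverPublicKeys: HybridKeys) {
  const { cipherText, sharedSecret } = kem.encapsulate(receiverPublicKeys.kyber);
  const wrappedCipherText = rsa.encrypt(receiverPublicKeys.rsa, cipherText);
  // Only wrappedCipherText is sent; sharedSecret stays on the sender's device.
  return { wrappedCipherText, sharedSecret };
}

// Receiver side: unwrap with the RSA private key first,
// then decapsulate with the Kyber private key to recover the same secret.
function hybridDecapsulate(
  kem: Kem,
  rsa: PublicKeyCipher,
  receiverPrivateKeys: HybridKeys,
  wrappedCipherText: Uint8Array,
): Uint8Array {
  const cipherText = rsa.decrypt(receiverPrivateKeys.rsa, wrappedCipherText);
  return kem.decapsulate(receiverPrivateKeys.kyber, cipherText);
}
```

Passing the primitives in as parameters keeps the sketch independent of any particular Kyber or RSA binding; a real caller would supply the native or WebAssembly implementations.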
Also as far as my research went, we can't use Kyber for signatures and verification. We'll have to use either Dilithium (by the same creators), Falcon or SPHINCS+.
More details: https://www.nist.gov/news-events/news/2022/07/nist-announces-first-four-quantum-resistant-cryptographic-algorithms
Which one should we use?
Should we change this in the issue description? Also, is that still how you expect this library to work?
So, basically we get ciphertext and aesEncryptionKey when we run Kyber1024_Encapsulation(receiverPubKey) on the sending side. We then run RSA4096_Encrypt(ciphertext) and send the result to the receiver.
On the receiving side, we call RSA4096_Decrypt(RSAEncryptedCiphertext) to yield the ciphertext, which is run through Kyber1024_Dencapsulation(receiverPrivKey, ciphertext), yielding the aesEncryptionKey.
Is my understanding correct?
Yes, exactly! 👍
Also as far as my research went, we can't use Kyber for signatures and verification. We'll have to use either Dilithium (by the same creators), Falcon or SPHINCS+.
Also, I'm not sure it's necessary to sign and verify the rsaEncryptedKyberCipherText; instead we should mostly sign and verify the data we send.
If somehow the key gets tampered with, the only consequence would be that the encryption wouldn't work properly. The combination of RSA+Kyber in a way already ensures that we get the right encryption key.
If, on the other hand, data gets tampered with, the recipient might get wrong or dangerous data. (still very unlikely)
Yes, exactly! 👍
Great! That should work for our purposes, thanks 🙌
Also not sure if it's necessary to sign and verify the rsaEncryptedKyberCipherText, but instead mostly sign and verify the data we send.
Also as far as my research went, we can't use Kyber for signatures and verification. We'll have to use either Dilithium (by the same creators), Falcon or SPHINCS+.
That makes sense to me- however, the only pitfall I can think of is that you don't know if the person sending you the ciphertext is really the person that they claim to be, since anyone who knows your public RSA and Kyber keys is able to arbitrarily establish a shared secret with you without having to prove who they are 🤔
Without introducing a new algo, I think what we can do is add on another piece of data to prove rsaEncryptedKyberCipherText is coming from the correct sender by just a few more steps.
The sender would use their own private Kyber key to generate a new verificationCipherText and verificationSharedSecret. They would then RSA encrypt a known string or hash of the message to be sent, using their own private key, yielding rsaEncHash.
Using verificationSharedSecret, they would then AES-encrypt rsaEncHash as encryptedRSAEncHash and then send those along to the receiver, which would serve as the signature.
The receiver would get a signature composed of the following:
Signature = verificationCipherText || encryptedRSAEncHash
- ... verificationCipherText, and see if they can decrypt encryptedRSAEncHash using the shared secret.
- If encryptedRSAEncHash yields rsaEncHash, then they can try to decrypt this using the RSA public key of the sender/person they intend on receiving messages from.

That should be enough to verify message authenticity and serve as a signature mechanism. If either RSA or Kyber was broken, one of the above checks would fail, ensuring safety.
What are your thoughts on that approach? 🙏
The sender would use their own private Kyber key to generate a new verificationCipherText and verificationSharedSecret. They would then RSA encrypt a known string or hash of the message to be sent, using their own private key, yielding rsaEncHash.
The problem here is that we can't use the Kyber private key for encapsulating a key. It only works in the opposite direction. That's why Kyber is fundamentally not designed for signing and verifying; that's what, say, Dilithium is for.
The proposal seems very logical, but i don't think we can achieve this with Kyber...
Ah that makes sense! It looks like it's a one-way operation and only for key exchange 😞
In that case, let's move forward with just plain encryption for now without signatures. 👍
(The server would be the one to ensure proper identity- certain classes of attacks would be feasible but could be mitigated by user awareness and other protections elsewhere in the stack. It's always a tradeoff between complexity and security and we've already made those choices for the sake of usability so I don't think we'll miss this aspect too much 😅 )
Got it! 👍
So all of the encryption functionality is already working and we should be ready to either publish it to some (private) package registry or use it directly through git.
The only thing i'm currently still working on is using the .wasm binary in web. It basically compiles already, but i haven't made it work in Expensify/App yet.
Awesome, can't wait to see the .wasm part working as well! 🙌
Giving a brief update on the WebAssembly journey here...
The WebAssembly process on web principally consists of 3 parts:
... .wasm-binary ✅

Steps 2 and 3 are already done and working. I created compilation scripts for compiling the library as well as compiling a local openssl git submodule. These scripts are (almost) cross-platform already, so it should be possible to compile the library on any OS. I guess we'll automate this process in a pipeline anyway, but this might be good for development and debugging in the future.
Since WebAssembly relies on C++ code being compiled with the Emscripten toolchain, we have to compile OpenSSL ourselves, since there are no working pre-compiled binaries. Also, we want to ensure that we're using the original library for security reasons.
I've had some major breakthroughs in this process yesterday and i'm currently fixing some memory allocation problems, when it comes to using RSA from the OpenSSL library. Other than that, everything should be done 👍
I'm hoping to finish this by the end of the week.
@chrispader Great work! Let us know how it goes (if OpenSSL's complexity proves a bit too much, maybe we could consider just using the basic raw reference C implementations for RSA/AES, especially since we're just using a small part of OpenSSL?)
@chrispader any update here?
Yes! Just made the library completely work with WebAssembly for the first time 🥳🥳🥳
There are some quirks and some weird stuff going on when compiling WebAssembly that i'd like to investigate and then document, so the whole build process is clear 👍
cc @roryabraham @robertjchen
I added a PR/branch for testing the WebAssembly library: https://github.com/Expensify/App/pull/30146
This PR also introduces a useEncryptify hook, because WebAssembly is effectively async and has to be loaded at app start
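Because instantiating a .wasm binary is asynchronous, callers usually want to share one in-flight load instead of starting a new one per call. A minimal sketch of that pattern follows; createLazyLoader and the loader callback are hypothetical names, and this is not the useEncryptify implementation from the PR.

```typescript
// Hypothetical helper: wraps an async loader so the work starts on first use
// and every later caller receives the same promise.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load().catch((error: unknown) => {
        // Forget a failed attempt so that a later call can retry.
        pending = undefined;
        throw error;
      });
    }
    return pending;
  };
}

// Usage: the callback stands in for whatever actually fetches and instantiates the module.
let loadCount = 0;
const getEncryptionModule = createLazyLoader(async () => {
  loadCount += 1;
  return { ready: true };
});

const firstCall = getEncryptionModule();
const secondCall = getEncryptionModule();
console.log(firstCall === secondCall); // true
console.log(loadCount); // 1
```

A hook could call such a loader once at app start and expose the resolved module to components afterwards.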
Awesome work! Can't wait to try this out locally and see what the numbers are like: https://github.com/Expensify/App/issues/30341
Update: Ongoing work in https://github.com/Expensify/App/pull/30146
Looking forward to hardware benchmarks, next steps discussion/planning in progress.
Hardware benchmarks posted. Ongoing discussion on next steps!
Given that the library itself is completed, I think we can close this out for now! 🎉🎉 We'll sync on the main issue on next steps and create new issues from there. Great job @chrispader ! 🙇
cc: Margelo
Please implement a custom encryption library to be used as part of the new End to End Encryption feature in the App.
Namely, it will provide symmetric (AES) and asymmetric (RSA4096 + Kyber1024) encryption functions to be used by the App as well as in the backend.
Please refer to the planning doc for additional context!
Considerations
... wasm binary for direct use in the Web client.
Proposed Interface
// synchronous mockup, but final solution may be asynchronous as well 👍
KEMGenKeys() - return a JSON object in the format of:
For the following functions, the pubKeys and privKeys arguments should be provided in JSON format:
KEMEncrypt(pubKeys, dataString) - encrypts a given string RSA4096_Encrypt(Kyber1024_Encrypt(dataString)) given the pubKey set (input string should be padded behind the scenes if necessary, etc.). The pubKeyHash should be a hash of the two public keys combined. The result is the raw encrypted string in base64 format (note that this is directly encrypted by RSA4096 + Kyber1024, not AES!)
KEMDecrypt(privKeys, dataString) - decrypts a given string given the privKey set (input string should be padded behind the scenes if necessary, etc.)
KEMSign(privKeys, dataString) - signs a given string given the privKey set (see doc for additional implementation notes)
KEMVerify(pubKeys, dataString) - verifies the signature of a given data string given the pubKey set
AESDecrypt(iv, key, data) // simple symmetric encryption w/ AES-GCM
AESEncrypt(iv, key, data) // simple symmetric encryption w/ AES-GCM
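For reference, an AES-GCM round trip with the (iv, key, data) argument order could look like the sketch below. It uses Node's built-in crypto module rather than the library itself, assumes a 32-byte key (AES-256) and a 12-byte IV, and returns the GCM authentication tag separately, which the proposed interface does not spell out. The lowercase function names are placeholders.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: AES-256-GCM with the authentication tag returned next to the cipherText.
function aesEncrypt(iv: Buffer, key: Buffer, data: Buffer): { cipherText: Buffer; authTag: Buffer } {
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const cipherText = Buffer.concat([cipher.update(data), cipher.final()]);
  return { cipherText, authTag: cipher.getAuthTag() };
}

function aesDecrypt(iv: Buffer, key: Buffer, cipherText: Buffer, authTag: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(authTag);
  // final() throws if the cipherText or the tag was modified
  return Buffer.concat([decipher.update(cipherText), decipher.final()]);
}

// Usage: a random key stands in for the shared secret from the key exchange.
const key = randomBytes(32);
const iv = randomBytes(12); // must never be reused with the same key
const sealed = aesEncrypt(iv, key, Buffer.from("hello", "utf8"));
const opened = aesDecrypt(iv, key, sealed.cipherText, sealed.authTag);
console.log(opened.toString("utf8")); // "hello"
```

GCM only protects the data if the IV is unique per message under a given key, so a caller would need to generate or count IVs accordingly.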