getodk / xforms-spec

The XForms-derived specification used in the ODK ecosystem. If you are interested in building a tool that is compliant with the forms rendered by ODK tools, this is the place to start. ✨⚒✨
https://getodk.github.io/xforms-spec/
30 stars 26 forks source link

Describe encrypted submission format #186

Closed MartijnR closed 5 years ago

MartijnR commented 6 years ago

Currently, there is not enough information to implement local encryption and create an acceptable submission that can be decrypted with ODK Briefcase.

Some info: https://groups.google.com/forum/#!topic/opendatakit-developers/Kjo6bxNqdVs

MartijnR commented 6 years ago

I think these are the 2 algorithms used:

Symmetric encryption method (for the content): AES/CFB/PKCS5Padding (256 bit key) Asymmetric encryption method (for the key): RSA/NONE/OAEPWithSHA256AndMGF1Padding

https://github.com/opendatakit/collect/blob/adb57e5659e160a241f753a1f65f6963f82a48fd/collect_app/src/main/java/org/odk/collect/android/utilities/EncryptionUtils.java

lognaturel commented 5 years ago

@MartijnR @ggalmazor would be great to elaborate on this while it's fresh in your minds.

ggalmazor commented 5 years ago

I'll try to braindump here what I've been able to reverse engineer of how Briefcase decrypts submissions.

(What follows is out of my league at this moment and my interpretation of things could be plain and simply wrong)

About the ciphers used while decrypting

About decrypting the symmetric key in the first place

About validating decrypted data with the cryptographic signature


I think that's everything I've got :)

There are some things I'd love someone explain to me:

MartijnR commented 5 years ago

Thanks! That's great. I'll read through it. Where should the documentation live? Shall we put in the XForms spec, but in a separate doc?

During Enketo's implementation (which is still ongoing), and like @ggmalzor, I also started thinking about changing encryption using a fast modern method that has some stuff built-in (signature, iv appended?) and is easier to implement (to match between platforms). It's a separate discussion of course. It seems to me that we could quite easily do this by adding a reference to the encryption method in the submission manifest and giving Briefcase the ability to handle both. Afterwards, we could safely switch Enketo and Collect to the new method (requiring users to update Briefcase seems reasonable).

lognaturel commented 5 years ago

This is great! I think seeing any additional notes @MartijnR might have from client-side implementation might also help figure out where the information should go and how it should be structured.

Why not encrypt everything with the public key and use the private key for decryption?

The length of the message would then be capped by the length of the key. Also, asymmetric encryption is slow whereas symmetric encryption is fast. As far as I know, using asymmetric encryption to encrypt a symmetric key is a common scheme.

MartijnR commented 5 years ago

I've read through it and I think this can be turned in to a spec! I'd be happy to give this a shot and create a PR for your reviews.

submission.xml.enc (or whatever filename defined in the encryptedXmlFile of the submission.xml file) is the file that contains the actual submission's encrypted data.

I just wanted to ask for confirmation that this filename is indeed flexible. It sounds sensible. I think from an XForms spec perspective no filenames are fixed (and from the submission API spec perspective only xml_submission_file is fixed, I believe).

A few thoughts on how to structure this and what to include/not include:

  1. Create a separate doc and link from main spec.
  2. Start by describing the XML format of the submission manifest (incl namespaces, nodenames, attributes).
  3. Then describe algorithms used to encrypt and populate the manifest in different logical sections. Wrt to @ggalmazor's excellent overview, expand on padding and iv generation algorithm which both tripped me up heavily during implementation
  4. any briefcase-specific requirements are not documented here (I think this just concerns that the manifest file is called submission.xml if copied manually).
  5. perhaps refer to the OpenRosa submission API and mention there is nothing special in terms of actually submitting encrypted records vs. non-encrypted records (which is a great thing).
lognaturel commented 5 years ago

Seems like a great plan of attack to me!

I also started thinking about changing encryption using a fast modern method that has some stuff built-in (signature, iv appended?) and is easier to implement (to match between platforms)

Would be on board with this but wondering how it would rank in terms of priorities once major clients are compatible and there is reasonable documentation to go off of (I know, I know, not cleaning up messes is bad! But it's far from the only one...).

ggalmazor commented 5 years ago

I just wanted to ask for confirmation that this filename is indeed flexible.

On paper, it should be, but let's be really sure. I'll test this and come back with results.

MartijnR commented 5 years ago

Would be on board with this but wondering how it would rank in terms of priorities once major clients are compatible and there is reasonable documentation to go off of

Yes, true. We should probably wait for a trigger to do this. The 2 triggers that may come up in the future:

  1. When current encryption is not deemed safe enough any more. I read some comments on avoiding AES/CFB already, but haven't really looked into this.
  2. If performance of current algorithm becomes a problem. I know for web clients, Web Crypto would likely be a large factor faster and doesn't support AES/CFB. However, as far as I can see performance is not necessarily a problem except for surveys collecting large media files (on a low-spec device).
ggalmazor commented 5 years ago

I'm back with the results about the .enc file. I've verified that you can have any filename on the encryptedXmlFile node in the manifest submission.xml file.

MartijnR commented 5 years ago

Great. Thanks!

issa-tseng commented 5 years ago

I think I can answer a couple of these questions but only from a general knowledge standpoint; I have no history with this code.

  • Why the symmetric key? Why not encrypt everything with the public key and use the private key for decryption? It feels like an extra unnecessary step.

Asymm encryption tends to be very slow and produce somewhat bloated payloads. Most asymm protocols are meant as key-exchange protocols, so the two machines establish pub/priv with each other just to exchange symm keys to use for subsequent communication, and then they proceed with that. So, similar thing with payload encryption, just it's all stuffed into one package. I do something quite similar for ODK Central backups.

  • Why make the decryption process dependant on the order in which files are decrypted?. Because of this, we lose the ability of paralelizing the decryption of files. It feels like an unnecessary restriction.

It's definitely a tradeoff. The main reason you'd want to do it this way is so that you don't have to create n initialization vectors for n files, which just generates more payload and more homework before the correct things can be done. With one continuous stream of binary information, you can just use one IV at the very start of the whole payload and run the whole thing down from there.

issa-tseng commented 5 years ago

In general in terms of overhauling the system and the technologies used.. based on the description above my personal feeling is that yes there are improvements that could be had and some cruft that could be filtered out, but it's not altogether horribly bad or outdated as-is. The use of md5 is the biggest issue I think.

My primary focus and goal is to make the user experience around the technology better (specifically, stop making users generate and manage their own keypairs), and as far as I can tell there's nothing about that protocol that makes this impossible.