Closed MartijnR closed 5 years ago
I think these are the 2 algorithms used:
Symmetric encryption method (for the content): AES/CFB/PKCS5Padding (256 bit key)
Asymmetric encryption method (for the key): RSA/NONE/OAEPWithSHA256AndMGF1Padding
@MartijnR @ggalmazor would be great to elaborate on this while it's fresh in your minds.
I'll try to braindump here what I've been able to reverse engineer of how Briefcase decrypts submissions.
submission.xml is a manifest file that includes, among other standard info like the form ID and the instance ID:
- base64EncryptedKey node with an encrypted symmetric key, which will be used for decryption.
- encryptedXmlFile node with the filename of the encrypted submission file. Commonly submission.xml.enc.
- base64EncryptedElementSignature node with an encrypted cryptographic signature to detect tampering of the encrypted data.
- media nodes with a file child node on each, listing all the media attachment files that this submission has.
- encrypted="true" attribute.
submission.xml.enc (or whatever filename defined in the encryptedXmlFile of the submission.xml file) is the file that contains the actual submission's encrypted data.
[...] media.file nodes in submission.xml
[...] media.file nodes.
If a media file is missing (it is listed in submission.xml but the file is nowhere to be found), the whole submission is skipped and won't be exported.
The submission file (submission.xml.enc) is decrypted.
(What follows is out of my league at this moment and my interpretation of things could be plain and simply wrong)
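Going back to the manifest described above, here is a minimal Java sketch of reading those nodes with the JDK's DOM parser. The sample XML is hypothetical: the root element name, the id attribute and the values are made up for illustration, and real manifests carry more nodes (and namespaces) than shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class ManifestSketch {
  // Hypothetical manifest, reduced to the nodes discussed in this thread
  static final String SAMPLE =
      "<data id=\"my_form\" encrypted=\"true\">"
    + "<base64EncryptedKey>QUJD</base64EncryptedKey>"
    + "<media><file>photo.jpg.enc</file></media>"
    + "<media><file>audio.mp3.enc</file></media>"
    + "<encryptedXmlFile>submission.xml.enc</encryptedXmlFile>"
    + "<base64EncryptedElementSignature>REVG</base64EncryptedElementSignature>"
    + "</data>";

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Can't parse manifest", e);
    }
  }

  // Text of the first element with the given tag name, or null if there is none
  static String textOf(Document doc, String tag) {
    NodeList nodes = doc.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
  }

  // Text of every file element, in document order (in the sample they only appear under media)
  static List<String> mediaFiles(Document doc) {
    List<String> names = new ArrayList<>();
    NodeList files = doc.getElementsByTagName("file");
    for (int i = 0; i < files.getLength(); i++) {
      names.add(files.item(i).getTextContent());
    }
    return names;
  }

  static boolean isEncrypted(Document doc) {
    return "true".equals(doc.getDocumentElement().getAttribute("encrypted"));
  }

  public static void main(String[] args) {
    Document doc = parse(SAMPLE);
    System.out.println("encrypted: " + isEncrypted(doc));
    System.out.println("key: " + textOf(doc, "base64EncryptedKey"));
    System.out.println("submission file: " + textOf(doc, "encryptedXmlFile"));
    System.out.println("media files: " + mediaFiles(doc));
  }
}
```

Keeping the media file names in document order matters here, since the ciphers described below are handed out one after another.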
About the ciphers used while decrypting
private CipherFactory(String instanceId, byte[] symmetricKeyBytes) {
  symmetricKey = new SecretKeySpec(symmetricKeyBytes, "AES/CFB/PKCS5Padding");
  // construct the fixed portion of the iv -- the ivSeedArray
  // this is the md5 hash of the instanceID and the symmetric key
  try {
    MessageDigest md = MessageDigest.getInstance("MD5");
    md.update(instanceId.getBytes(UTF_8));
    md.update(symmetricKeyBytes);
    byte[] messageDigest = md.digest();
    ivSeedArray = new byte[IV_BYTE_LENGTH];
    for (int i = 0; i < IV_BYTE_LENGTH; ++i) {
      ivSeedArray[i] = messageDigest[(i % messageDigest.length)];
    }
  } catch (NoSuchAlgorithmException e) {
    String msg = "Error constructing ivSeedArray";
    log.error(msg, e);
    throw new CryptoException(msg + " Cause: " + e);
  }
}
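To make the IV seed derivation in the constructor above easier to try out, here is a standalone sketch of just that step. IV_BYTE_LENGTH = 16 is an assumption (the AES block size); the constant's value isn't shown in the snippet. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class IvSeedSketch {
  // Assumption: 16 bytes (AES block size); the real constant is not shown in this thread
  static final int IV_BYTE_LENGTH = 16;

  // MD5 over the instance ID bytes followed by the symmetric key bytes,
  // repeated or truncated to IV_BYTE_LENGTH
  static byte[] ivSeed(String instanceId, byte[] symmetricKeyBytes) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      md.update(instanceId.getBytes(StandardCharsets.UTF_8));
      md.update(symmetricKeyBytes);
      byte[] digest = md.digest();
      byte[] seed = new byte[IV_BYTE_LENGTH];
      for (int i = 0; i < IV_BYTE_LENGTH; i++) {
        seed[i] = digest[i % digest.length];
      }
      return seed;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] key = new byte[32]; // all-zero 256 bit key, for illustration only
    System.out.println(hex(ivSeed("uuid:abc", key)));
  }
}
```

Since an MD5 digest is already 16 bytes long, the modulo in the loop only matters if the IV length were larger than the digest.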
[...] AES/CFB/PKCS5Padding symmetric key.
Each time we need to decrypt some file, we ask the factory for the next Cipher instance:
Cipher next() {
  try {
    ++ivSeedArray[ivCounter % ivSeedArray.length];
    ++ivCounter;
    IvParameterSpec baseIv = new IvParameterSpec(ivSeedArray);
    Cipher c = Cipher.getInstance("AES/CFB/PKCS5Padding");
    c.init(Cipher.DECRYPT_MODE, symmetricKey, baseIv);
    return c;
  } catch (NoSuchAlgorithmException | InvalidKeyException | InvalidAlgorithmParameterException | NoSuchPaddingException e) {
    throw new CryptoException(e);
  }
}
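A consequence of next() above is that the n-th file has to be decrypted with the n-th Cipher. The following sketch shows this with a round trip: an encrypting side and a decrypting side walk the same IV sequence. It is not Briefcase code. The class name and the apply method are made up, the all-zero key is for illustration only, and it uses "AES" as the key algorithm name because, as far as I know, that is what the JDK's default provider expects (the snippet above passes the full transformation string).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class CipherSequenceSketch {
  private final SecretKeySpec key;
  private final byte[] iv;
  private int ivCounter = 0;

  CipherSequenceSketch(String instanceId, byte[] keyBytes) {
    // Assumption: "AES" as key algorithm name for the JDK's default provider
    key = new SecretKeySpec(keyBytes, "AES");
    // MD5 digests are 16 bytes long, which matches the AES block size
    iv = md5(instanceId.getBytes(StandardCharsets.UTF_8), keyBytes);
  }

  private static byte[] md5(byte[] first, byte[] second) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      md.update(first);
      md.update(second);
      return md.digest();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Bumps one byte of the IV (same rule as next() above), then runs a fresh cipher over the data
  byte[] apply(int mode, byte[] data) {
    try {
      ++iv[ivCounter % iv.length];
      ++ivCounter;
      Cipher c = Cipher.getInstance("AES/CFB/PKCS5Padding");
      c.init(mode, key, new IvParameterSpec(iv));
      return c.doFinal(data);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] keyBytes = new byte[32]; // all-zero 256 bit key, for illustration only
    CipherSequenceSketch enc = new CipherSequenceSketch("uuid:abc", keyBytes);
    byte[] media = enc.apply(Cipher.ENCRYPT_MODE, "media bytes".getBytes(StandardCharsets.UTF_8));
    byte[] submission = enc.apply(Cipher.ENCRYPT_MODE, "<data/>".getBytes(StandardCharsets.UTF_8));

    // The decrypting side must process the files in the same order
    CipherSequenceSketch dec = new CipherSequenceSketch("uuid:abc", keyBytes);
    System.out.println(new String(dec.apply(Cipher.DECRYPT_MODE, media), StandardCharsets.UTF_8));
    System.out.println(new String(dec.apply(Cipher.DECRYPT_MODE, submission), StandardCharsets.UTF_8));
  }
}
```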
[...] AES/CFB/PKCS5Padding Cipher
About decrypting the symmetric key in the first place
Cipher pkCipher;
pkCipher = Cipher.getInstance("RSA/NONE/OAEPWithSHA256AndMGF1Padding");
pkCipher.init(Cipher.DECRYPT_MODE, privateKey);
byte[] encryptedSymmetricKey = Base64.decodeBase64(base64EncryptedKey);
byte[] decryptedKey = pkCipher.doFinal(encryptedSymmetricKey);
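For anyone who wants to experiment with this step without Briefcase, here is a self-contained round trip using only the JDK. Some assumptions to flag: the "RSA/NONE/..." name above looks like Bouncy Castle naming, and I believe the stock JDK provider wants "RSA/ECB/OAEPPadding" plus an explicit OAEPParameterSpec instead. As far as I know, the Bouncy Castle name means SHA-256 for both the OAEP digest and MGF1, so that's what the spec below sets. The class name, method names and the 2048 bit key size are made up for this sketch.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.MGF1ParameterSpec;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.OAEPParameterSpec;
import javax.crypto.spec.PSource;

class RsaOaepSketch {
  // Assumption: SHA-256 for both the OAEP digest and the MGF1 mask function
  static final OAEPParameterSpec OAEP_SHA256 = new OAEPParameterSpec(
      "SHA-256", "MGF1", MGF1ParameterSpec.SHA256, PSource.PSpecified.DEFAULT);

  static KeyPair newKeyPair() {
    try {
      KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
      generator.initialize(2048); // hypothetical key size
      return generator.generateKeyPair();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // What a client would do: encrypt the symmetric key and base64-encode it for the manifest
  static String encryptKey(PublicKey publicKey, byte[] symmetricKey) {
    try {
      Cipher c = Cipher.getInstance("RSA/ECB/OAEPPadding");
      c.init(Cipher.ENCRYPT_MODE, publicKey, OAEP_SHA256);
      return Base64.getEncoder().encodeToString(c.doFinal(symmetricKey));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // The decrypting side: base64-decode the node's value and decrypt it with the private key
  static byte[] decryptKey(PrivateKey privateKey, String base64EncryptedKey) {
    try {
      Cipher c = Cipher.getInstance("RSA/ECB/OAEPPadding");
      c.init(Cipher.DECRYPT_MODE, privateKey, OAEP_SHA256);
      return c.doFinal(Base64.getDecoder().decode(base64EncryptedKey));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    KeyPair pair = newKeyPair();
    byte[] symmetricKey = new byte[32]; // all-zero 256 bit key, for illustration only
    String node = encryptKey(pair.getPublic(), symmetricKey);
    System.out.println(Arrays.equals(symmetricKey, decryptKey(pair.getPrivate(), node)));
  }
}
```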
[...] RSA/NONE/OAEPWithSHA256AndMGF1Padding, and then we decrypt whatever comes in the base64EncryptedKey node of the submission.xml file.
About validating decrypted data with the cryptographic signature
[...] base64EncryptedElementSignature node of the submission.xml file.
[...] RSA/NONE/OAEPWithSHA256AndMGF1Padding Cipher to decrypt it.
String buildSignature(Submission originalSubmission) {
  List<String> signatureParts = new ArrayList<>();
  signatureParts.add(metaData.getFormId());
  metaData.getVersion().ifPresent(signatureParts::add);
  signatureParts.add(metaData.getBase64EncryptedKey().orElseThrow(() -> new ParsingException("Missing base64EncryptedKey element in encrypted form")));
  signatureParts.add(metaData.getInstanceId().orElseGet(() -> "crc32:" + checksumOf(originalSubmission.path)));
  for (String mediaName : metaData.getMediaNames()) {
    Path decryptedFile = workingDir.resolve(stripFileExtension(mediaName));
    signatureParts.add(decryptedFile.getFileName() + "::" + getMd5Hash(decryptedFile.toFile()));
  }
  signatureParts.add(originalSubmission.path.getFileName().toString() + "::" + getMd5Hash(path.toFile()));
  return String.join("\n", signatureParts) + "\n";
}
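To see what the string assembled by buildSignature above looks like, here is a sketch that works on in-memory data instead of files. It assumes getMd5Hash returns a lowercase hex string, which isn't shown in this thread. All names and sample values are hypothetical.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SignatureSketch {
  // Assumption: MD5 hashes are written as 32 lowercase hex characters
  static String md5Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(data);
      return String.format("%032x", new BigInteger(1, digest));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  // One part per line, in the same order as buildSignature above;
  // decryptedMedia must iterate in manifest order (e.g. a LinkedHashMap)
  static String buildSignature(String formId, String version, String base64EncryptedKey,
                               String instanceId, Map<String, byte[]> decryptedMedia,
                               String submissionFileName, byte[] decryptedSubmission) {
    List<String> parts = new ArrayList<>();
    parts.add(formId);
    if (version != null) {
      parts.add(version); // the version is optional
    }
    parts.add(base64EncryptedKey);
    parts.add(instanceId);
    for (Map.Entry<String, byte[]> media : decryptedMedia.entrySet()) {
      parts.add(media.getKey() + "::" + md5Hex(media.getValue()));
    }
    parts.add(submissionFileName + "::" + md5Hex(decryptedSubmission));
    return String.join("\n", parts) + "\n";
  }

  public static void main(String[] args) {
    Map<String, byte[]> media = new LinkedHashMap<>();
    media.put("photo.jpg", new byte[] {1, 2, 3});
    System.out.print(buildSignature("my_form", "1", "QUJD", "uuid:abc",
        media, "submission.xml", "<data/>".getBytes()));
  }
}
```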
[...] valid column filled with a true value if the signatures match, false otherwise.
I think that's everything I've got :)
There are some things I'd love someone to explain to me:
Thanks! That's great. I'll read through it. Where should the documentation live? Shall we put it in the XForms spec, but in a separate doc?
During Enketo's implementation (which is still ongoing), and like @ggalmazor, I also started thinking about changing encryption using a fast modern method that has some stuff built-in (signature, iv appended?) and is easier to implement (to match between platforms). It's a separate discussion of course. It seems to me that we could quite easily do this by adding a reference to the encryption method in the submission manifest and giving Briefcase the ability to handle both. Afterwards, we could safely switch Enketo and Collect to the new method (requiring users to update Briefcase seems reasonable).
This is great! I think seeing any additional notes @MartijnR might have from client-side implementation might also help figure out where the information should go and how it should be structured.
Why not encrypt everything with the public key and use the private key for decryption?
The length of the message would then be capped by the length of the key. Also, asymmetric encryption is slow whereas symmetric encryption is fast. As far as I know, using asymmetric encryption to encrypt a symmetric key is a common scheme.
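To put a number on that cap: with OAEP padding, RFC 8017 limits the message to k - 2*hLen - 2 bytes, where k is the RSA modulus size in bytes and hLen is the hash output size in bytes. A small sketch (the 2048 and 4096 bit key sizes are just examples, not something stated in this thread):

```java
class OaepLimitSketch {
  // RFC 8017, RSAES-OAEP: mLen <= k - 2*hLen - 2
  static int maxPayloadBytes(int rsaKeyBits, int hashBytes) {
    return rsaKeyBits / 8 - 2 * hashBytes - 2;
  }

  public static void main(String[] args) {
    int sha256Bytes = 32;
    System.out.println(maxPayloadBytes(2048, sha256Bytes)); // prints 190
    System.out.println(maxPayloadBytes(4096, sha256Bytes)); // prints 446
  }
}
```

A 32 byte AES key fits comfortably in either case, while a submission or a media file generally would not.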
I've read through it and I think this can be turned into a spec! I'd be happy to give this a shot and create a PR for your reviews.
submission.xml.enc (or whatever filename defined in the encryptedXmlFile of the submission.xml file) is the file that contains the actual submission's encrypted data.
I just wanted to ask for confirmation that this filename is indeed flexible. It sounds sensible. I think from an XForms spec perspective no filenames are fixed (and from the submission API spec perspective only xml_submission_file is fixed, I believe).
A few thoughts on how to structure this and what to include/not include:
[...] submission.xml if copied manually).
Seems like a great plan of attack to me!
I also started thinking about changing encryption using a fast modern method that has some stuff built-in (signature, iv appended?) and is easier to implement (to match between platforms)
Would be on board with this but wondering how it would rank in terms of priorities once major clients are compatible and there is reasonable documentation to go off of (I know, I know, not cleaning up messes is bad! But it's far from the only one...).
I just wanted to ask for confirmation that this filename is indeed flexible.
On paper, it should be, but let's be really sure. I'll test this and come back with results.
Would be on board with this but wondering how it would rank in terms of priorities once major clients are compatible and there is reasonable documentation to go off of
Yes, true. We should probably wait for a trigger to do this. The 2 triggers that may come up in the future:
I'm back with the results about the .enc file. I've verified that you can have any filename on the encryptedXmlFile node in the manifest submission.xml file.
Great. Thanks!
I think I can answer a couple of these questions but only from a general knowledge standpoint; I have no history with this code.
- Why the symmetric key? Why not encrypt everything with the public key and use the private key for decryption? It feels like an extra unnecessary step.
Asymm encryption tends to be very slow and produce somewhat bloated payloads. Most asymm protocols are meant as key-exchange protocols, so the two machines establish pub/priv with each other just to exchange symm keys to use for subsequent communication, and then they proceed with that. So, similar thing with payload encryption, just it's all stuffed into one package. I do something quite similar for ODK Central backups.
- Why make the decryption process dependent on the order in which files are decrypted? Because of this, we lose the ability to parallelize the decryption of files. It feels like an unnecessary restriction.
It's definitely a tradeoff. The main reason you'd want to do it this way is so that you don't have to create n initialization vectors for n files, which just generates more payload and more homework before the correct things can be done. With one continuous stream of binary information, you can just use one IV at the very start of the whole payload and run the whole thing down from there.
In general, in terms of overhauling the system and the technologies used: based on the description above, my personal feeling is that yes, there are improvements that could be had and some cruft that could be filtered out, but it's not altogether horribly bad or outdated as-is. The use of md5 is the biggest issue I think.
My primary focus and goal is to make the user experience around the technology better (specifically, stop making users generate and manage their own keypairs), and as far as I can tell there's nothing about that protocol that makes this impossible.
Currently, there is not enough information to implement local encryption and create an acceptable submission that can be decrypted with ODK Briefcase.
Some info: https://groups.google.com/forum/#!topic/opendatakit-developers/Kjo6bxNqdVs