cossacklabs / themis

Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.
https://www.cossacklabs.com/themis
Apache License 2.0
1.85k stars 143 forks

Support for streaming a large file via Context Imprint (AES CTR) #1043

Open ashughes opened 6 months ago

ashughes commented 6 months ago

Is your feature request related to a problem? Please describe.

We need to be able to encrypt large files as well as perform random access when decrypting files. Based on my understanding of AES CTR, this should already be possible; however, there are no Themis APIs that allow us to do this.

Describe the solution you'd like to see

I believe the following API (in Java) should provide the necessary primitives to support this use case with minimal dependencies (e.g. no Java File or stream APIs):

public interface ContextImprintByOffset {
  byte[] encrypt(byte[] data, byte[] context, long offset);
  byte[] decrypt(byte[] data, byte[] context, long offset);
}

In the above interface an offset parameter is added which would be used to offset the counter appropriately when using the IV to encrypt and decrypt each byte. Additionally, the context parameter could be removed from this interface and instead provided as a constructor parameter when creating the instance (along with the key) since the additional context should be the same for every call to encrypt and decrypt for a given file.
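To make the offset idea concrete, here is a minimal sketch using the standard Java `javax.crypto` API rather than Themis. The class `CtrOffsetSketch` and its methods are hypothetical names, and the IV is passed in explicitly, which is an assumption: how Context Imprint derives its IV internally is not shown here. The sketch advances the counter block by `offset / 16` and discards the first `offset % 16` keystream bytes.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;

/** Hypothetical helper, not part of Themis: AES-CTR applied at an arbitrary byte offset. */
final class CtrOffsetSketch {
    private static final int BLOCK = 16; // AES block size in bytes

    /** Adds blockIndex to the 128-bit big-endian counter block, wrapping modulo 2^128. */
    static byte[] counterAt(byte[] iv, long blockIndex) {
        BigInteger sum = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex));
        byte[] raw = sum.toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        // Keep only the low 16 bytes, left-padding with zeros when the value is shorter.
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    /** CTR encryption and decryption are the same operation, so one method serves both. */
    static byte[] apply(byte[] key, byte[] iv, byte[] data, long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(counterAt(iv, offset / BLOCK)));
            // Pad the front so that data lines up with its position inside the block,
            // then drop the padding bytes from the result.
            int skip = (int) (offset % BLOCK);
            byte[] padded = new byte[skip + data.length];
            System.arraycopy(data, 0, padded, skip, data.length);
            byte[] out = cipher.doFinal(padded);
            return Arrays.copyOfRange(out, skip, out.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, decrypting bytes 37 to 59 of a ciphertext only needs that slice plus `offset = 37`. Like Context Imprint, it provides no integrity protection.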

Describe alternatives you've considered

An alternative is that we could split our large files into chunks, encrypt each chunk individually using Seal, Token Protect, or Context Imprint, and concatenate each chunk into an output file. For random access reads, we would determine which chunks need to be decrypted in order to return the requested data. This would effectively be creating our own data format based on the chosen encryption mode, chunk size, and header format.
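The bookkeeping for random access reads in such a chunked format could look roughly like the sketch below. The class `ChunkLayoutSketch` is hypothetical, and it assumes fixed-size plaintext chunks with a constant per-chunk overhead; the actual overhead depends on the chosen mode and would have to be measured.

```java
/** Hypothetical layout: fixed-size plaintext chunks, each stored with a constant overhead. */
final class ChunkLayoutSketch {
    final int plainChunkSize; // plaintext bytes per chunk (assumed fixed; last chunk may be shorter)
    final int overhead;       // extra stored bytes per chunk (assumed constant for the chosen mode)

    ChunkLayoutSketch(int plainChunkSize, int overhead) {
        if (plainChunkSize <= 0 || overhead < 0) {
            throw new IllegalArgumentException("invalid layout");
        }
        this.plainChunkSize = plainChunkSize;
        this.overhead = overhead;
    }

    /** Index of the chunk containing the first requested plaintext byte. */
    long firstChunk(long offset) {
        return offset / plainChunkSize;
    }

    /** Index of the chunk containing the last requested plaintext byte (length must be > 0). */
    long lastChunk(long offset, int length) {
        return (offset + length - 1) / plainChunkSize;
    }

    /** Position in the output file where the given encrypted chunk starts. */
    long storedOffset(long chunkIndex) {
        return chunkIndex * ((long) plainChunkSize + overhead);
    }

    /** Plaintext bytes to discard from the first decrypted chunk. */
    int skipInFirstChunk(long offset) {
        return (int) (offset % plainChunkSize);
    }
}
```

A reader would decrypt chunks `firstChunk(offset)` through `lastChunk(offset, length)`, drop `skipInFirstChunk(offset)` bytes from the front, and return the next `length` bytes.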

Since AES CTR effectively already supports this capability, it would be nice to simply utilize that instead of defining our own format and strategy.

Lagovas commented 6 months ago

Hi, sorry for the late answer. We don't plan to add such an API, because the main goal of the Themis design is to be as simple as possible for users who are not familiar with cryptography and to hide all of cryptography's complexity. So most of the modes provide one function to encrypt and one to decrypt, with no state except, in some languages, a key. Encrypting large files requires a state object that has to be initialized and passed to every subsequent encryption call. This state holds the counter value and the IV. The same counter and IV must not be reused for different plaintexts, so users must remember to create a new context for every new plaintext in order to generate a new IV. All of this complicates usage and requires an understanding of the security risks, which is what we wanted to avoid.

We suggest solving it on the application level, as you already described:

An alternative is that we could split our large files into chunks, encrypt each chunk individually using Seal, Token Protect, or Context Imprint, and concatenate each chunk into an output file.

In the case of Secure Cell in Seal mode, the ciphertext has an authentication tag that provides integrity checks, so your app would detect tampering. In the case of Context Imprint, the app wouldn't detect tampering and would get corrupted plaintext as a result. So it would be great to have at least a MAC for the whole file to verify its integrity.
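One way to add such a whole-file MAC, sketched with the standard Java `javax.crypto.Mac` API rather than anything from Themis: compute HMAC-SHA256 over the encrypted chunks in order and store the tag next to the file. The class `FileMacSketch` is hypothetical, and using a separate MAC key (not the encryption key) is an assumption of this sketch.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.List;

/** Hypothetical helper, not part of Themis: one HMAC-SHA256 tag over all encrypted chunks. */
final class FileMacSketch {
    /** Feeds every encrypted chunk, in file order, into a single HMAC computation. */
    static byte[] tag(byte[] macKey, List<byte[]> encryptedChunks) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(macKey, "HmacSHA256"));
            for (byte[] chunk : encryptedChunks) {
                mac.update(chunk);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recomputes the tag and compares it in constant time. */
    static boolean verify(byte[] macKey, List<byte[]> encryptedChunks, byte[] expectedTag) {
        return MessageDigest.isEqual(tag(macKey, encryptedChunks), expectedTag);
    }
}
```

The trade-off is that verifying a whole-file tag means reading the whole file, so it detects tampering but does not protect an individual random access read on its own.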

ashughes commented 6 months ago

Thanks for the thoughtful response! I totally understand the intent is to keep things simple and hide the complexity of cryptography (and we definitely appreciate it).

However, I wanted to clarify a few things. Unless I have a misunderstanding of how this would work, I think the only additional input needed from the user would be the offset parameter. The context parameter could be removed completely from the interface and instead provided as a constructor parameter when creating the Context Imprint instance (along with the key) since the additional context should be the same for every call to encrypt and decrypt for a given file.

Under the hood, the implementation would then determine the correct counter value and IV for the given offset. My understanding is that this is how AES CTR works already. The difference is just allowing the user to provide an offset so that the encryption and decryption can be done in chunks instead of all at once.

In my opinion, encrypting/decrypting large files is a common use case that's missing from Secure Cell.

Lagovas commented 6 months ago

How would we protect users from reusing the same object?

SymmetricKey symmetricKey = new SymmetricKey();
byte[] context = ....
SecureCell.ContextImprint encryptor = SecureCell.ContextImprintWithKey(symmetricKey, context);

byte[] plaintext1 = ...
byte[] plaintext2 = ...

encryptor.encrypt(plaintext1, 0);
encryptor.encrypt(plaintext2, 0);  // !!!! same IV and counter reused for a different plaintext

We want to prevent the use of the same encryptor with the same IV and counter for different plaintexts. Such an interface cannot protect against this flow.
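The risk can be shown with plain AES-CTR from the standard Java `javax.crypto` API (not the Themis API; `KeystreamReuseSketch` is a hypothetical name). When two plaintexts are encrypted with the same key, IV, and starting counter, XORing the two ciphertexts cancels the keystream and yields the XOR of the plaintexts, with no key required.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

/** Hypothetical demo, not part of Themis: why reusing a CTR keystream leaks information. */
final class KeystreamReuseSketch {
    /** Plain AES-CTR starting at the given IV/counter block. */
    static byte[] ctr(byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Byte-wise XOR over the common prefix of a and b. */
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }
}
```

For any `p1` and `p2`, `xor(ctr(key, iv, p1), ctr(key, iv, p2))` equals `xor(p1, p2)`, so an attacker who knows or guesses one plaintext recovers the other.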

ashughes commented 6 months ago

We thought about this scenario as well when discussing internally and I agree with you that it is not ideal. However, it is only slightly different than the current problem of using the same additional context with multiple calls to encrypt:

SymmetricKey symmetricKey = new SymmetricKey();
byte[] context = ....
SecureCell.ContextImprint encryptor = SecureCell.ContextImprintWithKey(symmetricKey);

byte[] plaintext1 = ...
byte[] plaintext2 = ...

encryptor.encrypt(plaintext1, context);
encryptor.encrypt(plaintext2, context);  // !!!! same context reused for a different plaintext

The only two options we've come up with that might help clarify are:

  1. Make the constructor something like SecureCell.ContextImprintWithKeyForFile(symmetricKey, context) and document that this should only be used for a single file and a new instance should be created with a new context for each file.
  2. Keep the additional context in the encrypt and decrypt (like my original example interface) and document that it must be the same value for each subsequent call for the same file.