NativeChaCha20 decrypt crashes with large data inputs

eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Other

3.27k stars 721 forks source link

NativeChaCha20 decrypt crashes with large data inputs #9535

Open samasri opened 4 years ago

samasri commented 4 years ago

Originally described by @keithc-ca in #9429 comment.

NativeChaCha20Cipher.EngineAEADDec appears to capture the entire input in a ByteArrayOutputStream: if the input is large (think gigabytes) this is not reasonable.

To elaborate:

When a client calls cipher.engineUpdate or cipher.engineFinal with a ByteBuffer as input, bufferCrypt will be called (as shown here and here).
In case of ChaCha20, bufferCrypt calls one of the engineUpdate and engineDoFinal functions in NativeChaCha20Cipher.java.
Those functions call the underlying doUpdate call. In case of AEAD decryption, the doUpdate copies the input to a buffer
Hence calling cipher.engineUpdate continuously with large input sizes (gigabytes) will cause a crash.

shanchao95 commented 4 years ago

looked into the code.

doUpdate returns 0 or a 0 byte array. doFinal returns cumulated input size or a byte array with encrypted output.
Currently, an input data is stored in a ByteArrayOutputStream buffer when invoke doUpdate, and encrypt all inputs until calling doFinal.
have to store/cumulate the input data somewhere before calling doFinal, which means no mater what container, the same issue is reproduced when input is large.

ideas about to solve it

limit the size of ByteArrayOutputStream. If the buffer overflows, doFinal is invoked no matter doFinal is called or not.
Return a non-zero size or a encrypted output to acknowledge user that previous input has been encrypted and outputted.

The idea should be helpful for a cumulated large input, such as calling doUpdate multiple times with small inputs. However, it may have no help about inputing a large size of data at one time.

May i have any suggestions, thanks.

samasri commented 4 years ago

limit the size of ByteArrayOutputStream. If the buffer overflows, doFinal is invoked no matter doFinal is called or not

In case the buffer overflows, you would need to call doUpdate and not doFinal. This is because doFinal will finalize the operation so you can't call doUpdate afterwards. So for example, consider a case when the internal buffer size is 10KB and the user is trying to encrypt 1MB of data (divided into 1KB chunks). The buffer will overflow after few calls for doUpdate. When the buffer overflows, you should not call doFinal because if you do, the cipher will be finalized and it can't encrypt the rest of the data.

As we discussed with @ashbm5 today, a better approach for now is to encrypt the data as soon as the doUpdate receives them. No need to buffer them.

However, it does nothing about inputing a large size of data at one time.

Yes that is still a valid concern.

DanHeidinga commented 4 years ago

We're coming up on the v0.21.0 code split on June 7. Will this be ready and merged before then?

alon-sh commented 4 years ago

@DanHeidinga @keithc-ca We actually have the change ready for review. However, the implementation problem pointed out by Keith - that the input data is stored in a buffer and not processed immediately, which can cause out of memory problems, is also shared with OpenJDK java (non-OpenSSL) implementation. https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java#L1339

This behavior is expected in a ChaCha20 unit test so we will also need to modify a ChaCha20. Strictly speaking, it should be fine to modify since the crypto API allows for both types of behaviors (immediate return or storing in a buffer and returning at some other point in time). However, since this is a change in external API behavior, it may break something unexpected at some future point.

DanHeidinga commented 4 years ago

This behavior is expected in a ChaCha20 unit test

So both OpenJDK's implementation and our implementation have the same limitation of needing to buffer the input until it's complete? Sounds like something we shouldn't be changing unless there's a compelling reason to, unless I've missed something

pshipton commented 4 years ago

Or we should make the change in OpenJDK and contribute it first.

pshipton commented 4 years ago

This is an enhancement. Since it's not delivered before the 0.21 branch, deferring to the next release.

pshipton commented 3 years ago

Moving to the backlog.