Clarify Encrypt behavior based on plaintext length input

Related to unblocking: #71

Encryption "plaintext length" param behavior

When performing the encrypt operation on a plaintext whose length is not immediately known, users MUST be able to specify the optional parameter maxPlaintextLength. The value of this field is an upper bound on the length of the plaintext to be encrypted. The ESDK MUST NOT encrypt a plaintext longer than this bound, and MUST fail if it can be determined during encryption that the actual plaintext length is greater than what the user supplied on input. The actual name of this parameter MAY be different per implementation.
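
As a minimal sketch of the "fail during encryption" requirement above, a streaming reader could count bytes as it consumes the source and stop as soon as the bound is exceeded. The names `read_bounded` and `PlaintextLengthExceeded` are hypothetical and chosen for illustration only; they are not taken from any ESDK implementation.

```python
import io
from typing import BinaryIO, Iterator, Optional


class PlaintextLengthExceeded(Exception):
    """Hypothetical error: plaintext read so far exceeds the caller's bound."""


def read_bounded(
    source: BinaryIO,
    max_plaintext_length: Optional[int],
    chunk_size: int = 4096,
) -> Iterator[bytes]:
    """Yield chunks from `source`, failing once the running total exceeds the bound.

    If `max_plaintext_length` is None, no bound is enforced.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return  # end of stream
        total += len(chunk)
        # The length is only discovered while reading, so the check runs per chunk.
        if max_plaintext_length is not None and total > max_plaintext_length:
            raise PlaintextLengthExceeded(
                f"read {total} bytes, bound is {max_plaintext_length}"
            )
        yield chunk
```

A plaintext exactly as long as the bound is accepted; only a plaintext strictly longer than the bound causes a failure.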

Similarly, the CMM's GetEncryptionMaterials should clearly define maxPlaintextLength as an input that represents the maximum size of the plaintext that will be encrypted using these materials. This parameter is essential for the Caching CMM to work with streaming encrypt operations. The actual name of this input MAY be different per implementation.

However, what should happen if a user inputs a plaintext with a known length (e.g. a byte array instead of a stream) and also inputs a (possibly incorrect) maxPlaintextLength?

We should avoid letting users do this as much as is reasonable for the implementation. If for a certain API the supplied plaintext's length will always be known, that API shouldn't provide maxPlaintextLength as an input. As such, we should add the following to the spec:

If exposing an API where the plaintext length is always known from the input plaintext, then the ESDK MUST NOT provide maxPlaintextLength as a user input.
If exposing an API where it is possible for the length of the plaintext to be unknown on input, then the ESDK MUST provide an optional maxPlaintextLength input.

To make clear the behavior in the latter case, we should add the following to the spec:

If maxPlaintextLength is supplied on input and the input plaintext has a total length greater than maxPlaintextLength, this operation MUST fail.
If the length of the plaintext is known, the Encrypt operation MUST pass the real plaintext length as maxPlaintextLength in the CMM GetEncryptionMaterials call, regardless of whether maxPlaintextLength was supplied on input.
If the length of the plaintext is not known and maxPlaintextLength was supplied on input, the Encrypt operation MUST pass the supplied maxPlaintextLength to the CMM GetEncryptionMaterials call.
If the length of the plaintext is not known and maxPlaintextLength was not supplied on input, the Encrypt operation MUST NOT specify a maxPlaintextLength in the CMM GetEncryptionMaterials call.
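
The four rules above can be sketched as a single decision function. This is an illustrative sketch only: `length_for_cmm`, `known_length`, and `PlaintextLengthExceeded` are hypothetical names, and `None` stands in for "not known" or "not supplied".

```python
from typing import Optional


class PlaintextLengthExceeded(Exception):
    """Hypothetical error: plaintext is longer than the supplied bound."""


def length_for_cmm(
    known_length: Optional[int],
    max_plaintext_length: Optional[int],
) -> Optional[int]:
    """Return the value to pass as maxPlaintextLength to GetEncryptionMaterials.

    Returning None means the Encrypt operation does not specify a
    maxPlaintextLength in the CMM call.
    """
    if known_length is not None:
        # Known length: fail if it exceeds the user-supplied bound ...
        if max_plaintext_length is not None and known_length > max_plaintext_length:
            raise PlaintextLengthExceeded(
                f"plaintext is {known_length} bytes, bound is {max_plaintext_length}"
            )
        # ... otherwise always pass the real length, ignoring the supplied bound.
        return known_length
    # Unknown length: pass the supplied bound if there is one, else nothing.
    return max_plaintext_length
```

For example, under these assumptions `length_for_cmm(5, 10)` yields 5 (the real length wins over the looser bound), while `length_for_cmm(None, 10)` yields 10 and `length_for_cmm(None, None)` yields `None`.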

Why not always pass the user-supplied `maxPlaintextLength` to the CMM?

This would pass less accurate information to the CMM. There is no case where a user would benefit from passing a less accurate maxPlaintextLength to the CMM. If there were such a case, that CMM would not be using maxPlaintextLength as intended, and it is better to restrict flexibility here so that users have less opportunity to depend on bad behavior from a poorly designed CMM.

Why not standardize on a name?

Most implementations already have this control with almost exactly the same behavior, but under different names. It is not worth breaking customers to rename this control in every implementation without actually changing the core behavior. We should define this control in the spec as maxPlaintextLength for the Encrypt API and the CMM Interface, and we can document what each implementation names this control.

What changes are needed in implementations to match this specification?

Java: None. Java one-shot APIs provide no "plaintext input" control, as expected. Java streaming APIs only take in Java InputStreams and OutputStreams as input, and users can set a maxPlaintextLength() on the Java stream in order to replicate the above behavior.
Python: Needs to be updated to pass the real plaintext length to the CMM when it is known. Python already fails if source_length < len(source).
JS: Needs to be updated to fail if plaintextLength < plaintext.byteLength. JS already passes the real length to the CMM when it is known.
C: None. C provides a set_max_message_size() on the session that sets a size_bound that behaves according to the above behavior.
