dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.35k stars 4.74k forks source link

Reading all bytes from CryptoStream does not trigger TransformFinalBlock if stream length is a multiple of transform block size #76948

Closed sebwills closed 1 year ago

sebwills commented 2 years ago

Description

At the end of the source stream, a CryptoStream in Read mode is expected to call the TransformFinalBlock method of the supplied ICryptoTransform to handle any partial final block and allow the ICryptoTransform do any final action required (e.g. produce a hash value).

If the code consuming the CryptoStream knows how many bytes to expect in the 'decrypted' stream (e.g. if the format of the data indicates its length), it may stop reading and dispose the CryptoStream once it has read all the bytes in the stream, without ever doing a Read operation that returns a zero count. For example, some GZip stream readers do this.

Please note, I am not just referring to Read returning less than its count parameter (which I know is normal in dotnet 6), I am talking about the total number of bytes read reaching the known total length of the stream.

In this scenario, CryptoStream's behaviour depends on whether the source stream length is a multiple of the ICryptoTransform's InputBlockSize:

Reproduction Steps

Here is a repro, using HashAlgorithm as the ICryptoTransform, which will throw if you try to read the hash value without calling TransformFinalBlock. Because HashAlgorithm's input block size is 1, this repro works with any length stream.

using System.Security.Cryptography;

var inputLength = 10; // Any multiple of the ICryptoTransform's InputBlockSize, which is 1 in the cash of HashAlgorithm
var readBlockSize = 64;

var source = PrepareSourceStream(inputLength);
Console.WriteLine($"Source length is {source.Length}");

using var hasher = SHA256.Create();
using (var cs = new CryptoStream(source, hasher, CryptoStreamMode.Read))
{
    var buffer = new byte[readBlockSize];
    int totalRead = 0, read = 0;
    // read until either we receive count of zero, or we have read inputLength bytes
    do {
        read = cs.Read(buffer, 0, buffer.Length);
        totalRead += read;
        Console.WriteLine($"Read {read} bytes (max requested {buffer.Length}), total {totalRead}");
    } while (read != 0 && totalRead < inputLength);

    // Note: an additional read here (which would return zero count) would avoid the issue,
    // but the point is that we have in fact read the whole stream, so arguably shouldn't have to
    // perform a further read attempt.
    //Console.WriteLine("Performing extra zero-byte read");
    //cs.Read(buffer, 0, 0);
}

// Next line throws System.Security.Cryptography.CryptographicUnexpectedOperationException: "Hash must be finalized before the hash value is retrieved."
// unless we perform an additional read after reading all the bytes.
Console.WriteLine($"Retrieved hash {hasher.Hash}");

MemoryStream PrepareSourceStream(int length) {
    var stream = new MemoryStream(length);
    for (var i = 0; i < length; i++)
    {
        stream.WriteByte(0);
    }
    stream.Seek(0, SeekOrigin.Begin);
    return stream;
}

Expected behavior

On Closing a CryptoStream in Read mode, when all the output stream's bytes have been read, but without Read having returned zero, it should invoke TransformFinalBlock on the ICryptoTransform it was constructed with.

In terms of the above repro code, the code should not throw (but the fault is with CryptoStream, not HashAlgorithm).

If the behaviour demonstrated here is felt to be acceptable, perhaps the CryptoStream documentation should highlight that it is essential to read the entire input stream, including a final read that returns zero count, in order to correctly interact with the supplied ICryptoTransform.

Actual behavior

Running the above repro produces the output

Source length is 10
Read 10 bytes (max requested 64), total 10
Unhandled exception. System.Security.Cryptography.CryptographicUnexpectedOperationException: Hash must be finalized before the hash value is retrieved.
   at System.Security.Cryptography.HashAlgorithm.get_Hash()
   at Program.<Main>$(String[] args) in ...../Program.cs:line 28

i.e. HashAlgorithm throws when we access Hash because its TransformFinalBlock method was never called by the CryptoStream.

Regression?

No response

Known Workarounds

The code consuming the CryptoStream's output can make a further call to the CryptoStream's Read method (even with the count parameter zero), even though it knows the result will be zero. This will trigger CryptoStream to call TransformFinalBlock (with zero length buffer), as it should.

Configuration

dotnet 6.0.401 Seen on Mac and Windows, but not known to be architecture specific.

Other information

No response

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones See info in area-owners.md if you want to be subscribed.

Issue Details
### Description At the end of the source stream, a `CryptoStream` in Read mode is expected to call the `TransformFinalBlock` method of the supplied `ICryptoTransform` to handle any partial final block and allow the `ICryptoTransform` do any final action required (e.g. produce a hash value). If the code consuming the `CryptoStream` knows how many bytes to expect in the 'decrypted' stream (e.g. if the format of the data indicates its length), it may stop reading and dispose the `CryptoStream` once it has read all the bytes in the stream, without ever doing a `Read` operation that returns a zero count. For example, some GZip stream readers do this. Please note, I am not just referring to `Read` returning less than its `count` parameter (which I know is normal in dotnet 6), I am talking about the total number of bytes read reaching the known total length of the stream. In this scenario, `CryptoStream`'s behaviour depends on whether the source stream length is a multiple of the `ICryptoTransform`'s `InputBlockSize`: - if the stream length _isn't_ a multiple of `ICryptoTransform.InputBlockSize`, `CryptoStream` will have internally tried to read beyond the end of the encryptedStream, i.e. will have peformed a Read that returned zero, and will have called `TransformFinalBlock` on the partial buffer. This is as expected. - if the stream length _is_ a multiple of `ICryptoTransform`.`InputBlockSize`, `CryptoStream` will have an empty internal buffer and will not know that it has reached the end of the encrypted stream, and will not call `TransformFinalBlock`, even when it is closed. As a result the `ICryptoTransform` does not know it has seen all the data and cannot do any final operations such as verifying the stream's integrity or making a hash value available. This feels like unexpected behaviour - particularly the way the behaviour depends on whether or not the input stream length happens to be a multiple of the transform's buffer size. ### Reproduction Steps Here is a repro, using HashAlgorithm as the `ICryptoTransform`, which will throw if you try to read the hash value without calling `TransformFinalBlock`. Because HashAlgorithm's input block size is 1, this repro works with any length stream. ```csharp using System.Security.Cryptography; var inputLength = 10; // Any multiple of the `ICryptoTransform`'s `InputBlockSize`, which is 1 in the cash of HashAlgorithm var readBlockSize = 64; var source = PrepareSourceStream(inputLength); Console.WriteLine($"Source length is {source.Length}"); using var hasher = SHA256.Create(); using (var cs = new `CryptoStream`(source, hasher, `CryptoStream`Mode.Read)) { var buffer = new byte[readBlockSize]; int totalRead = 0, read = 0; // read until either we receive count of zero, or we have read inputLength bytes do { read = cs.Read(buffer, 0, buffer.Length); totalRead += read; Console.WriteLine($"Read {read} bytes (max requested {buffer.Length}), total {totalRead}"); } while (read != 0 && totalRead < inputLength); // Note: an additional read here (which would return zero count) would avoid the issue, // but the point is that we have in fact read the whole stream, so arguably shouldn't have to // perform a further read attempt. //Console.WriteLine("Performing extra zero-byte read"); //cs.Read(buffer, 0, 0); } // Next line throws System.Security.Cryptography.CryptographicUnexpectedOperationException: "Hash must be finalized before the hash value is retrieved." // unless we perform an additional read after reading all the bytes. Console.WriteLine($"Retrieved hash {hasher.Hash}"); MemoryStream PrepareSourceStream(int length) { var stream = new MemoryStream(length); for (var i = 0; i < length; i++) { stream.WriteByte(0); } stream.Seek(0, SeekOrigin.Begin); return stream; } ``` ### Expected behavior On Closing a `CryptoStream` in `Read` mode, when all the output stream's bytes have been read, but without `Read` having returned zero, it should invoke `TransformFinalBlock` on the `ICryptoTransform` it was constructed with. In terms of the above repro code, the code should not throw (but the fault is with `CryptoStream`, not `HashAlgorithm`). If the behaviour demonstrated here is felt to be acceptable, perhaps the `CryptoStream` documentation should highlight that it is essential to read the entire input stream, _including a final read that returns zero count_, in order to correctly interact with the supplied `ICryptoTransform`. ### Actual behavior Running the above repro produces the output ``` Source length is 10 Read 10 bytes (max requested 64), total 10 Unhandled exception. System.Security.Cryptography.CryptographicUnexpectedOperationException: Hash must be finalized before the hash value is retrieved. at System.Security.Cryptography.HashAlgorithm.get_Hash() at Program.
$(String[] args) in ...../Program.cs:line 28 ``` i.e. `HashAlgorithm` throws when we access Hash because its `TransformFinalBlock` method was never called by the `CryptoStream`. ### Regression? _No response_ ### Known Workarounds The code consuming the CryptoStream's output can make a further call to the `CryptoStream`'s `Read` method (even with the `count` parameter zero), even though it knows the result will be zero. This will trigger `CryptoStream` to call TransformFinalBlock (with zero length buffer), as it should. ### Configuration dotnet 6.0.401 Seen on Mac and Windows, but not known to be architecture specific. ### Other information _No response_
Author: sebwills
Assignees: -
Labels: `area-System.Security`
Milestone: -
stephentoub commented 2 years ago

Regression? No response

With a slight tweak to the repro, it's the same behavior on .NET Framework 4.8, i.e. if you change readBlockSize to also be 10 to match the expected output length, then this also throws an exception on .NET Framework. That's because CryptoStream never tried to read beyond the amount of data requested. The cited repro only repros on .NET 6 and not .NET Framework 4.8 because of the mentioned partial read change, where on .NET Framework because readBlockSize is larger than the resulting data, CryptoStream would aggressively try to fill the buffer. But if the buffer you give it is such that it doesn't discover it's at EOF, then it won't do the final transform.

vcsjones commented 2 years ago

I think this is a known issue that was previously reported here https://github.com/dotnet/runtime/issues/2042 and documented as a breaking change at https://learn.microsoft.com/en-us/dotnet/core/compatibility/3.0#cryptostreamdispose-transforms-final-block-only-when-writing.

The linked issue suggests that you can add something like this before accessing the Hash property:

hasher.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
Console.WriteLine($"Retrieved hash {hasher.Hash}");

Or the work around noted in the original issue.


Note that if all you are trying to do is hash a stream, you don't need to use CryptoStream to do that. It could be as simple as:

using var hasher = SHA256.Create();
byte[] hash = hasher.ComputeHash(source);
Console.WriteLine($"Retrieved hash {Convert.ToBase64String(hasher.Hash)}");

Or in .NET 7, with a static one-shot:

var hash = SHA256.HashData(source);
Console.WriteLine($"Retrieved hash {Convert.ToBase64String(hash)}");
sebwills commented 2 years ago

Thanks for the responses.

It's helpful to see this behaviour partly documented

In .NET Core 3.0 and later versions, Dispose no longer tries to transform the final block when reading, which allows for incomplete reads.

I say partly because it's arguable that the repro I cited is not exactly an "incomplete read" since the whole stream has been read. I would still ideally like to see the CryptoStream docs themselves acknowledge the fact that a reader must attempt to read beyond EOF (getting a zero-byte result) if they want TransformFinalBlock to be called.

Thanks @vcsjones for the hashing hints. I used HashAlgorithm just for the repro. The original issue I discovered was with a chain of CryptoStreams doing decryption, hashing and recording complete/incomplete decrypts, which crashed one in 16 times on .net6 (i.e. when processing a stream that happened to have a length that was a multiple of the 16 block size in use).

How about changing CryptoStream so that, on Close, if it hasn't already hit EOF, it attempts another Read, which will result in a call to TransformFinalBlock if it gets a zero length back. Nobody can really complain about the side effect of potentially reading extra bytes from the source stream which are then discarded, since CryptoStream routinely reads ahead. This would make the behaviour more intuitive, and not have the quirky dependence on input stream length which can result in bugs not detected in testing.

bartonjs commented 2 years ago

How about changing CryptoStream so that, on Close, if it hasn't already hit EOF, it attempts another Read, which will result in a call to TransformFinalBlock if it gets a zero length back.

This suggestion only really makes sense for degenerate transformations like a hash algorithm. For block-padded encryption/decryption (the 99% use case) the call to Read which signals EOF from the underlying stream will either a) (decrypt) check that the still-buffered data is a multiple of the block size (throwing if not), decrypt it, and remove the padding that was added during decryption (throwing if it's invalid, and leaving it if it's an un-removable algorithm); or b) (encrypt) apply the selected padding algorithm and encrypt the final block. That is, the expected case of TransformFinalBlock is that it does work and produces more bytes. The ICryptoTransform instances from HashAlgorithm instances are degenerate in that they don't. They're further degenerate in that you're expected to look at side-effect state on the instance afterwards (the Hash property), which is not anything done by any other platform-provided ICryptoTransform.

We... could... add something like if (stream.CanSeek && stream.Position == stream.Length) as a followon check in Read(), but that doesn't really feel valuable to me because it's special-casing just for HashAlgorithm and, of course, doesn't help non-seekable streams that happen to know their length.

My recommendation here would be to stop using CryptoStream altogether, and just use the original stream in conjunction with IncrementalHash.