dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.95k stars 4.65k forks source link

[API Proposal]: ArrayRecord.TotalElementsCount #106644

Open adamsitnik opened 3 weeks ago

adamsitnik commented 3 weeks ago

Background and motivation

NRBF format exposes a possibility to represent multiple nulls with a single record (we need one byte for its type and another four bytes for the null count). It allows the attackers to send a small payload that represents a very large array. For example, serialized representation of following jagged array is just 90 bytes.

string[3][] jagged = new string[3][];
jagged[0] = new string[Array.MaxLength];
jagged[1] = new string[Array.MaxLength];
jagged[2] = new string[Array.MaxLength];

using MemoryStream payload = new();
new BinaryFormatter().Serialize(payload, jagged);
Assert.Equal(90, payload.Length);

while it takes more than 50 GB to instantiate it.

ArrayRecord arrayRecord = (ArrayRecord)NrbfDecoder.Decode(payload);
string[][] jagged = (string[][])arrayRecord.GetArray(expectedArrayType: typeof(string[][])); // allocates few dozens GB of memory

During the Threat Model session, it turned out that we currently don’t expose any API that could allow the users to avoid getting into this particular trap.

API Proposal

Providing TotalElementsCount is going to allow the users to perform such checks (and also avoid the need to compute it on their own for multi-dimensional arrays). By elements I mean the T values stored by the inner most arrays.

namespace System.Formats.Nrbf;

public abstract partial class ArrayRecord : SerializationRecord
{
    public abstract System.ReadOnlySpan<int> Lengths { get; }
+   public virtual long TotalElementsCount { get; }
}

API Usage

ArrayRecord arrayRecord = (ArrayRecord)NrbfDecoder.Decode(payload);
if (arrayRecord.TotalElementsCount > 100_000)
{
    throw new Exception($"Provided record had too many elements: {arrayRecord.TotalElementsCount}");
}
// from now we know that GetArray won't allocate more space than required for storing 100_000 elements.
string[][] jagged = (string[][])arrayRecord.GetArray(expectedArrayType: typeof(string[][]));

Alternative Designs

It's not obvious what total element count means for jagged arrays of jagged arrays. To avoid that, we could provide an API that would return the total allocated bytes.

public long GetTotalAllocatedBytes();

The API would need to take references into account (for every array record, GetArray allocates the array only once). In following example:

byte[] referenced = new byte[Array.MaxLength];
byte[][] jagged = new byte[3][];
jagged[0] = referenced;
jagged[1] = referenced;
jagged[2] = referenced;

The allocated bytes would be 2 GB (rather than 6 GB). Would that be a problem for users who would like to use the new API as initial guard and then process it later without any checks?

We don't provide any similar API in the BCL and I suspect that it would be hard to make it always return 100% exact value: we would need to take many runtime implementation details into account: object headers, method table pointers, alignment. But we could also just document that the API does not take these extra fields into account and returns estimated value. But would that make us vulnerable to attack where the payload contains MANY contained arrays with just 1 element?

Risks

No response

Other

https://github.com/dotnet/runtime/pull/106629 contains a working impl

bartonjs commented 3 weeks ago

Video

namespace System.Formats.Nrbf;

public abstract partial class ArrayRecord : SerializationRecord
{
    public virtual long FlattenedLength { get; }
}
dotnet-policy-service[bot] commented 3 weeks ago

Tagging subscribers to 'binaryformatter-migration': @adamsitnik, @bartonjs, @jeffhandley, @terrajobst