CommunityToolkit / dotnet

.NET Community Toolkit is a collection of helpers and APIs that work for all .NET developers and are agnostic of any specific UI platform. The toolkit is maintained and published by Microsoft, and part of the .NET Foundation.
https://docs.microsoft.com/dotnet/communitytoolkit/?WT.mc_id=dotnet-0000-bramin
Other
2.94k stars 291 forks source link

[HighPerformance] CommunityToolkit.HighPerformance.Buffers.LargeArray<T> #702

Open DaZombieKiller opened 1 year ago

DaZombieKiller commented 1 year ago

Overview

Today, .NET does not support array dimensions larger than 0x7FFFFFC7. However, it is possible to implement a LargeArray<T> type by increasing the size of the array element (for example, an array of T, T tuples instead of an array of Ts) or allocating a multi-dimensional array.

API breakdown

using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;

namespace CommunityToolkit.HighPerformance.Buffers;

public abstract class LargeArray
{
    public nuint Length { get; }

    // Mimic the APIs on System.GC
    public static LargeArray<T> Allocate<T>(nuint length, bool pinned = false);
    public static LargeArray<T> AllocateUninitialized<T>(nuint length, bool pinned = false);

    // Only LargeArray<T> should inherit from this type
    private protected LargeArray();

    // Intended to aid serializers and reflection use cases
    public abstract Type GetElementType(); // Implemented as 'return typeof(T)' on LargeArray<T>
    public abstract object? GetValue(nuint index);
    public abstract void SetValue(object? value, nuint index);

    // Returns 'ref null' for an empty array
    [EditorBrowsable(EditorBrowsableState.Never)]
    public ref byte GetPinnableReference();

    // LargeArrayMarshal.TryGetArray, InvalidCastException on failure
    public static explicit operator Array(LargeArray array);
}

public sealed class LargeArray<T> : LargeArray
{
    public LargeArray(nuint length);
    public Span<T> AsSpan(nuint start, int length);
    public Memory<T> AsMemory(nuint start, int length);
    public SegmentEnumerator EnumerateSegments();
    public override Type GetElementType();

    // Hidden to discourage use over the indexers when you have a concrete LargeArray<T>
    [EditorBrowsable(EditorBrowsableState.Never)]
    public override object? GetValue(nuint index);

    [EditorBrowsable(EditorBrowsableState.Never)]
    public override void SetValue(object? value, nuint index);

    // Throws 'IndexOutOfRangeException' for bad indices
    public ref T this[nint index] { get; }
    public ref T this[nuint index] { get; }

    // Returns 'ref null' for an empty array
    [EditorBrowsable(EditorBrowsableState.Never)]
    public new ref T GetPinnableReference();

    // LargeArrayMarshal.TryGetArray, InvalidCastException on failure
    public static explicit operator T[](LargeArray<T> array);
    public static explicit operator Array(LargeArray<T> array);

    public struct SegmentEnumerator
    {
        public LargeArraySegment<T> Current { get; }
        public bool MoveNext();
        public SegmentEnumerator GetEnumerator();
    }
}

// Represents at most int.MaxValue elements after 'Offset'
public readonly struct LargeArraySegment<T>
{
    public LargeArray<T> Array { get; }
    public nuint Offset { get; }
    public LargeArraySegment(LargeArray<T> array, nuint offset);

    // Array.AsSpan(Offset, (int)nuint.Min(int.MaxValue, Array.Length - Offset))
    public Span<T> Span { get; }

    // Included to enable 'foreach ((var offset, var span) in array.EnumerateSegments())'
    public void Deconstruct(out nuint offset, out Span<T> span);

    // Included to enable 'foreach (Span<T> span in array.EnumerateSegments())'
    public static explicit operator Span<T>(LargeArraySegment<T> segment);
    public static explicit operator ReadOnlySpan<T>(LargeArraySegment<T> segment);
}

public static class LargeArrayMarshal
{
    public static ref byte GetLargeArrayDataReference(LargeArray array);
    public static ref T GetLargeArrayDataReference<T>(LargeArray<T> array);
    public static LargeArray<T> CreateLargeArray(Array buffer, nuint length);

    // Returns the underlying storage array. This is not necessarily a T[].
    public static Array GetBuffer(LargeArray array);

    // If the LargeArray<T> is backed by a regular T[] that matches in length, this will succeed.
    public static bool TryGetArray(LargeArray array, [NotNullWhen(true)] out Array? result);

    // If the LargeArray<T> is backed by a regular T[] that matches in length, this will succeed.
    public static bool TryGetArray<T>(LargeArray<T> array, [NotNullWhen(true)] out T[]? result);
}

public static class LargeArrayExtensions
{
    // Safe version of 'LargeArrayMarshal.CreateLargeArray(array, (uint)array.Length)'
    public static LargeArray<T> AsLargeArray<T>(this T[] array);

    // Implemented using 'foreach ((var offset, var span) in array.EnumerateSegments())'
    public static nint IndexOf<T>(this LargeArray<T> array, T value) where T : IEquatable<T?>;
    public static nint IndexOf<T>(this LargeArray<T> array, ReadOnlySpan<T> value) where T : IEquatable<T?>;
    public static nint LastIndexOf<T>(this LargeArray<T> array, T value) where T : IEquatable<T?>;
    public static nint LastIndexOf<T>(this LargeArray<T> array, ReadOnlySpan<T> value) where T : IEquatable<T?>;
    // ...TBD
}

Usage example

var array = new LargeArray<int>(someLargeLength);
array[hugeIndex] = 50;
var array = new LargeArray<int>(someLargeLength);
var memory = array.AsMemory(hugeIndex, readCount);
await reader.ReadAsync(memory, cancellationToken);

Breaking change?

No

API design questions

Additional context

A proof-of-concept implementation of LargeArray<T> by myself and @hamarb123 can be found here.

Help us help you

Yes, but only if others can assist

ruilvo commented 1 year ago

I would like to +1 the proposal and refer to the discussion in https://github.com/dotnet/runtime/issues/12221 for some motivation.

DaZombieKiller commented 1 year ago

I'd like to add that creating a separate LargeArray<T> instead of resolving https://github.com/dotnet/runtime/issues/12221 only helps with cases directly affected by Array limitations, instead of improving all other collections and interfaces using arrays as their underlying storage and/or exposing 32-bit indexers and lengths((I)(ReadOnly)List/(Hash)Set/Dictionary<T>) at once.

Solving the large storage problem for the existing List<T> etc is a non-goal for this proposal, this is about experimenting with a building block upon which types like LargeList<T>, LargeSet<T>, etc. could be built with the functionality available in the runtime today. Large storage is a pretty specialized scenario where separate "large" collections would most likely be better suited anyway, because they can optimize for larger sets of data (which is going to be a very rare case for the standard collections).

hez2010 commented 9 months ago

Would instead like to call it NativeArray as the length matches the size of the native pointer.

saucecontrol commented 9 months ago

To me, NativeArray reads like it's backed by native memory, not that its length is limited to native int max 🤷‍♂️