dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

[API Proposal]: CollectionsMarshal.GetKeys/ValuesAsReadOnlySpan #64187

Open polijp opened 2 years ago

polijp commented 2 years ago

Background and motivation

.net 6 introduced CollectionsMarshall to access a List elements as a Span or to access to a value of a dictionary with a ref struct. These features are really useful in mathematical computing that needs high performances.

The idea here consists in expanding CollectionsMarshall with SortedList that maintains two internal arrays (one for the keys, one for the values). To guarantee the sorted aspect, one will gets a ReadOnlySpan. It will allow to access the keys and/or the values and browse them quicker than with the Keys and Values Properties.

API Proposal

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        public ReadOnlySpan<TKey> GetKeysAsReadOnlySpan<TKey, TValue>(SortedList<TKey,TValue> collection);
        public ReadOnlySpan<TValue> GetValuesAsReadOnlySpan<TKey, TValue>(SortedList<TKey,TValue> collection);
    }
}

API Usage

// Browse the keys/values
var span = CollectionsMarshal.GetKeysAsReadOnlySpan(mycollection);
for(int i=0; i<span.Length;i++)
  Console.WriteLine(span[i]);

// Sums the values
SortedList<string, int> mycollection = new();
...
var values = CollectionsMarshal.GetValuesAsReadOnlySpan(mycollection);
int vec_count = Vector<int>.Count;
int sum = 0;

int i = 0;

// vectorized sum
for (; i <= values.Length - vec_count; i += vec_count)
    sum += Vector.Sum(new Vector<int>(values.Slice(i));

for (; i<values.Length ; i++)
    sum += values[i];

Console.WriteLine(sum);

Alternative Designs

No response

Risks

There is a risk when the collection is changed, either we have to refresh the length of the ReadOnlySpan that points on the keys (or the values), either we decide it keeps the length of the collection when it was created.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation .net 6 introduced CollectionsMarshall to access a List elements as a Span or to access to a value of a dictionary with a ref struct. These features are really useful in mathematical computing that needs high performances. The idea here consists in expanding CollectionsMarshall with SortedList that maintains two internal arrays (one for the keys, one for the values). To guarantee the sorted aspect, one will gets a ReadOnlySpan. It will allow to access the keys and/or the values and browse them quicker than with the Keys and Values Properties. ### API Proposal ```C# namespace System.Runtime.InteropServices { public static class CollectionsMarshal { public ReadOnlySpan GetKeysAsReadOnlySpan(SortedList collection); public ReadOnlySpan GetValuesAsReadOnlySpan(SortedList collection); } } ``` ### API Usage ```C# // Browse the keys/values var span = CollectionsMarshal.GetKeysAsReadOnlySpan(mycollection); for(int i=0; i mycollection = new(); ... var values = CollectionsMarshal.GetValuesAsReadOnlySpan(mycollection); int vec_count = Vector.Count; int sum = 0; int i = 0; // vectorized sum for (; i <= values.Length - vec_count; i += vec_count) sum += Vector.Sum(new Vector(values.Slice(i)); for (; i Author: polijp Assignees: - Labels: `api-suggestion`, `area-System.Collections`, `untriaged` Milestone: -
Joe4evr commented 2 years ago

There is a risk when the collection is changed, either we have to refresh the length of the ReadOnlySpan that points on the keys (or the values), either we decide it keeps the length of the collection when it was created.

It's not possible to change the length of a span instance after construction (just like arrays). Also, the same risk of contents changing already exists with AsSpan<T>(List<T>), you simply have to know what you're doing.

kasperk81 commented 2 years ago

[API Proposal]:

missing the title?

KalleOlaviNiemitalo commented 2 years ago

To guarantee the sorted aspect, one will gets a ReadOnlySpan.

The span of values wouldn't have to be read-only because the values won't affect sorting.

stephentoub commented 2 years ago

Note this wouldn't just be adding a new API... it would require moving SortedList<> from System.Collections.dll to System.Private.CoreLib.dll.

polijp commented 2 years ago

If moving SortedList<> is a big problem, can't we use [InternalsVisibleTo] and change the fields to internal?

stephentoub commented 2 years ago

can't we use [InternalsVisibleTo] and change the fields to internal

No, even if we were ok using InternalsVisibleTo (which we avoid), corelib can't reference collections.dll, it's the other way around.

stephentoub commented 2 years ago

Also, once we stop caring about BinaryFormatter and thus the impact of changes on serialization compat, we could very likely be interested in changing how data is stored in SortedList, but a method like this adds back such constraints.

I think such a method would need to yield very significant benefits to consider adding it, given what it costs in flexibility.

EgorBo commented 2 years ago

we could very likely be interested in changing how data is stored in SortedList, but a method like this adds back such constraints.

I guess it was ok for CollectionsMarshal.AsSpan for list which we would never change internally to some other data structure, but for dictionaries/sortinglist it may happen, e.g. some tree-based