dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.55k stars 4.54k forks source link

[API Proposal]: Embrace spans in System.Reflection.Metadata. #85280

Open teo-tsirpanis opened 1 year ago

teo-tsirpanis commented 1 year ago

Background and motivation

System.Reflection.Metadata is a pretty perormance-oriented library but its API lacks methods that accept spans. I propose additional span overloads for methods that work with memory buffers. Some of these APIs can now be implemented efficiently thanks to the work started in #81059.

API Proposal

namespace System.Reflection.Metadata;

public class BlobBuilder
{
    public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public class BlobContentId
{
    public BlobContentId(ReadOnlySpan<byte> id);
    public static BllobContentId FromHash(ReadOnlySpan<byte> hashCode);
}

public struct BlobReader
{
    public void ReadBytes(Span<byte> buffer);
    public void ReadUTF8(Span<char> buffer);
    public bool TryReadUTF8(Span<char> buffer, out int charsWritten);
    public void ReadUTF16(Span<char> buffer);
    public bool TryReadUTF16(Span<char> buffer, out int charsWritten);
}

public struct BlobWriter
{
    public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public struct MetadataReader
{
    public int GetBlobLength(BlobHandle handle);
    public void GetBlobBytes(BlobHandle handle, Span<byte> buffer, out int bytesWritten);
    public int GetStringLength(StringHandle handle);
    public void GetStringChars(StringHandle handle, Span<char> buffer, out int charsWritten);
    public int GetStringLength(NamespaceDefinitionHandle handle);
    public void GetStringChars(NamespaceDefinitionHandle handle, Span<char> buffer, out int charsWritten);
    public int GetStringLength(DocumentNameBlobHandle handle);
    public void GetStringChars(DocumentNameBlobHandle handle, Span<char> buffer, out int charsWritten);
    public int GetUserStringLength(UserStringHandle handle);
    public void GetUserStringChars(UserStringHandle handle, Span<char> buffer, out int charsWritten);

    // Or what about we add the following APIs instead of the above?
    // The memory the span points to can be freed under our feet but
    // this is an existing problem with many SRM APIs that safely wrap pointers.
    // The library is expert-level either way.
    public ReadOnlySpan<byte> GetBlobSpan(BlobHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(StringHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(NamespaceDefinitionHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(DocumentNameBlobHandle handle);
    public ReadOnlySpan<byte> GetRawUserStringBytes(UserStringHandle handle);
}

namespace System.Reflection.Metadata.Ecma335;

public sealed class MetadataBuilder
{
    // These APIs will not allocate if the blob/string already exists.
    public BlobHandle GetOrAddBlob(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddBlobUTF8(ReadOnlySpan<char> value, bool allowUnpairedSurrogates = true);
    public BlobHandle GetOrAddBlobUTF16(ReadOnlySpan<char> value);
    public BlobHandle GetOrAddDocumentName(ReadOnlySpan<char> value);
    public StringHandle GetOrAddString(ReadOnlySpan<char> value);
    public UserStringHandle GetOrAddUserString(ReadOnlySpan<char> value);
}

API Usage

The proposed APIs correspond to existing APIs that work with strings and (immutable) byte arrays. Their usage will be similar.

Alternative Designs

Do nothing and either let users implement the span APIs on top of pointers if they are available, or accept the memory allocations if they aren't.

There are also more possible APIs to be spanified (such as methods in the System.Reflection.Metadata.Ecma335.***Encoders), but the proposed APIs are the most foundational; the rest can be implemented by user code on top of these with little effort.

Risks

No response

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation `System.Reflection.Metadata` is a pretty perormance-oriented library but its API lacks methods that accept spans. I propose additional span overloads for methods that work with memory buffers. Some of these APIs can now be implemented efficiently thanks to the work started in #81059. ### API Proposal ```csharp namespace System.Reflection.Metadata; public class BlobBuilder { public void WriteBytes(ReadOnlySpan buffer); } public class BlobContentId { public BlobContentId(ReadOnlySpan id); public static BllobContentId FromHash(ReadOnlySpan hashCode); } public struct BlobReader { public void ReadBytes(Span buffer); public void ReadUTF8(int byteCount, Span buffer); public bool TryReadUTF8(int byteCount, Span buffer, out int charsWritten); public void ReadUTF16(int byteCount, Span buffer); public bool TryReadUTF16(int byteCount, Span buffer, out int charsWritten); } public struct BlobWriter { public void WriteBytes(ReadOnlySpan buffer); } public struct MetadataReader { public int GetBlobLength(BlobHandle handle); public void GetBlobBytes(BlobHandle handle, Span buffer, out int bytesWritten); public int GetStringLength(StringHandle handle); public void GetStringBytes(StringHandle handle, Span buffer, out int bytesWritten); public int GetStringLength(NamespaceDefinitionHandle handle); public void GetStringBytes(NamespaceDefinitionHandle handle, Span buffer, out int bytesWritten); public int GetStringLength(DocumentNameBlobHandle handle); public void GetStringBytes(DocumentNameBlobHandle handle, Span buffer, out int bytesWritten); public int GetUserStringLength(UserStringHandle handle); public void GetUserStringBytes(UserStringHandle handle, Span buffer, out int bytesWritten); // Or what about we add the following APIs instead of the above? // The memory the span points to can be freed under our feet but // this is an existing problem with many SRM APIs that safely wrap pointers. // The library is expert-level either way. public ReadOnlySpan GetBlobSpan(BlobHandle handle); public ReadOnlySpan GetRawStringBytes(StringHandle handle); public ReadOnlySpan GetRawStringBytes(NamespaceDefinitionHandle handle); public ReadOnlySpan GetRawStringBytes(DocumentNameBlobHandle handle); public ReadOnlySpan GetRawUserStringBytes(UserStringHandle handle); } namespace System.Reflection.Metadata.Ecma335; public sealed class MetadataBuilder { // These APIs will not allocate if the blob/string already exists. public BlobHandle GetOrAddBlob(ReadOnlySpan value); public BlobHandle GetOrAddBlobUTF8(ReadOnlySpan value); public BlobHandle GetOrAddBlobUTF16(ReadOnlySpan value); public BlobHandle GetOrAddDocumentName(ReadOnlySpan value); public StringHandle GetOrAddString(ReadOnlySpan value); public UserStringHandle GetOrAddUserString(ReadOnlySpan value); } ``` ### API Usage The proposed APIs correspond to existing APIs that work with strings and (immutable) byte arrays. Their usage will be similar. ### Alternative Designs Do nothing and either let users implement the span APIs on top of pointers if they are available, or accept the memory allocations if they aren't. There are also more possible APIs to be spanified (such as methods in the `System.Reflection.Metadata.Ecma335.***Encoder`s), but the proposed APIs are the most foundational; the rest can be implemented by user code on top of these with little effort. ### Risks _No response_
Author: teo-tsirpanis
Assignees: -
Labels: `api-suggestion`, `area-System.Reflection.Metadata`
Milestone: -
steveharter commented 1 year ago

I propose additional span overloads for methods that work with memory buffers

@teo-tsirpanis can you provider an example of the caller using buffers (not backed by a simple array) where these APIs are advantageous? Thanks

teo-tsirpanis commented 1 year ago

One use case would be https://github.com/dotnet/runtime/pull/84580#discussion_r1169306517.

buyaa-n commented 1 year ago

@teo-tsirpanis the API proposals for BlobBuilder, BlobContentId, BlobWriter and MetadataBuilder looks good to me, for others more info needed, I left some questions as a comment. In general, a usage scenarios for them would be helpful for review. Also, I would prefer to split the proposals for BlobReader and MetadataReader into another issue if that makes sense to you

public struct BlobReader
{
    public byte[] ReadBytes(int byteCount)
    public void ReadBytes(int byteCount, byte[] buffer, int bufferOffset)
+   public void ReadBytes(Span<byte> buffer);
    public string ReadUTF8(int byteCount)
+   public void ReadUTF8(int byteCount, Span<char> buffer); // Why this passes byteCount but ReadBytes(Span<byte> buffer) not?
+   public bool TryReadUTF8(int byteCount, Span<char> buffer, out int charsWritten); // A uage scenarios will be helpful
    public string ReadUTF16(int byteCount)
+   public void ReadUTF16(int byteCount, Span<char> buffer);
+   public bool TryReadUTF16(int byteCount, Span<char> buffer, out int charsWritten); // Usage scenarios will be helpful
}
public struct MetadataReader
{
+   public int GetBlobLength(BlobHandle handle); // Why this needed? I guess for determining span size, anyway a user scenario will be useful
    public byte[] GetBlobBytes(BlobHandle handle);
+   public void GetBlobBytes(BlobHandle handle, Span<byte> buffer, out int bytesWritten); // why do you need bytesWritten?
    public string GetString(StringHandle handle);
+   public int GetStringLength(StringHandle handle); // usage scenarios
+   public void GetStringBytes(StringHandle handle, Span<char> buffer, out int bytesWritten);
+   public int GetStringLength(NamespaceDefinitionHandle handle);
+   public void GetStringBytes(NamespaceDefinitionHandle handle, Span<char> buffer, out int bytesWritten);
+   public int GetStringLength(DocumentNameBlobHandle handle);
+   public void GetStringBytes(DocumentNameBlobHandle handle, Span<char> buffer, out int bytesWritten);
+   public int GetUserStringLength(UserStringHandle handle);
+   public void GetUserStringBytes(UserStringHandle handle, Span<char> buffer, out int bytesWritten);

    // Or what about we add the following APIs instead of the above?
    // The memory the span points to can be freed under our feet but
    // this is an existing problem with many SRM APIs that safely wrap pointers.
    // The library is expert-level either way.
+    public ReadOnlySpan<byte> GetBlobSpan(BlobHandle handle);
+    public ReadOnlySpan<byte> GetRawStringBytes(StringHandle handle);
+    public ReadOnlySpan<byte> GetRawStringBytes(NamespaceDefinitionHandle handle);
+    public ReadOnlySpan<byte> GetRawStringBytes(DocumentNameBlobHandle handle);
+    public ReadOnlySpan<byte> GetRawUserStringBytes(UserStringHandle handle);
}
public sealed class MetadataBuilder
{
    public BlobHandle GetOrAddBlob(byte[] value) 
    // These APIs will not allocate if the blob/string already exists.
+   public BlobHandle GetOrAddBlob(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddBlobUTF8(string value, bool allowUnpairedSurrogates = true);
+   public BlobHandle GetOrAddBlobUTF8(ReadOnlySpan<byte> value); // should it have allowUnpairedSurrogates parameter too?
    public System.Reflection.Metadata.BlobHandle GetOrAddBlobUTF16(string value);
+   public BlobHandle GetOrAddBlobUTF16(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddDocumentName(string value);
+   public BlobHandle GetOrAddDocumentName(ReadOnlySpan<char> value);
    public StringHandle GetOrAddString(string value);
+   public StringHandle GetOrAddString(ReadOnlySpan<char> value);
    public UserStringHandle GetOrAddUserString(string value);
+   public UserStringHandle GetOrAddUserString(ReadOnlySpan<char> value);
}

Adding future milestone for now, I don't think all of these APIs ready for 8.0, we could change milestone if BlobReader and MetadataReader proposals moved to an another issue/proposal

fowl2 commented 8 months ago

Missing the simplest one :)

public sealed class MetadataReader
{
    public unsafe byte* MetadataPointer { get; }
    public int MetadataLength { get; }
+   public ReadOnlySpan<byte> MetadataSpan { get; }
}

Not sure if this is that useful given the whole type is unsafe, but perhaps:

public unsafe struct BlobReader
{
+   public BlobReader(ReadOnlySpan<byte> buffer);
}
PaulusParssinen commented 1 month ago

ILCompiler and its components could benefit from more (RO)Span<T> overloads in the S.R.M. In many places ILC has to go through various hoops to cache values that it reads from metadata because lack of modern Span APIs.

teo-tsirpanis commented 4 weeks ago

@buyaa-n

public int GetBlobLength(BlobHandle handle); // Why this needed? I guess for determining span size, anyway a user scenario will be useful

My thought was that it would allow the users to get the buffer's size and prepare an appropriately sized buffer to pass to GetBlob.

public void ReadUTF8(int byteCount, Span<char> buffer); // Why this passes byteCount but ReadBytes(Span<byte> buffer) not? public bool TryReadUTF8(int byteCount, Span<char> buffer, out int charsWritten); // A uage scenarios will be helpful

On second thought byteCount is not needed; it can be inferred from the span. I updated the proposal to fix this and other mistakes. I also am not sure if the BlobReader.TryRead*** methods are needed.

AArnott commented 3 weeks ago
public void GetStringChars(NamespaceDefinitionHandle handle, Span<char> buffer, out int bytesWritten);

The out parameter should be named charsWritten instead. For this, and all other char-based spans with out length parameters.

teo-tsirpanis commented 3 weeks ago

Updated, thanks.

AArnott commented 3 weeks ago

IMO I prefer the single TryGetString method pattern I proposed in #103169 over two methods (GetStringLength and GetStringChars). Besides being one method instead of two, it means I can reuse a buffer for many get-string calls without fear of it throwing an exception just because the length is too short. With the two method pattern, I must make an extra GetStringLength call before trying to obtain the string just to avoid the exception.