dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.96k stars 4.65k forks source link

[API Proposal]: System.Reflection.Metadata.MetadataReader.GetDocId(EntityHandle) #103168

Open AArnott opened 3 months ago

AArnott commented 3 months ago

Background and motivation

Several tools beyond compilers require the ability to construct "doc IDs" (documentation comments that unambiguously reference a particular API).

Constructing these is non-trivial. I recently created a DocID construction class based on System.Reflection.Metadata, and @terrajobst suggested I contribute it back to the SRM library (or some other that sits on top of it).

API Proposal

namespace System.ReflectionMetadata;

public partial class MetadataReader
{
    /// <summary>
    /// Constructs a <see href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/documentation-comments#d42-id-string-format">DocID</see> for the given entity handle.
    /// </summary>
    /// <param name="entityHandle">The handle to the entity to construct a DocID for.</param>
    /// <returns>The DocID.</returns>
    /// <exception cref="NotSupportedException">Thrown when <paramref name="entityHandle"/> refers to an entity for which no DocID can be constructed.</exception>
    /// <remarks>
    /// <para>
    /// DocIDs can be constructed for the following entity types:
    /// <list type="bullet">
    /// <item><see cref="HandleKind.TypeDefinition"/></item>
    /// <item><see cref="HandleKind.EventDefinition"/></item>
    /// <item><see cref="HandleKind.FieldDefinition"/></item>
    /// <item><see cref="HandleKind.MethodDefinition"/></item>
    /// <item><see cref="HandleKind.PropertyDefinition"/></item>
    /// <item><see cref="HandleKind.TypeReference"/></item>
    /// <item><see cref="HandleKind.MemberReference"/></item>
    /// </list>
    /// </para>
    /// </remarks>
    public string GetDocId(EntityHandle entityHandle);
}

API Usage

using FileStream assemblyStream = File.OpenRead(Assembly.GetExecutingAssembly().Location);
using PEReader peReader = new(this.assemblyStream);
MetadataReader reader = peReader.GetMetadataReader();

foreach (TypeDefinitionHandle tdh in reader.TypeDefinitions)
{
    string docId = reader.GetDocId(tdh);
}

Alternative Designs

Instead of adding GetDocId directly to MetadataReader, when it isn't really about reading metadata per se, we could add the method to a new class. The new class would have just one method on it however, and it would have to be constructed with a reference to a MetadataReader.

Yet another option is to put this functionality in another library.

Risks

The code I have was ported from the Upgrade Assistant (which used Mono.Cecil instead of SRM). I didn't study the original spec for DocIDs closely, so the code I have may not be 100% compliant with the spec.

It does come with a bunch of tests though, which also came from Upgrade Assistant.

When producing DocIDs for MemberReference, we can't be sure whether to construct a property (P:) or event (E:) style reference vs. a standard method reference (M:) when the reference is to an API in another assembly because we don't have access to the other assembly. We can merely guess based on prefix naming conventions.

dotnet-policy-service[bot] commented 3 months ago

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata See info in area-owners.md if you want to be subscribed.

ericstj commented 2 months ago

Making DocID more widely available is a good thing. The nice thing about putting it in SRM is that it's already a nuget package (as opposed to reflection or some new package).

I see Roslyn already has https://learn.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.isymbol.getdocumentationcommentid?view=roslyn-dotnet-4.7.0 but I imagine you need this for a tool that doesn't already use Roslyn?

I wonder if there are more scenarios for type name rendering from metadata that should be looked at as part of this.

AArnott commented 2 months ago

Ya, roslyn is a huge dependency to take just for DocID creation. And it isn't just the dll sizes, it's that I'd have to switch to the much more GC heavy roslyn API instead of the nearly alloc-free SRM object model.

That said, I need it for tools in which I've already got the code (code that I would contribute to SRM if accepted). So I'm not blocked by this.

terrajobst commented 2 months ago

The big problem is that Roslyn requires the closure of dependencies to crack open metadata reliably. That's often very expensive as it disallows looking at binaries in isolation (such as NuGet packages) and requires the caller to somehow resolve all of those, including the framework binaries.

For a lot of the analyses we're doing we're content with just dumping the type ref / member ref table as dependencies without even having to parse any of the IL bodies.

And that's where having this API would be incredibly convenient. I would absolutely love having this API and would use it virtually everywhere where I'm still using CCI, which is unmaintained. However, it's the only MR that can you look at binaries without forcing resolution.