[JIT&AOT] Allow ldsfld to load attribute-annotated static literal fields

dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

https://docs.microsoft.com/dotnet/core/

MIT License

15.18k stars 4.72k forks source link

[JIT&AOT] Allow ldsfld to load attribute-annotated static literal fields #45694

Open verelpode opened 3 years ago

verelpode commented 3 years ago

What does everyone think of the idea of tweaking the JIT and AOT to allow the ldsfld instruction to load some fields marked with static literal? "Some fields" meaning ldsfld would only support a static literal field if it has at least one attribute applied to the field. For example:

class Example
{
    // ldsfld would support this field because it has at least one attribute applied:
    [System.Reflection.Obfuscation(Exclude = true)]
    const string kExcludedFromObfuscation = "PropertyXYZ";

    // ldsfld would support this field because it has at least one attribute applied:
    [FieldMutability(FieldMutability.ReflectedMemberName)]
    const string kNameOfPropertyXYZ = nameof(PropertyXYZ);

    // ldsfld would continue to reject this field, same as currently done:
    const string kSomethingElse = "abc";

    public int PropertyXYZ { get; set; }

    static string UseConstant()
    {
        return kExcludedFromObfuscation;
    }
}

Removing this limitation with ldsfld would be helpful for:

AOT compilers.
IL obfuscators.
IL decompilers such as ILSpy (quite useful for troubleshooting).

You can see in my above example that I marked kExcludedFromObfuscation as being excluded from obfuscation, which would be necessary because its value ("PropertyXYZ") is coincidentally identical to a member name. Smart/automatic obfuscators and AOT-compilers can get confused and make incorrect transformations when a constant string is coincidentally identical to a member name but in reality is unrelated to the member and should not be transformed or renamed.

Unfortunately the above usage of Obfuscation(Exclude = true) fails to have any effect, because the compiler loads the constant value using ldstr instead of ldsfld, because ldsfld doesn't allow static literal fields. Currently the above-shown UseConstant() method is compiled to:

ldstr "PropertyXYZ"
ret

instead of:

ldsfld string kExcludedFromObfuscation
ret

Thus tools such as IL obfuscators, IL decompilers, and IL-to-x86-64 AOT-compilers are unable to see that the UseConstant() method references the constant field named kExcludedFromObfuscation, and thus unable to use any attributes (such as ObfuscationAttribute) that were applied to the field. This problem would be solved if the JIT/AOT allowed ldsfld to load a static literal field when the field has at least one attribute applied to the field.

It's also useful in the opposite case: It would be helpful to use an attribute to explicitly mark strings produced by the C# nameof operator, such as:

[FieldMutability(FieldMutability.ReflectedMemberName)]
const string kNameOfPropertyXYZ = nameof(PropertyXYZ);

.NET 5.x could be given a new attribute named perhaps FieldMutabilityAttribute, and corresponding enum FieldMutability, like follows:

namespace System.Runtime.CompilerServices
{
    public enum FieldMutability : byte
    {
        /// <summary>Information about field mutability was not supplied or is unknown.  Default or automatic behavior may be used.</summary>
        Unspecified = 0,

        /// <summary>The field is/was/should NOT be marked with the C# "readonly" nor "const" keywords.  The field value is NOT a reflected type name or member name.</summary>
        Mutable,

        /// <summary>The field is/was/should be marked with the C# "readonly" keyword.  The field value is NOT a reflected type name or member name.</summary>
        Immutable,

        /// <summary>The field is/was/should be marked with the C# "const" keyword and should be optimized as a constant value.  The value is NOT a reflected type name or member name.  AOT-compilers and obfuscators must not transform this value, even if the value happens to be identical to the name of a type or member of a type.</summary>
        ConstantValue,

        /// <summary>This is equivalent to a C# property setter marked with the "init" keyword.  The field is read-only but its value can be set or supplied at the time of object construction/instantiation.</summary>
        InitializeOnly,

        /// <summary>The field value is constant, and it is the name of a type used in reflection or the C# "nameof" keyword.  The name is not fully-qualified (should not contain "." character).</summary>
        ReflectedTypeName,

        /// <summary>The field value is constant, and it is the fully-qualified name of a type used in reflection or the C# "nameof" keyword.  The name is fully-qualified thus it contains zero or more "." characters.</summary>
        ReflectedTypeFullName,

        /// <summary>The field value is constant, and it is the name of a member of a type used in reflection or the C# "nameof" keyword.  The name is not fully-qualified (should not contain "." character).</summary>
        ReflectedMemberName,

        /// <summary>The field value is constant, and it is the fully-qualified name of a member of a type used in reflection or the C# "nameof" keyword.  The name is fully-qualified thus it contains zero or more "." characters.</summary>
        ReflectedMemberFullName,

        /// <summary>The field value is constant, and it is a name used in reflection, other than the name of a type or member of a type.</summary>
        OtherReflectedName,
    }

    [AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
    public class FieldMutabilityAttribute : System.Attribute
    {
        public FieldMutabilityAttribute(FieldMutability inMutability)
        {
            this.Mutability = inMutability;
        }

        public FieldMutability Mutability { get; }

    } // class FieldMutabilityAttribute

} // namespace System.Runtime.CompilerServices

Even before Roslyn takes advantage of ldsfld lifting the restriction on static literal, people could start using the new attribute immediately by using C# static readonly keywords instead of const for now. Currently Roslyn accepts this code and generates a ldsfld instruction:

[FieldMutability(FieldMutability.ReflectedMemberName)]
static readonly string kNameOfPropertyXYZ = nameof(PropertyXYZ);

static string UseMemberName()
{   // `ldsfld` is already generated here because the field is "static readonly":
    return kNameOfPropertyXYZ;
}

public int PropertyXYZ { get; set; }

JIT's and AOT's could optimize/transform the static readonly field to a constant value whenever they see that the FieldMutabilityAttribute is applied to the static readonly field and indicates that the field is a constant value.

Thus the IL inside a .exe or .dll file on-disk would use ldsfld, but at runtime the ldsfld would be compiled to the equivalent of ldc or ldstr when the static readonly/initonly field is marked with a FieldMutabilityAttribute that indicates that the value is a constant.

ghost commented 3 years ago

Tagging subscribers to this area: @ajcvickers See info in area-owners.md if you want to be subscribed.

Issue Details

What does everyone think of the idea of tweaking the JIT and AOT to allow the `ldsfld` instruction to load *some* fields marked with `static literal`? "Some fields" meaning `ldsfld` would only support a `static literal` field if it has at least one attribute applied to the field. For example: ```cs class Example { // ldsfld would support this field because it has at least one attribute applied: [System.Reflection.Obfuscation(Exclude = true)] const string kExcludedFromObfuscation = "PropertyXYZ"; // ldsfld would support this field because it has at least one attribute applied: [FieldMutability(FieldMutability.ReflectedMemberName)] const string kNameOfPropertyXYZ = nameof(PropertyXYZ); // ldsfld would continue to reject this field, same as currently done: const string kSomethingElse = "abc"; public int PropertyXYZ { get; set; } static string UseConstant() { return kExcludedFromObfuscation; } } ``` Removing this limitation with `ldsfld` would be helpful for: * AOT compilers. * IL obfuscators. * IL decompilers such as ILSpy (quite useful for troubleshooting). You can see in my above example that I marked `kExcludedFromObfuscation` as being excluded from obfuscation, which would be necessary because its value ("PropertyXYZ") is coincidentally identical to a member name. Smart/automatic obfuscators and AOT-compilers can get confused and make incorrect transformations when a constant string is coincidentally identical to a member name but in reality is unrelated to the member and should not be transformed or renamed. Unfortunately the above usage of `Obfuscation(Exclude = true)` fails to have any effect, because the compiler loads the constant value using `ldstr` instead of `ldsfld`, because `ldsfld` doesn't allow `static literal` fields. Currently the above-shown `UseConstant()` method is compiled to: ``` ldstr "PropertyXYZ" ret ``` instead of: ``` ldsfld string kExcludedFromObfuscation ret ``` Thus tools such as IL obfuscators, IL decompilers, and IL-to-x86-64 AOT-compilers are unable to see that the `UseConstant()` method references the constant field named `kExcludedFromObfuscation`, and thus unable to use any attributes (such as ObfuscationAttribute) that were applied to the field. This problem would be solved if the JIT/AOT allowed `ldsfld` to load a `static literal` field when the field has at least one attribute applied to the field. It's also useful in the opposite case: It would be helpful to use an attribute to explicitly mark strings produced by the C# `nameof` operator, such as: ```cs [FieldMutability(FieldMutability.ReflectedMemberName)] const string kNameOfPropertyXYZ = nameof(PropertyXYZ); ``` .NET 5.x could be given a new attribute named perhaps `FieldMutabilityAttribute`, and corresponding `enum FieldMutability`, like follows: ```cs namespace System.Runtime.CompilerServices { public enum FieldMutability : byte { ///

Information about field mutability was not supplied or is unknown. Default or automatic behavior may be used.

Unspecified = 0, ///

The field is/was/should NOT be marked with the C# "readonly" nor "const" keywords. The field value is NOT a reflected type name or member name.

Mutable, ///

The field is/was/should be marked with the C# "readonly" keyword. The field value is NOT a reflected type name or member name.

Immutable, ///

The field is/was/should be marked with the C# "const" keyword and should be optimized as a constant value. The value is NOT a reflected type name or member name. AOT-compilers and obfuscators must not transform this value, even if the value happens to be identical to the name of a type or member of a type.

ConstantValue, ///

This is equivalent to a C# property setter marked with the "init" keyword. The field is read-only but its value can be set or supplied at the time of object construction/instantiation.

InitializeOnly, ///

The field value is constant, and it is the name of a type used in reflection or the C# "nameof" keyword. The name is not fully-qualified (should not contain "." character).

ReflectedTypeName, ///

The field value is constant, and it is the fully-qualified name of a type used in reflection or the C# "nameof" keyword. The name is fully-qualified thus it contains zero or more "." characters.

ReflectedTypeFullName, ///

The field value is constant, and it is the name of a member of a type used in reflection or the C# "nameof" keyword. The name is not fully-qualified (should not contain "." character).

ReflectedMemberName, ///

The field value is constant, and it is the fully-qualified name of a member of a type used in reflection or the C# "nameof" keyword. The name is fully-qualified thus it contains zero or more "." characters.

ReflectedMemberFullName, ///

The field value is constant, and it is a name used in reflection, other than the name of a type or member of a type.

OtherReflectedName, } [AttributeUsage(AttributeTargets.Field, AllowMultiple = false)] public class FieldMutabilityAttribute : System.Attribute { public FieldMutabilityAttribute(FieldMutability inMutability) { this.Mutability = inMutability; } public FieldMutability Mutability { get; } } // class FieldMutabilityAttribute } // namespace System.Runtime.CompilerServices ``` Even before Roslyn takes advantage of `ldsfld` lifting the restriction on `static literal`, people could start using the new attribute immediately by using C# `static readonly` keywords instead of `const` for now. Currently Roslyn accepts this code and generates a `ldsfld` instruction: ```cs [FieldMutability(FieldMutability.ReflectedMemberName)] static readonly string kNameOfPropertyXYZ = nameof(PropertyXYZ); static string UseMemberName() { // `ldsfld` is already generated here because the field is "static readonly": return kNameOfPropertyXYZ; } public int PropertyXYZ { get; set; } ``` JIT's and AOT's could optimize/transform the `static readonly` field to a constant value whenever they see that the `FieldMutabilityAttribute` is applied to the `static readonly` field and indicates that the field is a constant value. Thus the IL inside a .exe or .dll file on-disk would use `ldsfld`, but at runtime the `ldsfld` would be compiled to the equivalent of `ldc` or `ldstr` when the `static readonly/initonly` field is marked with a `FieldMutabilityAttribute` that indicates that the value is a constant.

Author:	verelpode
Assignees:	-
Labels:	`area-System.ComponentModel.DataAnnotations`, `untriaged`
Milestone:	-

jkotas commented 3 years ago

The literal fields are compile-time only artifact today. If we allowed referencing them in IL instructions, they would have to be materialized at runtime. Attributes do not help with that. Parsing attributes takes time too.

AOT compilers.
IL obfuscators.
IL decompilers such as ILSpy (quite useful for troubleshooting).

I do not see how this helps any of these.

If obsluscators wants to obfluscate constant strings, they should be able to do it on their own, without any help from the runtime.

verelpode commented 3 years ago

@jkotas -- alright I accept your viewpoint on that. To address your critique, would you like it better if the idea is replaced with the following? The following design has the advantage of being much simpler and it would solve the problem without requiring any changes to the ldsfld instruction.

As the first step, .NET Framework 5.x would be given a class containing several super-simple methods that do nothing other than merely returning the same string as passed in a parameter, like this:

namespace System.Runtime.CompilerServices
{
    public static class ReflectedNameMarkers
    {
        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfType(string name) { return name; }

        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfProperty(string name) { return name; }

        // ... and a few more ...

    } // class
} // namespace System.Runtime.CompilerServices

And then when the C# compiler / Roslyn compiles nameof(x), such as...

class ExampleClass
{
    public string Test1()
    {
        return nameof(ExampleClass);
    }

    public int PropertyXYZ { get; set; }
    public string Test2()
    {
        return nameof(PropertyXYZ);
    }
}

It would be compiled to the same as:

class ExampleClass
{
    public string Test1()
    {
        return System.Runtime.CompilerServices.
            ReflectedNameMarkers.NameOfType("ExampleClass");
    }

    public int PropertyXYZ { get; set; }
    public string Test2()
    {
        return System.Runtime.CompilerServices.
            ReflectedNameMarkers.NameOfProperty("PropertyXYZ");
    }
}

Whereas currently it is compiled to the same as:

class ExampleClass
{
    public string Test1()
    {
        return "ExampleClass";
    }

    public int PropertyXYZ { get; set; }
    public string Test2()
    {
        return "PropertyXYZ";
    }
}

Currently strings produced by nameof(x) are not annotated/marked any differently than other strings, thus IL tools have trouble recognizing that these strings are for reflection purposes.

However, if the C# compiler wraps these reflection strings in calls to ReflectedNameMarkers.NameOfType(string) etc as demonstrated above, then IL tools could simply look for invocations of these methods, and then they would reliably know that a string is for reflection purposes when it is wrapped in an invocation of ReflectedNameMarkers.NameOfType(string).

It'd also be useful to make a method ReflectedNameMarkers.NotForReflection(string) that does the opposite of NameOfType(string). When a developer is experiencing troubles with an IL tool because the developer is using a string that is coincidentally identical to a type or member name, and because the IL tool may be using heuristics to try to identify reflection names (because currently no other solution exists), then the developer could explicitly instruct the IL tool that the string is unrelated to reflection, by doing this:

using System.Runtime.CompilerServices;
class ExampleClass
{
    public int PropertyXYZ { get; set; }

    public string Test3()
    {
        return ReflectedNameMarkers.NotForReflection("PropertyXYZ");
    }
}

In full, the class ReflectedNameMarkers could contain:

namespace System.Runtime.CompilerServices
{
    public static class ReflectedNameMarkers
    {
        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfType(string name) { return name; }

        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfMethod(string name) { return name; }

        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfField(string name) { return name; }

        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfProperty(string name) { return name; }

        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfEvent(string name) { return name; }

        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NameOfEnumMember(string name) { return name; }

        [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
        public static string NotForReflection(string text) { return text; }

    } // class
} // namespace System.Runtime.CompilerServices

If desired, the NameOfType(string) method could be split into several more-specific methods named as follows, because this additional info might be useful to IL tools:

NameOfClass or NameOfReferenceType
NameOfStructValueType
NameOfEnumType
NameOfDelegateType
NameOfPrimitiveType
NameOfOtherType

Also if desired, the ReflectedNameMarkers class might also contain methods to explicitly identify fully-qualified names as opposed to unqualified names, but this is probably unnecessary because IL tools could instead look for the presence of one or more "." characters in the name.

What do you think? Do you like this design better?

verelpode commented 3 years ago

It occurred to me that people could begin using this new class System.Runtime.CompilerServices.ReflectedNameMarkers even before Roslyn supports it (or even if Roslyn never supports it). In the meantime, people could use the C# nameof keyword/function together with a manual invocation of the method ReflectedNameMarkers.NameOfType(string). For example, people could write:

using System.Runtime.CompilerServices;
class ExampleClass
{
    public int PropertyXYZ { get; set; }

    public string Test1()
    {
        return ReflectedNameMarkers.NameOfType(nameof(ExampleClass));
    }

    public string Test2()
    {
        return ReflectedNameMarkers.NameOfProperty(nameof(PropertyXYZ));
    }

    public string Test3()
    {
        return ReflectedNameMarkers.NotForReflection("PropertyXYZ");
    }
}

Thus it would be usable immediately, and various IL tools could immediately begin supporting the recognition / special meaning of invocations of ReflectedNameMarkers.NameOfType. Obviously, to gain widespread support, this class ReflectedNameMarkers (or similar) would need to be made official, i.e. added to .NET Framework 5.x in a suitable official namespace (presumably System.Runtime.CompilerServices is best).

Obviously the final solution would be nicer if/when Roslyn automatically transforms nameof(x) to ReflectedNameMarkers.NameOfType("x") instead of requiring people to manually write ReflectedNameMarkers.NameOfType(nameof(x)).

jkotas commented 3 years ago

I do not see the scenario that this helps with.

As I have said above - if obfuscators want to obfuscate constant strings, they should be able to do it on their own, without any help from the runtime.

verelpode commented 3 years ago

As I have said above - if obfuscators wants to obfuscate constant strings, they should be able to do it on their own, without any help from the runtime.

I was also thinking about the experiences that were learned from ".NET Native", not only thinking about obfuscators, but if I use the obfuscator example for now since you and I mentioned it, I'll try to explain the problem more clearly than my previous messages.

Firstly, if the original/input source code is....

class ExampleClass
{
    public int PropertyXYZ { get; set; }
    public string Test()
    {
        return nameof(PropertyXYZ); // Currently compiled to: return "PropertyXYZ";
    }
}

...and when an obfuscator needs to rename property PropertyXYZ to a1, then obfuscators lack sufficient information to know whether the string "PropertyXYZ" should also be transformed to "a1" or left unchanged. Obfuscators -- but also other IL tools such as cross-compilers and analyzers -- are left unable to reliably determine which of the following two transformations is the correct one:

public int a1 { get; set; }
public string Test()
{
    return "a1";
}

// VERSUS:

public int a1 { get; set; }
public string Test()
{
    return "PropertyXYZ";
}

Currently IL tools are prevented from seeing the C# nameof expression, thus these tools see only a string that equals the name of a property/member or type, thus these tools cannot know whether the string is truly the name of a property versus just coincidentally identical to the name of a member or type. This lack of the necessary information leads to incorrect transformations or analysis results or cross-compiles, such as the above example of an obfuscator not knowing whether the correct transformation is return "a1"; OR unchanged return "PropertyXYZ";.

MichalStrehovsky commented 3 years ago

This problem is not limited to nameof - it's a more general problem of figuring out what parts of the program are accessed by reflection. Someone not using nameof is going to hit this too - nameof is not mandatory - one can just spell out the name as a literal string in the source code and the runtime behavior is the same.

IL Linker faces similar problem - the problem there is "is it safe to remove a statically unreachable part of the program" - this problem is very similar. The safety problem comes from the fact that an app could reflection-access something that wasn't seen as used.

https://github.com/mono/linker/blob/a86658f4602e616d71e5d6db23a30940534132e2/docs/design/reflection-flow.md is one of the ways we're trying to address this. In .NET 5 we shipped a preview and we're iterating on that further.

verelpode commented 3 years ago

@MichalStrehovsky

Someone not using nameof is going to hit this too - nameof is not mandatory - one can just spell out the name as a literal string in the source code and the runtime behavior is the same.

True. I don't want to ask for too much, and I don't want to set my expectations unfairly high, therefore I wouldn't complain if I use ILLink now or in future and ILLink "fails" to handle a case where I wrote a type or member name as a literal string instead of nameof. I would say that ILLink's failure is my own fault because I could've and should've used nameof but I wrote a literal string instead, so it's my fault, BUT....

But currently I'm unable to say it's my own fault, because currently even if I do use nameof correctly, ILLink or other IL tools still cannot see that I used nameof correctly. Either of my two aforementioned suggestions could eliminate this nameof problem, because either of those would make nameof visible/detectable in the IL.

Thanks for the link to the very relevant "reflection-flow.md" re .NET Native and ILLink. .NET Native is/was fundamentally great in principle, so it's sad that .NET Native received so many complaints as a result of reflection issues.

My suggestion for nameof would help reduce these problems. It wouldn't solve everything mentioned in that "reflection-flow.md", but it would help. I don't believe there exists any single "killer solution" that kills all reflection/ILLink problems all at once, so I think progress needs to be made step-by-step, and one of those steps is eliminating the nameof problem, such as how I mentioned. Ofcourse I'd love to kill the entire problem, but since that's impossible, I suggest making progress by shrinking the size of the problem starting with nameof.

I see the "reflection-flow.md" includes several examples of invoking Type.GetMethod(string) such as:

_otherType.GetMethod("RandomMethod");

I suggest that invoking Type.GetMethod(string) should be avoided as much as possible, and instead the following IL should be produced whenever possible:

ldtoken OtherType.RandomMethod
call class System.Reflection.MethodBase System.Reflection.MethodBase::GetMethodFromHandle(System.RuntimeMethodHandle)

That is already supported and preexisting at the IL level, but unfortunately at the C# level there does not yet exist any equivalent of typeof(x) that works for members instead of types. For example, ideally C# would support:

System.Reflection.MethodBase info = metadataof(OtherType.RandomMethod);
// ...to be used instead of:
System.Reflection.MethodBase info = typeof(OtherType).GetMethod("RandomMethod");
// and also instead of:
System.Reflection.MethodBase info = typeof(OtherType).GetMethod(nameof(OtherType.RandomMethod));

If metadataof and the improved nameof would be implemented, then yes I realize there would still remain several unsolved problems as mentioned in "reflection-flow.md", but at least good progress can be made, and that's practically a requirement because no single killer solution exists.

verelpode commented 3 years ago

@MichalStrehovsky -- I see the "reflection-flow.md" also says:

To document reflection use across methods, we'll introduce a new attribute DynamicallyAccessedMembersAttribute that can be attached to method parameters, the method return parameter, fields, and properties (whose type is System.Type, or System.String).

It says fields of type string, but it may break when the string field is marked const. Although you can already apply any Attributes to const string fields, there is a problem because ILLink or other IL tools cannot see where the constant field is used. IL tools can only see where the const string field is defined and what Attributes are applied to it, but not where the const field is used.

Hence my ldsfld idea described in the very first message in this issue. This would allow IL tools to see where a const field is used, not only where it is defined, provided the const field has at least one Attribute applied to it. However I would also still be happy if the ldsfld idea is rejected in favor of using the simpler idea of ReflectedNameMarkers.NameOfType(string).

Note the ldsfld idea is more powerful and might be useful for "reflection-flow.md", or possibly even mandatory within a goal of solving ALL of the problems described in the "reflection-flow.md". So the ldsfld idea does more than the ReflectedNameMarkers.NameOfType idea, but ReflectedNameMarkers.NameOfType is appealing because it is much simpler and much less work than the ldsfld idea, and perhaps sufficient.

It boils down to a choice between:

Do you want to solve only nameof? (then use the ReflectedNameMarkers.NameOfType idea), or...
Do you want to solve all reflection/linking issues that might require attributes to be applied to const fields, including nameof but also other issues where it may be useful to apply attributes to const fields (if yes, then use the idea of allowing ldsfld to load attribute-annotated const fields, plus make nameof produce a backing field, that is a const string field with a suitable Attribute applied).

jkotas commented 3 years ago

It is not possible to use nameof in many situations. For example, when the code is using reflection to access types or methods that it does not have direct access to, like here: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/AppDomain.cs#L415

It means that any solution based on nameof will be partial at best.

The solution we are going after with the linker attributes is much more complete solution for this problem. The obfuscators can use these same attributes to figure what is reflected upon as well.

verelpode commented 3 years ago

Here's an improvement to my previous designs. This would provide more info to ILLinker and other tools. For example, if the original/input source code is....

class ExampleClass
{
    public string TestNameOfType()
    {
        return nameof(ExampleClass);
    }
}

Then the C# compiler would transform it to the equivalent of:

class ExampleClass
{
    [ReflectedNameInfoAttribute(typeof(ExampleClass))]
    private const string __nameOf_ExampleClass = "ExampleClass";
    // Alternatively "static readonly" (instead of "const") would be required if "ldsfld" will never load const fields.

    public string TestNameOfType()
    {
        return __nameOf_ExampleClass; // was: return nameof(ExampleClass);
    }
}

Where ReflectedNameInfoAttribute would be like this:

[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public class ReflectedNameInfoAttribute : System.Attribute
{
    public ReflectedNameInfoAttribute(System.Type inTargetType)
    {
        if (inTargetType is null) throw new ArgumentNullException();
        this.TargetType = inTargetType;
        // If desired, can also save a copy of the full names:
        this.TypeFullName = inTargetType.FullName;
        this.AssemblyFullName = inTargetType.Assembly.FullName;
    }

    public ReflectedNameInfoAttribute(System.Reflection.MemberInfo inMember)
    {
        if (inMember is null) throw new ArgumentNullException();
        this.TargetMember = inMember;
        this.TargetMemberType = inMember.MemberType;
        System.Type t = inMember.DeclaringType;
        this.TargetType = t;
        // If desired, can also save a copy of the full names:
        this.TypeFullName = t.FullName;
        this.AssemblyFullName = t.Assembly.FullName;
    }

    // This can be used to target a member when the System.Reflection.MemberInfo is unavailable.
    public ReflectedNameInfoAttribute(System.Type inTargetType, System.Reflection.MemberTypes inTargetMemberType)
    {
        if (inTargetType is null) throw new ArgumentNullException();
        this.TargetType = inTargetType;
        this.TargetMemberType = inTargetMemberType;
        // If desired, can also save a copy of the full names:
        this.TypeFullName = inTargetType.FullName;
        this.AssemblyFullName = inTargetType.Assembly.FullName;
    }

    public ReflectedNameInfoAttribute(string inTypeFullName, string inAssemblyFullName)
    {
        this.TypeFullName = inTypeFullName;
        this.AssemblyFullName = inAssemblyFullName;
    }

    public System.Type TargetType { get; set; }

    public System.Reflection.MemberInfo TargetMember { get; set; }

    // Equals System.Reflection.MemberInfo.MemberType.
    public System.Reflection.MemberTypes TargetMemberType { get; set; }

    // Equals System.Type.FullName.
    public string TypeFullName { get; set; }

    // Equals System.Type.Assembly.FullName.
    public string AssemblyFullName { get; set; }

}

When nameof is used for a member instead of a type, for example...

class ExampleClass
{
    public int PropertyXYZ { get; set; }

    public string TestNameOfProperty()
    {
        return nameof(PropertyXYZ);
    }
}

Then the C# compiler would transform it to the equivalent of:

class ExampleClass
{
    public int PropertyXYZ { get; set; }

    [ReflectedNameInfoAttribute(metadataof(PropertyXYZ))]
    private const string __nameOf_PropertyXYZ = "PropertyXYZ";
    // Alternatively "static readonly" (instead of "const") would be required if "ldsfld" will never load const fields.

    public string TestNameOfProperty()
    {
        return __nameOf_PropertyXYZ; // was: return nameof(PropertyXYZ);
    }
}

Where metadataof would be a function like typeof except for members. Where typeof returns System.Type, metadataof would return System.Reflection.MemberInfo. Alternatively, if no metadataof will exist, then:

class ExampleClass
{
    public int PropertyXYZ { get; set; }

    [ReflectedNameInfoAttribute(typeof(ExampleClass), System.Reflection.MemberTypes.Property)]
    private const string __nameOf_PropertyXYZ = "PropertyXYZ";

    public string TestNameOfProperty()
    {
        return __nameOf_PropertyXYZ; // was: return nameof(PropertyXYZ);
    }
}

verelpode commented 3 years ago

@jkotas wrote:

It is not possible to use nameof in many situations. For example, when the code is using reflection to access types or methods that it does not have direct access to, like here:

I see line 417 is:

Type type = Type.GetType("System.Security.Principal.GenericPrincipal, System.Security.Claims", throwOnError: true)!;

This could be either manually or automatically transformed to:

[ReflectedNameInfoAttribute("System.Security.Principal.GenericPrincipal", "System.Security.Claims")]
private const/*or static readonly*/ string __fullnameOf_GenericPrincipal = "System.Security.Principal.GenericPrincipal, System.Security.Claims";

Type type = Type.GetType(__fullnameOf_GenericPrincipal, throwOnError: true)!;

In other words, the ReflectedNameInfoAttribute could be usable both with and without the nameof(x) function.

Either you manually use the ReflectedNameInfoAttribute similar to the above, or maybe the C# compiler might recognize special method invocations such as Type.GetType(string) and automatically generate a backing field with ReflectedNameInfoAttribute applied.

The solution we are going after with the linker attributes is much more complete solution for this problem.

I'm unsure if you're saying that a solution was already recently invented, but if that's what you mean, then that's great news.

jkotas commented 3 years ago

In other words, the ReflectedNameInfoAttribute could be usable both with and without the nameof(x) function.

What benefit does it have over the reflection flow attributes (https://github.com/mono/linker/blob/a86658f4602e616d71e5d6db23a30940534132e2/docs/design/reflection-flow.md)?

maybe the C# compiler might recognize special method invocations

This would mean teaching the whole stack about this feature. Features like that have to be very high value for it to be worth it. I do not see the very high value here.

verelpode commented 3 years ago

I don't see how anything currently described in the "reflection-flow.md" solves the nameof problem, but possibly that's because I don't understand some particular part of it or I missed something in it.

verelpode commented 3 years ago

A neat syntax idea just occurred to me. What if the C# compiler allowed you to write a string with quotation marks inside of nameof(x) like follows?

Type t = System.Type.GetType(nameof("System.Security.Principal.GenericPrincipal"));

And then the compiler would transform the above to use an attribute-annotated compiler-generated backing field, something like this:

[ReflectedNameInfoAttribute("System.Security.Principal.GenericPrincipal")]
private const/*or static readonly*/ string __nameOf1234 = "System.Security.Principal.GenericPrincipal";

Type t = System.Type.GetType(__nameOf1234);

Thus obviously people would and should still try to use nameof(x) without quotation marks as much as possible, but for the situations where such direct access is impossible or bad, then the compiler could allow you to write it as a string with quotation marks: nameof("x") whenever nameof(x) is impossible. The string would be immediately recognizable to ILLinker and other tools thanks to the compiler-generated attribute applied to the backing field.

Soft References that allow nameof to be used where currently it cannot be used?

Alternatively, if people want to go further with this idea, another possibility is: Don't use quotation marks inside nameof; don't change any C# syntax; instead give Visual Studio an ability to add a "soft reference" to a .csproj !

If you added the Assembly "System.Security.Claims.dll" to a .csproj as a "soft reference", then you'd be able to use nameof(System.Security.Principal.GenericPrincipal) with the normal syntax, but the C# compiler would convert it to a string because it's only a "soft reference" to the DLL/Assembly.

The compiler would still perform compile-time checking, thus if you make a typo, frex nameof(System.Security.Principal.GenericPrincpl) then the compiler would generate an error saying that "GenericPrincpl" doesn't exist. The compiler can do this because it has access to read the Assembly at compile-time. However the output IL would not contain any true/hard/normal reference to the DLL, rather it would contain an attribute-annotated string, when in Visual Studio you only add the Assembly "System.Security.Claims.dll" as a "soft reference" or a "dynamic reference" or a "weak reference" or whatever you want to name the new feature.

verelpode commented 3 years ago

Use the extern keyword and DllImportAttribute?

It is not possible to use nameof in many situations. For example, when the code is using reflection to access types or methods that it does not have direct access to, like here:

How about changing that code to use the extern keyword? Currently that code is written like this....

IPrincipal Example()
{
    System.Type t = System.Type.GetType("System.Security.Principal.GenericPrincipal, System.Security.Claims");
    System.Reflection.MethodInfo mi = t.GetMethod("GetDefaultInstance");
    return (IPrincipal) mi.Invoke(null, null);
}

How about changing it to use extern like this:

[System.Runtime.InteropServices.DllImport("System.Security.Claims.dll", EntryPoint ="System.Security.Principal.GenericPrincipal.GetDefaultInstance", 
    CallingConvention = CallingConvention.Clr, WeakLink = true)]
private static extern IPrincipal GetDefaultInstance();

IPrincipal Example()
{
    return GetDefaultInstance();
}

Ofcourse currently that doesn't work, but it could. The above WeakLink property of DllImportAttribute doesn't exist, but it could. Likewise the above member CallingConvention.Clr of the enum CallingConvention doesn't exist, but it could, thus DllImportAttribute or some other new attribute could be used for weak-linking/importing C# methods into C# Assemblies, instead of the current problematic way of passing a string to System.Type.GetType(string).

Currently the preexisting extern keyword already provides a much nicer syntax (the normal syntax) for passing parameters to external methods, much nicer than the comparatively cumbersome way of constructing an object array to pass to MethodInfo.Invoke(object, object[]). So why not use extern for importing both C#/CLR and C++ methods, not only C++ methods? It looks like it may be feasible to extend extern and DllImportAttribute to support importing CLR methods from IL DLL's, not only native DLL's.