Open verelpode opened 3 years ago
Tagging subscribers to this area: @ajcvickers See info in area-owners.md if you want to be subscribed.
Author: | verelpode |
---|---|
Assignees: | - |
Labels: | `area-System.ComponentModel.DataAnnotations`, `untriaged` |
Milestone: | - |
The literal fields are compile-time only artifact today. If we allowed referencing them in IL instructions, they would have to be materialized at runtime. Attributes do not help with that. Parsing attributes takes time too.
AOT compilers.
IL obfuscators.
IL decompilers such as ILSpy (quite useful for troubleshooting).
I do not see how this helps any of these.
If obsluscators wants to obfluscate constant strings, they should be able to do it on their own, without any help from the runtime.
@jkotas -- alright I accept your viewpoint on that. To address your critique, would you like it better if the idea is replaced with the following? The following design has the advantage of being much simpler and it would solve the problem without requiring any changes to the ldsfld
instruction.
As the first step, .NET Framework 5.x would be given a class containing several super-simple methods that do nothing other than merely returning the same string as passed in a parameter, like this:
namespace System.Runtime.CompilerServices
{
public static class ReflectedNameMarkers
{
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfType(string name) { return name; }
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfProperty(string name) { return name; }
// ... and a few more ...
} // class
} // namespace System.Runtime.CompilerServices
And then when the C# compiler / Roslyn compiles nameof(x)
, such as...
class ExampleClass
{
public string Test1()
{
return nameof(ExampleClass);
}
public int PropertyXYZ { get; set; }
public string Test2()
{
return nameof(PropertyXYZ);
}
}
It would be compiled to the same as:
class ExampleClass
{
public string Test1()
{
return System.Runtime.CompilerServices.
ReflectedNameMarkers.NameOfType("ExampleClass");
}
public int PropertyXYZ { get; set; }
public string Test2()
{
return System.Runtime.CompilerServices.
ReflectedNameMarkers.NameOfProperty("PropertyXYZ");
}
}
Whereas currently it is compiled to the same as:
class ExampleClass
{
public string Test1()
{
return "ExampleClass";
}
public int PropertyXYZ { get; set; }
public string Test2()
{
return "PropertyXYZ";
}
}
Currently strings produced by nameof(x)
are not annotated/marked any differently than other strings, thus IL tools have trouble recognizing that these strings are for reflection purposes.
However, if the C# compiler wraps these reflection strings in calls to ReflectedNameMarkers.NameOfType(string)
etc as demonstrated above, then IL tools could simply look for invocations of these methods, and then they would reliably know that a string is for reflection purposes when it is wrapped in an invocation of ReflectedNameMarkers.NameOfType(string)
.
It'd also be useful to make a method ReflectedNameMarkers.NotForReflection(string)
that does the opposite of NameOfType(string)
. When a developer is experiencing troubles with an IL tool because the developer is using a string that is coincidentally identical to a type or member name, and because the IL tool may be using heuristics to try to identify reflection names (because currently no other solution exists), then the developer could explicitly instruct the IL tool that the string is unrelated to reflection, by doing this:
using System.Runtime.CompilerServices;
class ExampleClass
{
public int PropertyXYZ { get; set; }
public string Test3()
{
return ReflectedNameMarkers.NotForReflection("PropertyXYZ");
}
}
In full, the class ReflectedNameMarkers
could contain:
namespace System.Runtime.CompilerServices
{
public static class ReflectedNameMarkers
{
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfType(string name) { return name; }
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfMethod(string name) { return name; }
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfField(string name) { return name; }
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfProperty(string name) { return name; }
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfEvent(string name) { return name; }
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NameOfEnumMember(string name) { return name; }
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
public static string NotForReflection(string text) { return text; }
} // class
} // namespace System.Runtime.CompilerServices
If desired, the NameOfType(string)
method could be split into several more-specific methods named as follows, because this additional info might be useful to IL tools:
NameOfClass
or NameOfReferenceType
NameOfStructValueType
NameOfEnumType
NameOfDelegateType
NameOfPrimitiveType
NameOfOtherType
Also if desired, the ReflectedNameMarkers
class might also contain methods to explicitly identify fully-qualified names as opposed to unqualified names, but this is probably unnecessary because IL tools could instead look for the presence of one or more "." characters in the name.
What do you think? Do you like this design better?
It occurred to me that people could begin using this new class System.Runtime.CompilerServices.ReflectedNameMarkers
even before Roslyn supports it (or even if Roslyn never supports it). In the meantime, people could use the C# nameof
keyword/function together with a manual invocation of the method ReflectedNameMarkers.NameOfType(string)
. For example, people could write:
using System.Runtime.CompilerServices;
class ExampleClass
{
public int PropertyXYZ { get; set; }
public string Test1()
{
return ReflectedNameMarkers.NameOfType(nameof(ExampleClass));
}
public string Test2()
{
return ReflectedNameMarkers.NameOfProperty(nameof(PropertyXYZ));
}
public string Test3()
{
return ReflectedNameMarkers.NotForReflection("PropertyXYZ");
}
}
Thus it would be usable immediately, and various IL tools could immediately begin supporting the recognition / special meaning of invocations of ReflectedNameMarkers.NameOfType
. Obviously, to gain widespread support, this class ReflectedNameMarkers
(or similar) would need to be made official, i.e. added to .NET Framework 5.x in a suitable official namespace (presumably System.Runtime.CompilerServices
is best).
Obviously the final solution would be nicer if/when Roslyn automatically transforms nameof(x)
to ReflectedNameMarkers.NameOfType("x")
instead of requiring people to manually write ReflectedNameMarkers.NameOfType(nameof(x))
.
I do not see the scenario that this helps with.
As I have said above - if obfuscators want to obfuscate constant strings, they should be able to do it on their own, without any help from the runtime.
As I have said above - if obfuscators wants to obfuscate constant strings, they should be able to do it on their own, without any help from the runtime.
I was also thinking about the experiences that were learned from ".NET Native", not only thinking about obfuscators, but if I use the obfuscator example for now since you and I mentioned it, I'll try to explain the problem more clearly than my previous messages.
Firstly, if the original/input source code is....
class ExampleClass
{
public int PropertyXYZ { get; set; }
public string Test()
{
return nameof(PropertyXYZ); // Currently compiled to: return "PropertyXYZ";
}
}
...and when an obfuscator needs to rename property PropertyXYZ
to a1
, then obfuscators lack sufficient information to know whether the string "PropertyXYZ"
should also be transformed to "a1"
or left unchanged. Obfuscators -- but also other IL tools such as cross-compilers and analyzers -- are left unable to reliably determine which of the following two transformations is the correct one:
public int a1 { get; set; }
public string Test()
{
return "a1";
}
// VERSUS:
public int a1 { get; set; }
public string Test()
{
return "PropertyXYZ";
}
Currently IL tools are prevented from seeing the C# nameof
expression, thus these tools see only a string that equals the name of a property/member or type, thus these tools cannot know whether the string is truly the name of a property versus just coincidentally identical to the name of a member or type. This lack of the necessary information leads to incorrect transformations or analysis results or cross-compiles, such as the above example of an obfuscator not knowing whether the correct transformation is return "a1";
OR unchanged return "PropertyXYZ";
.
This problem is not limited to nameof
- it's a more general problem of figuring out what parts of the program are accessed by reflection. Someone not using nameof
is going to hit this too - nameof
is not mandatory - one can just spell out the name as a literal string in the source code and the runtime behavior is the same.
IL Linker faces similar problem - the problem there is "is it safe to remove a statically unreachable part of the program" - this problem is very similar. The safety problem comes from the fact that an app could reflection-access something that wasn't seen as used.
https://github.com/mono/linker/blob/a86658f4602e616d71e5d6db23a30940534132e2/docs/design/reflection-flow.md is one of the ways we're trying to address this. In .NET 5 we shipped a preview and we're iterating on that further.
@MichalStrehovsky
Someone not using nameof is going to hit this too - nameof is not mandatory - one can just spell out the name as a literal string in the source code and the runtime behavior is the same.
True. I don't want to ask for too much, and I don't want to set my expectations unfairly high, therefore I wouldn't complain if I use ILLink now or in future and ILLink "fails" to handle a case where I wrote a type or member name as a literal string instead of nameof
. I would say that ILLink's failure is my own fault because I could've and should've used nameof
but I wrote a literal string instead, so it's my fault, BUT....
But currently I'm unable to say it's my own fault, because currently even if I do use nameof
correctly, ILLink or other IL tools still cannot see that I used nameof
correctly. Either of my two aforementioned suggestions could eliminate this nameof
problem, because either of those would make nameof
visible/detectable in the IL.
Thanks for the link to the very relevant "reflection-flow.md" re .NET Native and ILLink. .NET Native is/was fundamentally great in principle, so it's sad that .NET Native received so many complaints as a result of reflection issues.
My suggestion for nameof
would help reduce these problems. It wouldn't solve everything mentioned in that "reflection-flow.md", but it would help. I don't believe there exists any single "killer solution" that kills all reflection/ILLink problems all at once, so I think progress needs to be made step-by-step, and one of those steps is eliminating the nameof
problem, such as how I mentioned. Ofcourse I'd love to kill the entire problem, but since that's impossible, I suggest making progress by shrinking the size of the problem starting with nameof
.
I see the "reflection-flow.md" includes several examples of invoking Type.GetMethod(string)
such as:
_otherType.GetMethod("RandomMethod");
I suggest that invoking Type.GetMethod(string)
should be avoided as much as possible, and instead the following IL should be produced whenever possible:
ldtoken OtherType.RandomMethod
call class System.Reflection.MethodBase System.Reflection.MethodBase::GetMethodFromHandle(System.RuntimeMethodHandle)
That is already supported and preexisting at the IL level, but unfortunately at the C# level there does not yet exist any equivalent of typeof(x)
that works for members instead of types. For example, ideally C# would support:
System.Reflection.MethodBase info = metadataof(OtherType.RandomMethod);
// ...to be used instead of:
System.Reflection.MethodBase info = typeof(OtherType).GetMethod("RandomMethod");
// and also instead of:
System.Reflection.MethodBase info = typeof(OtherType).GetMethod(nameof(OtherType.RandomMethod));
If metadataof
and the improved nameof
would be implemented, then yes I realize there would still remain several unsolved problems as mentioned in "reflection-flow.md", but at least good progress can be made, and that's practically a requirement because no single killer solution exists.
@MichalStrehovsky -- I see the "reflection-flow.md" also says:
To document reflection use across methods, we'll introduce a new attribute DynamicallyAccessedMembersAttribute that can be attached to method parameters, the method return parameter, fields, and properties (whose type is System.Type, or System.String).
It says fields of type string, but it may break when the string field is marked const
. Although you can already apply any Attributes to const string
fields, there is a problem because ILLink or other IL tools cannot see where the constant field is used. IL tools can only see where the const string
field is defined and what Attributes are applied to it, but not where the const field is used.
Hence my ldsfld
idea described in the very first message in this issue. This would allow IL tools to see where a const field is used, not only where it is defined, provided the const field has at least one Attribute applied to it.
However I would also still be happy if the ldsfld
idea is rejected in favor of using the simpler idea of ReflectedNameMarkers.NameOfType(string)
.
Note the ldsfld
idea is more powerful and might be useful for "reflection-flow.md", or possibly even mandatory within a goal of solving ALL of the problems described in the "reflection-flow.md". So the ldsfld
idea does more than the ReflectedNameMarkers.NameOfType
idea, but ReflectedNameMarkers.NameOfType
is appealing because it is much simpler and much less work than the ldsfld
idea, and perhaps sufficient.
It boils down to a choice between:
nameof
? (then use the ReflectedNameMarkers.NameOfType
idea), or...const
fields, including nameof
but also other issues where it may be useful to apply attributes to const fields (if yes, then use the idea of allowing ldsfld
to load attribute-annotated const fields, plus make nameof
produce a backing field, that is a const string field with a suitable Attribute applied).It is not possible to use nameof
in many situations. For example, when the code is using reflection to access types or methods that it does not have direct access to, like here: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/AppDomain.cs#L415
It means that any solution based on nameof will be partial at best.
The solution we are going after with the linker attributes is much more complete solution for this problem. The obfuscators can use these same attributes to figure what is reflected upon as well.
Here's an improvement to my previous designs. This would provide more info to ILLinker and other tools. For example, if the original/input source code is....
class ExampleClass
{
public string TestNameOfType()
{
return nameof(ExampleClass);
}
}
Then the C# compiler would transform it to the equivalent of:
class ExampleClass
{
[ReflectedNameInfoAttribute(typeof(ExampleClass))]
private const string __nameOf_ExampleClass = "ExampleClass";
// Alternatively "static readonly" (instead of "const") would be required if "ldsfld" will never load const fields.
public string TestNameOfType()
{
return __nameOf_ExampleClass; // was: return nameof(ExampleClass);
}
}
Where ReflectedNameInfoAttribute
would be like this:
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public class ReflectedNameInfoAttribute : System.Attribute
{
public ReflectedNameInfoAttribute(System.Type inTargetType)
{
if (inTargetType is null) throw new ArgumentNullException();
this.TargetType = inTargetType;
// If desired, can also save a copy of the full names:
this.TypeFullName = inTargetType.FullName;
this.AssemblyFullName = inTargetType.Assembly.FullName;
}
public ReflectedNameInfoAttribute(System.Reflection.MemberInfo inMember)
{
if (inMember is null) throw new ArgumentNullException();
this.TargetMember = inMember;
this.TargetMemberType = inMember.MemberType;
System.Type t = inMember.DeclaringType;
this.TargetType = t;
// If desired, can also save a copy of the full names:
this.TypeFullName = t.FullName;
this.AssemblyFullName = t.Assembly.FullName;
}
// This can be used to target a member when the System.Reflection.MemberInfo is unavailable.
public ReflectedNameInfoAttribute(System.Type inTargetType, System.Reflection.MemberTypes inTargetMemberType)
{
if (inTargetType is null) throw new ArgumentNullException();
this.TargetType = inTargetType;
this.TargetMemberType = inTargetMemberType;
// If desired, can also save a copy of the full names:
this.TypeFullName = inTargetType.FullName;
this.AssemblyFullName = inTargetType.Assembly.FullName;
}
public ReflectedNameInfoAttribute(string inTypeFullName, string inAssemblyFullName)
{
this.TypeFullName = inTypeFullName;
this.AssemblyFullName = inAssemblyFullName;
}
public System.Type TargetType { get; set; }
public System.Reflection.MemberInfo TargetMember { get; set; }
// Equals System.Reflection.MemberInfo.MemberType.
public System.Reflection.MemberTypes TargetMemberType { get; set; }
// Equals System.Type.FullName.
public string TypeFullName { get; set; }
// Equals System.Type.Assembly.FullName.
public string AssemblyFullName { get; set; }
}
When nameof
is used for a member instead of a type, for example...
class ExampleClass
{
public int PropertyXYZ { get; set; }
public string TestNameOfProperty()
{
return nameof(PropertyXYZ);
}
}
Then the C# compiler would transform it to the equivalent of:
class ExampleClass
{
public int PropertyXYZ { get; set; }
[ReflectedNameInfoAttribute(metadataof(PropertyXYZ))]
private const string __nameOf_PropertyXYZ = "PropertyXYZ";
// Alternatively "static readonly" (instead of "const") would be required if "ldsfld" will never load const fields.
public string TestNameOfProperty()
{
return __nameOf_PropertyXYZ; // was: return nameof(PropertyXYZ);
}
}
Where metadataof
would be a function like typeof
except for members. Where typeof
returns System.Type
, metadataof
would return System.Reflection.MemberInfo
.
Alternatively, if no metadataof
will exist, then:
class ExampleClass
{
public int PropertyXYZ { get; set; }
[ReflectedNameInfoAttribute(typeof(ExampleClass), System.Reflection.MemberTypes.Property)]
private const string __nameOf_PropertyXYZ = "PropertyXYZ";
public string TestNameOfProperty()
{
return __nameOf_PropertyXYZ; // was: return nameof(PropertyXYZ);
}
}
@jkotas wrote:
It is not possible to use nameof in many situations. For example, when the code is using reflection to access types or methods that it does not have direct access to, like here:
I see line 417 is:
Type type = Type.GetType("System.Security.Principal.GenericPrincipal, System.Security.Claims", throwOnError: true)!;
This could be either manually or automatically transformed to:
[ReflectedNameInfoAttribute("System.Security.Principal.GenericPrincipal", "System.Security.Claims")]
private const/*or static readonly*/ string __fullnameOf_GenericPrincipal = "System.Security.Principal.GenericPrincipal, System.Security.Claims";
Type type = Type.GetType(__fullnameOf_GenericPrincipal, throwOnError: true)!;
In other words, the ReflectedNameInfoAttribute
could be usable both with and without the nameof(x)
function.
Either you manually use the ReflectedNameInfoAttribute
similar to the above, or maybe the C# compiler might recognize special method invocations such as Type.GetType(string)
and automatically generate a backing field with ReflectedNameInfoAttribute
applied.
The solution we are going after with the linker attributes is much more complete solution for this problem.
I'm unsure if you're saying that a solution was already recently invented, but if that's what you mean, then that's great news.
In other words, the ReflectedNameInfoAttribute could be usable both with and without the nameof(x) function.
What benefit does it have over the reflection flow attributes (https://github.com/mono/linker/blob/a86658f4602e616d71e5d6db23a30940534132e2/docs/design/reflection-flow.md)?
maybe the C# compiler might recognize special method invocations
This would mean teaching the whole stack about this feature. Features like that have to be very high value for it to be worth it. I do not see the very high value here.
I don't see how anything currently described in the "reflection-flow.md" solves the nameof
problem, but possibly that's because I don't understand some particular part of it or I missed something in it.
A neat syntax idea just occurred to me. What if the C# compiler allowed you to write a string with quotation marks inside of nameof(x)
like follows?
Type t = System.Type.GetType(nameof("System.Security.Principal.GenericPrincipal"));
And then the compiler would transform the above to use an attribute-annotated compiler-generated backing field, something like this:
[ReflectedNameInfoAttribute("System.Security.Principal.GenericPrincipal")]
private const/*or static readonly*/ string __nameOf1234 = "System.Security.Principal.GenericPrincipal";
Type t = System.Type.GetType(__nameOf1234);
Thus obviously people would and should still try to use nameof(x)
without quotation marks as much as possible, but for the situations where such direct access is impossible or bad, then the compiler could allow you to write it as a string with quotation marks: nameof("x")
whenever nameof(x)
is impossible. The string would be immediately recognizable to ILLinker and other tools thanks to the compiler-generated attribute applied to the backing field.
Alternatively, if people want to go further with this idea, another possibility is: Don't use quotation marks inside nameof
; don't change any C# syntax; instead give Visual Studio an ability to add a "soft reference" to a .csproj !
If you added the Assembly "System.Security.Claims.dll" to a .csproj as a "soft reference", then you'd be able to use nameof(System.Security.Principal.GenericPrincipal)
with the normal syntax, but the C# compiler would convert it to a string because it's only a "soft reference" to the DLL/Assembly.
The compiler would still perform compile-time checking, thus if you make a typo, frex nameof(System.Security.Principal.GenericPrincpl)
then the compiler would generate an error saying that "GenericPrincpl" doesn't exist. The compiler can do this because it has access to read the Assembly at compile-time. However the output IL would not contain any true/hard/normal reference to the DLL, rather it would contain an attribute-annotated string, when in Visual Studio you only add the Assembly "System.Security.Claims.dll" as a "soft reference" or a "dynamic reference" or a "weak reference" or whatever you want to name the new feature.
It is not possible to use nameof in many situations. For example, when the code is using reflection to access types or methods that it does not have direct access to, like here:
How about changing that code to use the extern keyword? Currently that code is written like this....
IPrincipal Example()
{
System.Type t = System.Type.GetType("System.Security.Principal.GenericPrincipal, System.Security.Claims");
System.Reflection.MethodInfo mi = t.GetMethod("GetDefaultInstance");
return (IPrincipal) mi.Invoke(null, null);
}
How about changing it to use extern
like this:
[System.Runtime.InteropServices.DllImport("System.Security.Claims.dll", EntryPoint ="System.Security.Principal.GenericPrincipal.GetDefaultInstance",
CallingConvention = CallingConvention.Clr, WeakLink = true)]
private static extern IPrincipal GetDefaultInstance();
IPrincipal Example()
{
return GetDefaultInstance();
}
Ofcourse currently that doesn't work, but it could. The above WeakLink
property of DllImportAttribute
doesn't exist, but it could. Likewise the above member CallingConvention.Clr
of the enum CallingConvention
doesn't exist, but it could, thus DllImportAttribute
or some other new attribute could be used for weak-linking/importing C# methods into C# Assemblies, instead of the current problematic way of passing a string to System.Type.GetType(string)
.
Currently the preexisting extern
keyword already provides a much nicer syntax (the normal syntax) for passing parameters to external methods, much nicer than the comparatively cumbersome way of constructing an object array to pass to MethodInfo.Invoke(object, object[])
. So why not use extern
for importing both C#/CLR and C++ methods, not only C++ methods? It looks like it may be feasible to extend extern
and DllImportAttribute
to support importing CLR methods from IL DLL's, not only native DLL's.
What does everyone think of the idea of tweaking the JIT and AOT to allow the
ldsfld
instruction to load some fields marked withstatic literal
? "Some fields" meaningldsfld
would only support astatic literal
field if it has at least one attribute applied to the field. For example:Removing this limitation with
ldsfld
would be helpful for:You can see in my above example that I marked
kExcludedFromObfuscation
as being excluded from obfuscation, which would be necessary because its value ("PropertyXYZ") is coincidentally identical to a member name. Smart/automatic obfuscators and AOT-compilers can get confused and make incorrect transformations when a constant string is coincidentally identical to a member name but in reality is unrelated to the member and should not be transformed or renamed.Unfortunately the above usage of
Obfuscation(Exclude = true)
fails to have any effect, because the compiler loads the constant value usingldstr
instead ofldsfld
, becauseldsfld
doesn't allowstatic literal
fields. Currently the above-shownUseConstant()
method is compiled to:instead of:
Thus tools such as IL obfuscators, IL decompilers, and IL-to-x86-64 AOT-compilers are unable to see that the
UseConstant()
method references the constant field namedkExcludedFromObfuscation
, and thus unable to use any attributes (such as ObfuscationAttribute) that were applied to the field. This problem would be solved if the JIT/AOT allowedldsfld
to load astatic literal
field when the field has at least one attribute applied to the field.It's also useful in the opposite case: It would be helpful to use an attribute to explicitly mark strings produced by the C#
nameof
operator, such as:.NET 5.x could be given a new attribute named perhaps
FieldMutabilityAttribute
, and correspondingenum FieldMutability
, like follows:Even before Roslyn takes advantage of
ldsfld
lifting the restriction onstatic literal
, people could start using the new attribute immediately by using C#static readonly
keywords instead ofconst
for now. Currently Roslyn accepts this code and generates aldsfld
instruction:JIT's and AOT's could optimize/transform the
static readonly
field to a constant value whenever they see that theFieldMutabilityAttribute
is applied to thestatic readonly
field and indicates that the field is a constant value.Thus the IL inside a .exe or .dll file on-disk would use
ldsfld
, but at runtime theldsfld
would be compiled to the equivalent ofldc
orldstr
when thestatic readonly/initonly
field is marked with aFieldMutabilityAttribute
that indicates that the value is a constant.