Static registrar with NativeAOT design – discussion

ivanpovazan commented 1 year ago

Description

Microsoft.macOS and Microsoft.iOS enable the Objective-C runtime to create instances of C# classes through a type registration system. The type registration can be static (used for device builds) or dynamic (used for emulators). At build time, the static registrar inspects the assemblies used by the application through a custom linker step. It determines the classes and methods to register with Objective-C and generates a map, which is embedded into the binary. At application startup, the map is registered with the Objective-C runtime (source).

However, to resolve addresses of registered types, type (and module) metadata tokens are used, which are not available in NativeAOT representations of managed types. This limitation prevents using NativeAOT for applications built on top of Microsoft.macOS and Microsoft.iOS.

This issue has been opened to discuss possible approaches, ideas and suggestions on how to get past this limitation, with the goal of enabling NativeAOT to work with Xamarin.

/cc: @rolfbjarne

PS I would also like to give credit to @AustinWise who also reported this limitation in: https://github.com/dotnet/runtime/issues/77472

ghost commented 1 year ago

Tagging subscribers to 'os-ios': @steveisok, @akoeplinger See info in area-owners.md if you want to be subscribed.

Issue Details

## Description Microsoft.macOS and Microsoft.iOS enable Objective-C runtime to create instances of C# classes through a type registration system. The type registration can be static – used for device builds and dynamic – used for emulators. At build time, the static registration inspects the assemblies used by the application through a custom linker step. It determines the classes and methods to register with Objective-C and generates a map, which is embedded into the binary. At the application startup, the map is registered with the Objective-C runtime ([source](https://learn.microsoft.com/en-us/xamarin/ios/internals/registrar)). However, to resolve addresses of registered types, [type (and module) metadata tokens](https://github.com/xamarin/xamarin-macios/blob/main/src/ObjCRuntime/Class.cs#L257-L258) are used, which are not available in NativeAOT representations of managed types. This limitation prevents using NativeAOT for applications built on top of Microsoft.macOS and Microsoft.iOS. This issue has been opened for discussion, possible approaches, ideas and suggestions on how to get pass this limitation with a goal of enabling NativeAOT to work with Xamarin. /cc: @rolfbjarne --- PS I would also like to give credit to @AustinWise who reported this limitation in: https://github.com/dotnet/runtime/issues/77472

Author:	ivanpovazan
Assignees:	-
Labels:	`design-discussion`, `os-ios`, `area-NativeAOT-coreclr`
Milestone:	8.0.0

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.


ivanpovazan commented 1 year ago

After some internal discussions with @rolfbjarne and @MichalStrehovsky, one idea that came up to tackle this limitation was to expose all managed methods with the UnmanagedCallersOnly attribute and change the Objective-C code to invoke managed methods symbolically. However, this approach also raises other questions, e.g. how to handle non-blittable parameter types.

jkotas commented 1 year ago

Resolving tokens or looking up a lot of types or methods by name during startup is wasted time. It would be best to auto-generate C# code with embedded fully resolved references to the types and methods, and compile this code into the app.
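To make the contrast concrete, here is a minimal sketch, not the actual registrar output. All names in it (`RegistrarMap`, `MyViewController`, `MyAppDelegate`) are hypothetical. It compares a run-time lookup by name with generated code in which the compiler has already resolved every type reference.

```csharp
#nullable enable
using System;

// Hypothetical user types standing in for NSObject subclasses.
class MyViewController { }
class MyAppDelegate { }

static class RegistrarMap
{
    // Slow path: the runtime has to search for the type by name.
    public static Type? LookupByName(string typeName) => Type.GetType(typeName);

    // Generated path: every typeof(...) is a fully resolved reference that the
    // compiler (and an AOT compiler) can see, so no name or token lookup is needed.
    public static Type? LookupGenerated(string objcClassName) => objcClassName switch
    {
        "MyViewController" => typeof(MyViewController),
        "MyAppDelegate" => typeof(MyAppDelegate),
        _ => null,
    };
}
```

Because the references are ordinary code, trimming and AOT tools can follow them without any reflection metadata.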

However, this approach also raises other questions e.g., how to handle non-blittable parameter types, etc.

How are non-blittable parameters handled today?

filipnavara commented 1 year ago

How are non-blittable parameters handled today?

Simplified version:

A custom linker step generates the Objective-C version of the object. For every member, code is generated here that contains part of the marshalling. The generated code calls mono_runtime_invoke, which in turn calls xamarin_bridge_runtime_invoke_method and eventually bridges to this managed code, which does the rest of the marshalling on the managed side. It has pretty high overhead for simple methods (e.g. returning int/bool and taking no parameters), so it would be nice to improve at least that part of the design.

rolfbjarne commented 1 year ago

Resolving tokens or looking up a lot of types or methods by name during startup is wasted time.

I agree (this is one of the big reasons the static registrar exists in the first place).

It would be best to auto-generate C# code with embedded fully resolved references to the types and methods, and compile this code into the app.

I will try to look into this. There are however a few points:

- Some care will have to be taken to not bloat the app. One example I saw many years ago was an app with many generated classes and properties, and the static registrar had to generate code for 15.000+ properties. The total generated code can become quite big if each property is bigger than it needs to be.
- Interaction with the trimmer. Right now we run the static registrar after the trimmer has done its job (and after removing all the API it trimmed). Generating managed code in the static registrar means we'll have to run the static registrar a bit earlier (after marking API, but before removing non-marked API), because we might need some of the API in the generated code (iow the static registrar might have to tell the trimmer to keep API it would otherwise remove). This also influences the trimming design: as I understand it, the NativeAOT compiler does its own trimming (which was partially disabled for the initial proof-of-concept, and we still executed ILLinker), and one idea floating around was to figure out how to only use NativeAOT's trimming. I believe using NativeAOT's trimming if the static registrar generates managed code will be a harder problem to solve, because the static registrar needs the trimmed output (it's really unnecessary to generate code for API the trimmer will remove), while at the same time generating new managed code.

jkotas commented 1 year ago

NativeAOT compiler does its own trimming (which was partially disabled for the initial proof-of-concept, and we still executed ILLinker), and one idea floating around was to figure out how to only use NativeAOT's trimming.

Can we run the trimmer in analysis-only mode where it only figures out the roots for the static registrar to use without actually rewriting the outputs?

rolfbjarne commented 1 year ago

NativeAOT compiler does its own trimming (which was partially disabled for the initial proof-of-concept, and we still executed ILLinker), and one idea floating around was to figure out how to only use NativeAOT's trimming.

Can we run the trimmer in analysis-only mode where it only figures out the roots for the static registrar to use without actually rewriting the outputs?

I guess we could do that, or alternatively have the trimmer write its output, and then use it as input to the static registrar, but pass the original non-trimmed assemblies to the NativeAOT compiler.

agocke commented 1 year ago

Can we run the trimmer in analysis-only mode where it only figures out the roots for the static registrar to use without actually rewriting the outputs?

This feels a lot like some of the similar problems we have with custom steps, where they are mainly used as a pre-scanning mechanism. It would be nice to fold some of this into a separate tool. That would give some other benefits for the linker engineering as we wouldn't have to support as many features.

MichalStrehovsky commented 1 year ago

I'm still in the process of trying to understand this, so bear with me.

Do we expect the UnmanagedCallersOnly approach to have corresponding Objective-C code generated, similar to this: https://github.com/xamarin/xamarin-macios/blob/673cf3688622028ff9e390d4e58fbbc8ef06f3bf/tools/common/StaticRegistrar.cs#L3283-L4353

Can we avoid having to do the glue code and build the necessary structures/interfaces in C# directly? How well-defined is the obj-c ABI? I'm trying to map this to how we do COM/WinRT interop - those can get by without generating the extra native code, so I'm trying to understand where this is different.

rolfbjarne commented 1 year ago

Can we avoid having to do the glue code and build the necessary structures/interfaces in C# directly?

It's possible to create all the Objective-C data structures dynamically at runtime (using the Objective-C Runtime API). This is not a good solution though, for performance reasons: it's slow, and we'd have to create all the data structures at startup, so we'd be taking a pretty big startup hit (a few years ago it was ~2s for a macOS application on a performant desktop; on a slower mobile device it'll be much more).

How well-defined is the obj-c ABI?

It's well defined, but it's also a moving target (Apple adds to the ABI sometimes). They also make performance improvements often.

I'm trying to map this to how we do COM/WinRT interop - those can get by without generating the extra native code, so I'm trying to understand where this is different.

As mentioned above we can get by without generating the extra native code too, but it's slow.

The main problem is that for a number of C# classes, an equivalent Objective-C class must exist, and they must all exist before the app can launch, because we don't know which ones will be needed (because Objective-C classes can be referenced dynamically, and this is very often done in storyboards - UI described in XML and often loaded at startup).

So the problem becomes how to create Objective-C classes efficiently, and the answer is to do what Apple/Xcode does: write Objective-C code, and then the required data structures will all be written to disk and loaded as needed at runtime (and very efficiently, since Apple has optimized this quite heavily in the past, and even continues to optimize it pretty much every year).

Another problem is that we really want to minimize our (dirty) memory footprint, because in some cases our limitations are quite strict. Creating all the Objective-C data structures at runtime uses a significant amount of dirty memory compared to using constant memory.

My current plan is to write something like this:

class AppDelegate : NSObject, IUIApplicationDelegate {
    // this method is written by the app developer
    public override bool FinishedLaunching (UIApplication app, NSDictionary options)
    {
        // ...
    }

    // the following method is generated/injected by the static registrar for the method above
    [UnmanagedCallersOnly (EntryPoint = "__registrar__uiapplicationdelegate_didFinishLaunching")]
    static byte __registrar__DidFinishLaunchingWithOptions (IntPtr handle, IntPtr selector, IntPtr p0, IntPtr p1)
    {
        var obj = (AppDelegate) Runtime.GetNSObject (handle);
        var p0Obj = (UIApplication) Runtime.GetNSObject (p0);
        var p1Obj = (NSDictionary) Runtime.GetNSObject (p1);
        // the native BOOL return value is one byte, so convert the managed bool explicitly
        return obj.FinishedLaunching (p0Obj, p1Obj) ? (byte) 1 : (byte) 0;
    }
}

extern BOOL __registrar__uiapplicationdelegate_didFinishLaunching (id self, SEL _cmd, UIApplication* p0, NSDictionary* p1);

@interface AppDelegate : NSObject<UIApplicationDelegate> {
}
    -(BOOL) application:(UIApplication *)p0 didFinishLaunchingWithOptions:(NSDictionary *)p1;
@end
@implementation AppDelegate {
}
    -(BOOL) application:(UIApplication *)p0 didFinishLaunchingWithOptions:(NSDictionary *)p1
    {
        return __registrar__uiapplicationdelegate_didFinishLaunching (self, _cmd, p0, p1);
    }
@end

MichalStrehovsky commented 1 year ago

It's possible to create all the Objective-C data structures dynamically at runtime (using the Objective-C Runtime API). This is not a good solution though, because of performance reasons

I was thinking more in the sense that whatever data structures the Objective-C compiler places in the generated object file, the managed compiler could also generate and put into its own output, at least in theory (it's all object files in the end). That would fall apart if those structures are not well defined though.

rolfbjarne commented 1 year ago

It's possible to create all the Objective-C data structures dynamically at runtime (using the Objective-C Runtime API). This is not a good solution though, because of performance reasons

I was thinking more in the sense that whatever data structures the Objective-C compiler places in the generated object file, the managed compiler could also generate and put into its own output, at least in theory (it's all object files in the end). That would fall apart if those structures are not well defined though.

Yes, we could potentially write an object file directly instead of generating Objective-C code and compiling that.

I think that would work, but it would also likely require some digging into the file format because it's not very well documented afaik (although clang is open source so it's not really hidden either).

filipnavara commented 1 year ago

I was thinking more in the sense that whatever data structures the Objective-C compiler places in the generated object file, the managed compiler could also generate and put into its own output, at least in theory (it's all object files in the end). That would fall apart if those structures are not well defined though.

I was looking into that in the past. It's certainly possible but non-trivial. The documentation is sparse. The data are stored in special sections in the object file.

If the Objective-C code was not doing any part of the fancy marshalling then perhaps it would be feasible. It would not necessarily be easier than just generating an ObjC file, compiling it, and passing the result as another input to the linker.

rolfbjarne commented 1 year ago

My current plan is as follows:

Say we have a managed class that subclasses NSObject, and exports a method:

public partial class MyObject : NSObject {
    [Export ("doSomething:")]
    public void DoSomething (int abc)
    {
    }
}

We will generate the following wrapper code:

public partial class MyObject {
    [UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomething__")]
    static void __DoSomething__ (IntPtr handle, IntPtr sel, int abc)
    {
        var obj = (MyObject) Runtime.GetNSObject (handle);
        obj.DoSomething (abc);
        // process any other arguments to the managed method
    }
}

And the following Objective-C class:

@interface MyObject : NSObject {
}
    -(void) doSomething: (int) abc;
@end

@implementation MyObject {
}
    -(void) doSomething: (int) abc
    {
        __MyObject___DoSomething__ (self, _cmd, abc);
    }
@end

Note 1: the generated code isn't exactly as shown here, because there are many corner-cases that have to be handled, but this is the general idea.

Note 2: the above code should work when there's an AOT compiler, but it won't when we're using the JIT, because the native symbol __MyObject___DoSomething__ won't exist at build time. In that case, we'll generate a lookup mechanism, something like this:

@interface MyObject : NSObject {
}
    -(void) doSomething: (int) abc;
@end

@implementation MyObject {
}

    typedef void (*__MyObject___DoSomething__func) (id self, SEL sel, int abc);
    -(void) doSomething: (int) abc
    {
        static __MyObject___DoSomething__func __MyObject___DoSomething__;
        xamarin_lookup_unmanagedcallersonly ((void **) &__MyObject___DoSomething__, "MyAssembly", "__MyObject___DoSomething__");
        __MyObject___DoSomething__ (self, _cmd, abc);
    }
    }
@end

where the xamarin_lookup_unmanagedcallersonly function will look for the UnmanagedCallersOnly trampoline when the function is first called.
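For illustration, the managed half of such a lookup could look roughly like the sketch below. It is not the actual implementation. `TrampolineLookup` and `Trampolines` are hypothetical names, and the sketch assumes that `RuntimeMethodHandle.GetFunctionPointer` returns a native-callable entry point for an UnmanagedCallersOnly method, which is my understanding of CoreCLR's behavior.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Runtime.InteropServices;

public static class Trampolines
{
    public static int LastValue;

    // Stand-in for a registrar-generated trampoline.
    [UnmanagedCallersOnly]
    public static void DoSomething(IntPtr handle, IntPtr sel, int abc) => LastValue = abc;
}

public static class TrampolineLookup
{
    static readonly ConcurrentDictionary<string, IntPtr> cache = new();

    // Hypothetical helper: resolve the trampoline by name once, then reuse the
    // cached pointer so later calls skip reflection entirely.
    public static IntPtr Lookup(Type type, string name) =>
        cache.GetOrAdd(type.FullName + "::" + name, _ =>
        {
            var method = type.GetMethod(name,
                BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic)
                ?? throw new MissingMethodException(type.FullName, name);
            return method.MethodHandle.GetFunctionPointer();
        });
}
```

The native side would store the returned pointer in its static variable, as in the Objective-C snippet above, so the lookup cost is paid only on the first call.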

ivanpovazan commented 1 year ago

/cc: @simonrozsival

simonrozsival commented 1 year ago

@rolfbjarne would you generate the code using a Roslyn source generator?

MichalStrehovsky commented 1 year ago

I was looking into that in the past. It's certainly possible but non-trivial. The documentation is sparse. The data are stored in special sections in the object file.

Just a random thought - not sure if it's feasible. I see that objc supports __attribute__((weak)) on some things. Could we place that on a method? Could we make the objc-generated method body a weak symbol and generate a UnmanagedCallersOnly method with the exact same mangled name and signature? We'd leave generating the objc data structures to the objc compiler, but provide our own method bodies and avoid the size/perf impact of the thunk.

I don't know how much effort is it worth to put into it - how many methods do we need to expose in an average app, and how costly is the objc method that just thunks to our managed implementation (looking at Rolf's example, maybe the implementation ends up being just a tail call, and then it's cheap - as opposed to something that needs to build a call frame).

My current plan is as follows:

This looks good - I have a couple questions:

- How will exceptions be handled? Do we need a try/catch around the obj.DoSomething (abc);? Managed exceptions leaking through the UnmanagedCallersOnly boundary would be a failfast.
- We could also potentially replace the cast in var obj = (MyObject) Runtime.GetNSObject (handle); with Unsafe.As depending on whether we assume the obj-c side to already be type safe. A cast costs about a dozen bytes, plus a small throughput cost.

AustinWise commented 1 year ago

How will exceptions be handled? Do we need a try/catch around the obj.DoSomething (abc);? Managed exceptions leaking through the UnmanagedCallersOnly boundary would be a failfast.

If an exception tries to exit an UnmanagedCallersOnly function, there should be an ObjectiveCMarshal.UnhandledExceptionPropagationHandler installed. This will translate managed exceptions into a native NSException.
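For comparison, the explicit try/catch variant asked about above could be sketched as follows. This is only an illustration of keeping the exception on the managed side of the boundary. `ExceptionSafeTrampoline` and its `Pending` slot are hypothetical, and a real bridge would convert the exception into an NSException rather than park it in a field.

```csharp
#nullable enable
using System;
using System.Runtime.InteropServices;

public static class ExceptionSafeTrampoline
{
    // Hypothetical slot where a caught managed exception is parked so that the
    // native caller can be told about it.
    [ThreadStatic] public static Exception? Pending;

    // Stand-in for the user's exported method.
    static void DoSomething(int abc)
    {
        if (abc < 0)
            throw new ArgumentOutOfRangeException(nameof(abc));
    }

    [UnmanagedCallersOnly]
    public static byte DoSomethingTrampoline(IntPtr handle, IntPtr sel, int abc)
    {
        try
        {
            DoSomething(abc);
            return 1; // success
        }
        catch (Exception e)
        {
            // Never let the exception unwind into native frames.
            Pending = e;
            return 0; // failure; the caller checks Pending
        }
    }
}
```

The cost of this variant is an extra exception-handling region in every trampoline, which matters when thousands of them are generated.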

rolfbjarne commented 1 year ago

@rolfbjarne would you generate the code using a Roslyn source generator?

No, it's generated in a custom linker step when the trimmer runs, so it's too late to run any source generators (iow we use Cecil to generate the IL directly).

We could also potentially replace the cast in var obj = (MyObject) Runtime.GetNSObject (handle); with Unsafe.As depending on whether we assume the obj-c side to already be type safe.

Objective-C is not type-safe. In particular, Objective-C APIs are known to blatantly lie about their types (for instance: the headers say an API returns type X, but it returns type Y, which only quacks like X but isn't X at all).

A cast costs about a dozen bytes, plus a small throughput cost.

We're already doing a dictionary lookup (IntPtr -> object), so a cast is really a minor cost in the whole process.
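A small sketch of the trade-off, using hypothetical stand-in types (`HandleMap`, `NativeObject`, `Impostor`) rather than the real Runtime.GetNSObject: the checked cast fails loudly when the native side hands over an unexpected type, while Unsafe.As performs no check at all.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

class NativeObject { }
class MyObject : NativeObject { public int Value = 7; }
class Impostor : NativeObject { }

static class HandleMap
{
    // Stand-in for the IntPtr -> object dictionary mentioned above.
    static readonly Dictionary<IntPtr, NativeObject> objects = new();

    public static void Register(IntPtr handle, NativeObject obj) => objects[handle] = obj;

    // Checked: throws InvalidCastException if the object is not a T.
    public static T GetChecked<T>(IntPtr handle) where T : NativeObject =>
        (T) objects[handle];

    // Unchecked: no type test is emitted; using the result when the object is
    // not really a T is undefined behavior.
    public static T GetUnchecked<T>(IntPtr handle) where T : NativeObject =>
        Unsafe.As<T>(objects[handle]);
}
```

Both variants pay for the dictionary lookup, so the cast is a small part of the total cost in either case.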

I was looking into that in the past. It's certainly possible but non-trivial. The documentation is sparse. The data are stored in special sections in the object file.

Just a random thought - not sure if it's feasible. I see that objc supports __attribute__((weak)) on some things. Could we place that on a method? Could we make the objc-generated method body a weak symbol and generate a UnmanagedCallersOnly method with the exact same mangled name and signature? We'd leave generating the objc data structures to the objc compiler, but provide our own method bodies and avoid the size/perf impact of the thunk.

That's certainly an intriguing idea, but __attribute__((weak)) doesn't seem to work:

@interface MyObject : NSObject {
}
-(void) myFunc __attribute__((weak));
@end

results in:

test.m:9:31: warning: 'weak' attribute only applies to variables, functions, and classes [-Wignored-attributes]
-(void) myFunc __attribute__((weak));

However, it might be possible to just not add the Objective-C implementation:

@interface MyObject : NSObject { }
-(void) myFunc;
@end

@implementation MyObject { }
// nothing here!
@end

and that compiles just fine, albeit with a warning:

test.m:12:17: warning: method definition for 'myFunc' not found

I wasn't able to figure out how to easily write a function named -[MyObject myFunc] in C/assembly though, so I'm not sure if it would work at runtime.

ivanpovazan commented 1 year ago

No, it's generated in a custom linker step when the trimmer runs, so it's too late to run any source generators (iow we use Cecil to generate the IL directly).

Even though it is a separate topic, we should also think about how ILLinker and NativeAOT would work together - be compatible, or how NativeAOT would implement this (and other custom linker steps), as it seems there is an unavoidable dependency between Xamarin and trimming phase.

marek-safar commented 1 year ago

We will generate the following wrapper code:

public partial class MyObject {
[UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomething__")]
static void __DoSomething__ (IntPtr handle, IntPtr sel, int abc)
{
var obj = (MyObject) Runtime.GetNSObject (handle);
obj.DoSomething (abc);
// process any other arguments to the managed method
}
}

Do we need this for every callable method, or could this be generated per method signature only, using some kind of aggressive inlining inside NativeAOT to get rid of this frame completely at runtime?

rolfbjarne commented 1 year ago

We will generate the following wrapper code:
public partial class MyObject {
  [UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomething__")]
  static void __DoSomething__ (IntPtr handle, IntPtr sel, int abc)
  {
      var obj = (MyObject) Runtime.GetNSObject (handle);
      obj.DoSomething (abc);
      // process any other arguments to the managed method
  }
}
Do we need this for every callable method, or could this be generated per method signature only, using some kind of aggressive inlining inside NativeAOT to get rid of this frame completely at runtime?

There are at least two issues:

- The number of parameters will always be different, because the second argument (sel) will never be present in the target method. This means we'll always need to shuffle parameters around in the stack frame.
- The problem with generating code per method signature is that you still need to know which method to call, and that can't be per method signature. You might do something like this, but I don't see a way to pass the actual method to invoke to the intermediate trampoline:

[UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomething__")]
static void __DoSomething__ (IntPtr handle, IntPtr sel, int abc)
{
    __Signature_void_int32 (handle, sel, abc, &MyObject.DoSomething);
}
[UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomethingElse__")]
static void __DoSomethingElse__ (IntPtr handle, IntPtr sel, int abc)
{
    __Signature_void_int32 (handle, sel, abc, &MyObject.DoSomethingElse);
}
static void __Signature_void_int32 (IntPtr handle, IntPtr sel, int abc, ? method)
{
    var obj = (MyObject) Runtime.GetNSObject (handle);
    obj.(*method) (abc); // what would the type of 'method' be, and how would this be implemented in IL?
}

marek-safar commented 1 year ago

Maybe something like this could work

// delegate* assumes we can hardcode calling convention
// (the return type comes last in a function pointer type, and the method must be unsafe)
static unsafe void __Signature_void_int32 (IntPtr handle, IntPtr sel, int abc, delegate*<object, int, void> method)
{
    var obj = (MyObject) Runtime.GetNSObject (handle);
    method (obj, abc);
}

vitek-karas commented 1 year ago

@AaronRobinsonMSFT FYI

MichalStrehovsky commented 1 year ago

Maybe something like this could work

If we need the casting to guard type safety (which I assume we do based on the previous comment about Unsafe.As), I think it would have to be per signature + per type of this, and that would limit the savings.
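A sketch of what "per signature + per type" could look like, using a generic helper so the cast stays checked. All names are hypothetical. C# only allows `&` on static methods, so static thunks stand in for what generated IL might express more directly. That means this sketch adds a frame rather than removing one, and it only illustrates the shape of the idea.

```csharp
using System;
using System.Collections.Generic;

class MyObject
{
    public int Last;
    public void DoSomething(int abc) => Last = abc;
    public void DoSomethingElse(int abc) => Last = -abc;

    // Static thunks, needed only because C# cannot take the address of an instance method.
    public static void DoSomethingThunk(MyObject self, int abc) => self.DoSomething(abc);
    public static void DoSomethingElseThunk(MyObject self, int abc) => self.DoSomethingElse(abc);
}

static unsafe class SharedTrampolines
{
    // Stand-in for the handle -> object map.
    public static readonly Dictionary<IntPtr, object> Handles = new();

    // One helper per signature, instantiated once per receiver type T.
    public static void Signature_void_int32<T>(IntPtr handle, int abc, delegate*<T, int, void> method)
        where T : class
    {
        var obj = (T) Handles[handle]; // checked cast, specific to T
        method(obj, abc);
    }

    public static void CallDoSomething(IntPtr handle, int abc) =>
        Signature_void_int32<MyObject>(handle, abc, &MyObject.DoSomethingThunk);

    public static void CallDoSomethingElse(IntPtr handle, int abc) =>
        Signature_void_int32<MyObject>(handle, abc, &MyObject.DoSomethingElseThunk);
}
```

Since each receiver type gets its own instantiation of the helper, sharing only happens between methods with the same signature on the same type.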

Even though it is a separate topic, we should also think about how ILLinker and NativeAOT would work together - be compatible, or how NativeAOT would implement this (and other custom linker steps), as it seems there is an unavoidable dependency between Xamarin and trimming phase.

What decides whether we need to generate the UnmanagedCallersOnly method __DoSomething__? Is it based on the presence of DoSomething after trimming? And also what decides we need to generate the obj-c wrapper? Is it based on the presence of the UnmanagedCallersOnly method?

rolfbjarne commented 1 year ago

Even though it is a separate topic, we should also think about how ILLinker and NativeAOT would work together - be compatible, or how NativeAOT would implement this (and other custom linker steps), as it seems there is an unavoidable dependency between Xamarin and trimming phase.

What decides whether we need to generate the UnmanagedCallersOnly method __DoSomething__? Is it based on the presence of DoSomething after trimming? And also what decides we need to generate the obj-c wrapper? Is it based on the presence of the UnmanagedCallersOnly method?

We need the Objective-C wrapper for methods in types subclassing Foundation.NSObject, and either:

- The method has an [Export ("selector")] attribute.
- The method overrides a method with an [Export ("selector")] attribute.
- The method implements a method with an Export attribute from an interface with a Protocol attribute.

[Protocol ("MyProtocol")]
interface IMyProtocol {
    [Export ("myProtocolMethod")]
    void MyProtocolMethod ();
}

class MyBaseObject : NSObject {
    [Export ("myBaseMethod")]
    protected virtual void MyBaseMethod () {} // this method gets an Objective-C wrapper method
}

class MyObject : MyBaseObject, IMyProtocol {
    [Export ("myMethod")]
    void MyMethod () {} // this method gets an Objective-C wrapper method

    public void MyProtocolMethod () {} // this method gets an Objective-C wrapper method

    protected override void MyBaseMethod () {}  // this method gets an Objective-C wrapper method
}

There are a couple of other scenarios as well, and numerous corner cases, but the general rule is that we need an Objective-C wrapper for any method with an Export attribute (directly or indirectly).

Then the UnmanagedCallersOnly method is needed whenever we have an Objective-C wrapper.
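The three rules above can be sketched with plain reflection. This is a simplified illustration and not the registrar's Cecil-based logic: the attribute and base types are hypothetical stand-ins for the Foundation ones, and the corner cases mentioned above are ignored.

```csharp
#nullable enable
using System;
using System.Reflection;

// Hypothetical stand-ins for the Foundation attributes and NSObject.
[AttributeUsage(AttributeTargets.Method)]
class ExportAttribute : Attribute
{
    public ExportAttribute(string selector) => Selector = selector;
    public string Selector { get; }
}

[AttributeUsage(AttributeTargets.Interface)]
class ProtocolAttribute : Attribute { }

class NSObject { }

[Protocol]
interface IMyProtocol
{
    [Export("myProtocolMethod")]
    void MyProtocolMethod();
}

class MyBaseObject : NSObject
{
    [Export("myBaseMethod")]
    protected virtual void MyBaseMethod() { }
}

class MyObject : MyBaseObject, IMyProtocol
{
    public void MyProtocolMethod() { }          // exported via the protocol
    protected override void MyBaseMethod() { }  // exported via the base method
    public void Helper() { }                    // not exported
}

static class WrapperRules
{
    public static bool NeedsWrapper(MethodInfo method)
    {
        var type = method.DeclaringType!;
        if (!typeof(NSObject).IsAssignableFrom(type))
            return false;

        // Rules 1 and 2: Export on the method itself or on a method it overrides
        // (inherit: true walks the override chain).
        if (method.GetCustomAttribute<ExportAttribute>(inherit: true) != null)
            return true;

        // Rule 3: the method implements an exported member of a [Protocol] interface.
        foreach (var iface in type.GetInterfaces())
        {
            if (iface.GetCustomAttribute<ProtocolAttribute>() == null)
                continue;
            var map = type.GetInterfaceMap(iface);
            for (int i = 0; i < map.TargetMethods.Length; i++)
            {
                if (map.TargetMethods[i].MethodHandle == method.MethodHandle &&
                    map.InterfaceMethods[i].GetCustomAttribute<ExportAttribute>() != null)
                    return true;
            }
        }
        return false;
    }
}
```

Note that rule 3 depends on the interface still being present, so a check like this has to run before the interface can be trimmed away.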

MichalStrehovsky commented 1 year ago

We need the Objective-C wrapper for methods in types subclassing Foundation.NSObject

Is this "types subclassing Foundation.NSObject that survived trimming", or are these rooted/never trimmed? How about the methods on these types - are those rooted or do we just need those methods that survived trimming? And if we allow trimming, how does it work if we were to trim all of IMyProtocol, but keep MyObject.MyProtocolMethod - would that method need a wrapper?

rolfbjarne commented 1 year ago

We need the Objective-C wrapper for methods in types subclassing Foundation.NSObject

Is this "types subclassing Foundation.NSObject that survived trimming", or are these rooted/never trimmed?

We root all subclasses from Foundation.NSObject.

How about the methods on these types - are those rooted or do we just need those methods that survived trimming?

Same: we root all methods with an Export attribute (directly or indirectly as above).

And if we allow trimming, how does it work if we were to trim all of IMyProtocol, but keep MyObject.MyProtocolMethod - would that method need a wrapper?

This is one of the reasons moving the registrar out of the custom linker steps is difficult: yes, MyObject.MyProtocolMethod still needs a wrapper even if IMyProtocol is trimmed away. We used to just root the interface to get around this, but what we do now is to store the interface in memory (using a custom linker step), so that the registrar later can figure out that MyObject.MyProtocolMethod comes from such an interface, even if the interface will be trimmed away.

vitek-karas commented 1 year ago

Somewhat unconventional opinion:

Trimming tools (ILLink/NativeAOT/Analyzers) already hardcode deeper understanding of our C ABI interop (PInvoke)
The tools also hardcode a simplified version of understanding COM Interop (which is technically Windows only)
If other platforms have a specific interop technology which is very common, like objective-C interop on iOS, I think it would make sense to hardcode knowledge of it into the trimming tools. It's part of our overall interop story (we try to make the complexities invisible from users) - @AaronRobinsonMSFT for confirmation.

So if we can come up with a simple set of rules all trimming tools should follow around Objective-C interop, personally I would be fine hardcoding these into the trimming tools (along with tests and everything). That said, ideally these behaviors should be runtime independent. That part might be problematic to achieve, so we may need to make some compromises. For example, if we think that Objective-C interop should be inherently supported on iOS targets, then we should hardcode all of it into the NativeAOT compiler (when it targets iOS).

AaronRobinsonMSFT commented 1 year ago

It's part of our overall interop story (we try to make the complexities invisible from users) - @AaronRobinsonMSFT for confirmation.

Yes, this has been the case when we source generate something. The Trimmer specifically adds a decent amount of spooky behavior that can steal hours of your life, at least when working in the runtime. For users this is likely less of a concern, but enriching the Trimmer to do the right thing 90 % of the time is a reasonable path forward in my opinion. An alternative to pushing it into the Trimmer itself is to encode it in an analyzer that will warn/error. I prefer the tooling to error out however the C# UX generally prefers analyzers or something that runs at Design time to warn/error early.

MichalStrehovsky commented 1 year ago

We root all subclasses from Foundation.NSObject.

If I'm looking at the right code, this is more subtle:

If a subclass of NSObject is reachable:
- And the type is not in Xamarin.iOS or Xamarin.Forms.Platform.iOS or a special product assembly, preserve the type completely (makes me wonder why?)
- If the type is in one of the special assemblies, preserve an IntPtr constructor and all methods marked Export.

If other platforms have a specific interop technology which is very common, like objective-C interop on iOS, I think it would make sense to hardcode knowledge of it into the trimming tools

Based on looking at https://github.com/xamarin/xamarin-macios/tree/b0c94b48a656b2b809467a87d8f2464a122ce2a7/tools/linker and around, I would prefer not to do such hardcoding, especially if we were required to duplicate the logic due to it being written in Cecil. The interop rules we hardcode for p/invokes and COM are well defined - they're actually public API contracts - they don't change. Looking through what the macios steps do - it's the opposite - it's special case after special case for internal implementation details of the macios interop library, and for various types in Apple SDK. It's a moving target with two free degrees of movement. This is more like WinRT++ than p/invokes.

We could express some of these relationships with DynamicDependency, and it would work for ILLink, but DynamicDependency is going to keep a lot more stuff than needed on the NativeAOT side (NativeAOT considers these "reflection used", which means it will generate a lot more data structures to support reflection with these - eliminating a lot of optimization opportunities).
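For readers unfamiliar with the attribute, here is a minimal sketch of what such a relationship looks like with `System.Diagnostics.CodeAnalysis.DynamicDependencyAttribute`. The `MyObject` type and the `DoSomethingWrapper` method are hypothetical stand-ins (not taken from xamarin-macios); the point is only that keeping `DoSomething` would also keep the wrapper, with the reflection-visibility cost on the NativeAOT side described above.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Hypothetical type, standing in for an NSObject subclass with an exported method.
public class MyObject
{
    public int LastValue { get; private set; }

    // If the trimmer keeps DoSomething, this attribute asks it to also keep
    // DoSomethingWrapper, even though no managed code calls the wrapper.
    [DynamicDependency(nameof(DoSomethingWrapper))]
    public void DoSomething(int abc)
    {
        LastValue = abc;
    }

    // Hypothetical stand-in for a generated wrapper that native code would call.
    private static void DoSomethingWrapper(MyObject obj, int abc)
    {
        obj.DoSomething(abc);
    }
}
```

The string form of the attribute refers to a member on the containing type; other constructor overloads can point at members of a different type.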

I think running IL Linker with these steps and leaving breadcrumbs (with yet another custom step) as to what to keep when NativeAOT does its own trimming is a fine plan for .NET 8 or beyond. We can figure out the mechanisms to leave the breadcrumbs as the need arises.

simonrozsival commented 1 year ago

@MichalStrehovsky those custom linker steps do several different (independent) things:

- they add custom marking logic (NSObject subclasses, support for [Preserve] attributes, etc.)
- they do some IL-level optimizations (those seem to be Mono specific and I think there shouldn't be a need to migrate them into nativeaot)
- they analyze dependencies on static and dynamic libraries based on [assembly: LinkWith(...)] attributes and the types used in the code
- they generate a static registrar

I agree that the custom marking logic is very specific and I don't think it should be hardcoded into the nativeaot compiler. On the other hand, the part that generates the static registrar with those [UnmanagedCallersOnly] trampolines and the Objective-C code that @rolfbjarne proposed in this thread would IMO fit somewhere in the nativeaot pipeline itself: somewhere after the dependency graph is built and before native code generation starts.

rolfbjarne commented 1 year ago

All of our logic in macios is quite complex and full of corner cases, so I don't think it would be a good idea to have it anywhere else.

Going forward, if our desire is to remove the custom linker steps, I believe this is (a very high-level view of) our best approach:

1. Run a pre-trimmer pass over all the assemblies, where we:
   - Inject/remove DynamicDependency attributes according to what we know is safe for the trimmer to remove / what we want the trimmer to keep.
   - Generate any managed static registrar code (as described in this issue).
   - (other tasks currently done in custom linker steps)
2. Run the linker or NativeAOT compiler.
3. Run a post-trimmer pass over all the assemblies, where we:
   - Generate any native code for the static registrar.
   - Do anything else we need to do with the trimmed assemblies (such as collect all the P/Invokes, etc.).
   - This pass would likely have to take the pre-trimmed assemblies as input as well.
   - (other tasks currently done in custom linker steps)

I believe this requires a couple of things from the NativeAOT compiler:

- It should understand the DynamicDependency attributes the same way that the trimmer does.
- It should output a static library (and not an executable), because we have to link in more native code that we can only create after the NativeAOT compiler has executed.

The main downside I see of this approach is that it'll slow down the build:

- We're loading the non-trimmed assemblies twice more in memory.
- We're loading the trimmed assemblies once more.
- We have to generate a lot of managed code for the static registrar that will be linked away, since we have to add it before the trimmer runs, as opposed to during/after (when we know what's marked and what's not).

One major upside is that it should improve the testability of our code (it's not trivial to run the custom linker steps outside of an actual build).

jkotas commented 1 year ago

> Going forward, if our desire is to remove the custom linker steps, I believe this is (a very high-level view of) our best approach

This looks a lot more complex than what we have discussed in https://github.com/dotnet/runtime/issues/80912#issuecomment-1400771752 . Would this simpler approach discussed earlier be an option?

MichalStrehovsky commented 1 year ago

> It should understand the DynamicDependency attributes the same way that the trimmer does.

We already do that, but I would like to avoid DynamicDependency for size reasons. By default, when NativeAOT compiles a method, we generate three things: the actual code bytes, unwinding information (both are pretty standard stuff that even the C++ compiler generates), and precise GC information (if we decide we want to use conservative stack scanning like Mono does, this can be discarded for a ~5% size saving). That's it. You probably noticed that we don't generate the name of the method, information about the owning type, the parameters to the method, etc. None of that is needed to run the code. If, however, the compiler finds out this extra information is needed, it will generate it. There are many ways the compiler can "find out" - one of those is DynamicDependency, another is TrimmerRootAssembly, descriptors, etc. Forcing something to be generated as reflection-visible is potentially several times more overhead than just generating the method body.

If we're thinking about solutions beyond custom steps, here's what I've been thinking about with source generators:

Let's say the user writes what was in https://github.com/dotnet/runtime/issues/80912#issuecomment-1434662940. We generate the UnmanagedCallersOnly method with a source generator and additionally generate an "UnmanagedDynamicDependency" attribute on the class (or the constructors?) linking the constructors to the generated method. This would ensure that whenever the class is constructed, we generate the UCO wrapper (whether this will be a new attribute, or we say that DynamicDependency pointing to a UCO method with a named EntryPoint doesn't actually signal reflection use, is an implementation detail).

The source generator could also generate any additional bookkeeping that is necessary for the static registrar to work (being handwavy here).

I don't have a good sense of what the custom steps rewrite. I saw things like rewriting IntPtr.Size to 4/8 - those are unnecessary optimizations for NAOT, and IL Linker already does that on its own too - it's not an optimization specific to macios and shouldn't be done by macios steps. It also rewrites other methods - those could be IL Linker substitutions (that NAOT also supports).

Once native compilation/trimming is done, a separate tool can run that will look at what's left (we'd have plugins that would inspect the results depending on what the result is - native or IL), will look at the original IL, and generate whatever native artifacts are needed to glue things together.

rolfbjarne commented 1 year ago

> > Going forward, if our desire is to remove the custom linker steps, I believe this is (a very high-level view of) our best approach
>
> This looks a lot more complex than what we have discussed in #80912 (comment). Would this simpler approach discussed earlier be an option?

I'm sorry I was unclear: my thought is to implement the simple approach for .NET 8, while the more complex one is potentially for .NET 9+.

rolfbjarne commented 1 year ago

> > It should understand the DynamicDependency attributes the same way that the trimmer does.
>
> We already do that, but I would like to avoid DynamicDependency for size reasons. By default, when NativeAOT compiles a method, we generate three things: the actual code bytes, unwinding information (both are pretty standard stuff that even the C++ compiler generates), and precise GC information (if we decide we want to use conservative stack scanning like Mono does, this can be discarded for a ~5% size saving). That's it. You probably noticed that we don't generate the name of the method, information about the owning type, the parameters to the method, etc. None of that is needed to run the code. If, however, the compiler finds out this extra information is needed, it will generate it. There are many ways the compiler can "find out" - one of those is DynamicDependency, another is TrimmerRootAssembly, descriptors, etc. Forcing something to be generated as reflection-visible is potentially several times more overhead than just generating the method body.
>
> If we're thinking about solutions beyond custom steps, here's what I've been thinking about with source generators:
>
> Let's say the user writes what was in #80912 (comment). We generate the UnmanagedCallersOnly method with a source generator and additionally generate an "UnmanagedDynamicDependency" attribute on the class (or the constructors?) linking the constructors to the generated method. This would ensure that whenever the class is constructed, we generate the UCO wrapper (whether this will be a new attribute, or we say that DynamicDependency pointing to a UCO method with a named EntryPoint doesn't actually signal reflection use, is an implementation detail).

Exactly how the UCO method is generated doesn't really matter in this discussion (as long as it happens before the NativeAOT compiler runs): it can be a custom linker step, a source generator, another MSBuild task that executes before the NativeAOT compiler, or something else entirely.

What matters, however, is that we need a way to tell whoever does the tree shaking (be it ILLink or NativeAOT) what can be trimmed away and what can't, and it would be highly desirable for us to have a single solution that works everywhere. If that's a DynamicDependency attribute, that's fine; if it's an XML descriptor, that's fine too.

Note that we don't only need to root API, we might also need to unroot API - I believe NativeAOT treats UCO methods with an EntryPoint as roots, and the behavior we need is that if another managed method (the one the UCO wrapper calls) survives trimming, then the UCO wrapper must exist, but otherwise it shouldn't.

So for the following example:

public partial class MyObject : NSObject {
    [Export ("doSomething:")]
    public void DoSomething (int abc)
    {
    }

    [UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomething__")]
    static void __DoSomething__ (IntPtr handle, IntPtr sel, int abc)
    {
        var obj = (MyObject) Runtime.GetNSObject (handle);
        obj.DoSomething (abc);
        // process any other arguments to the managed method
    }
}

The __DoSomething__ method should survive trimming if and only if DoSomething did.

> The source generator could also generate any additional bookkeeping that is necessary for the static registrar to work (being handwavy here).
>
> I don't have a good sense of what the custom steps rewrite. I saw things like rewriting IntPtr.Size to 4/8 - those are unnecessary optimizations for NAOT, and IL Linker already does that on its own too - it's not an optimization specific to macios and shouldn't be done by macios steps. It also rewrites other methods - those could be IL Linker substitutions (that NAOT also supports).

The IntPtr.Size optimization wasn't very useful by itself, but it had a cascading effect in this scenario:

if (IntPtr.Size == 8) {
    DoA ();
} else {
    DoB ();
}

where we'd also remove the call to either DoA or DoB, and so on. This was a significant size improvement, because the linker at the time didn't know the target pointer size (and thus couldn't perform this optimization), and while Mono's AOT compiler would inline IntPtr.Size, it would not remove the unused DoX method. Since NativeAOT trims, this particular optimization should be unnecessary.

An example of an optimization I don't think NativeAOT would be able to do is this: https://github.com/rolfbjarne/xamarin-macios/blob/docs-custom-linker-steps/docs/custom-linker-steps/README.md#monotouchtunercoretypemapstep - code is optimized depending on whether a type is subclassed or not.

> Once native compilation/trimming is done, a separate tool can run that will look at what's left (we'd have plugins that would inspect the results depending on what the result is - native or IL), will look at the original IL, and generate whatever native artifacts are needed to glue things together.

Yes, we'd need to be able to figure out which API NativeAOT removed and which it didn't (I'm assuming this information would be in some other format that's not IL?).

marek-safar commented 1 year ago

> An example of an optimization I don't think NativeAOT would be able to do is this

I think that's one of the optimizations that is actually very easy for NAOT (I didn't check, though, whether it already does this today).

marek-safar commented 1 year ago

> The __DoSomething__ method should only survive trimming if and only if DoSomething did.

It'd be very nice if the native linker would remove it, but I guess it could not, and that's why we have the extra logic.

ivanpovazan commented 1 year ago

> The __DoSomething__ method should only survive trimming if and only if DoSomething did.

The following approach could solve this issue:

Use source generators to do the following:

For a class MyObject:

public partial class MyObject : NSObject
{
    [Export ("doSomething:")]
    public void DoSomething (int abc)
    {
    }

    [Export ("doSomethingElse:")]
    public void DoSomethingElse (int abc)
    {
    }
}

Generate:

public partial class MyObject : NSObject
{
    [MyUCOWrapper(nameof(__DoSomething_wrapper__))]
    [Export ("doSomething:")]
    public void DoSomething (int abc)
    {
    }

    [SoftRoot]
    [UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomething__")]
    static void __DoSomething_wrapper__ (IntPtr handle, IntPtr sel, int abc)
    {
        var obj = (MyObject) Runtime.GetNSObject (handle);
        obj.DoSomething (abc);
        // process any other arguments to the managed method
    }

    [MyUCOWrapper(nameof(__DoSomethingElse_wrapper__))]
    [Export ("doSomethingElse:")]
    public void DoSomethingElse (int abc)
    {
    }

    [SoftRoot]
    [UnmanagedCallersOnly (EntryPoint = "__MyObject___DoSomethingElse__")]
    static void __DoSomethingElse_wrapper__ (IntPtr handle, IntPtr sel, int abc)
    {
        var obj = (MyObject) Runtime.GetNSObject (handle);
        obj.DoSomethingElse (abc);
        // process any other arguments to the managed method
    }
}

Explanation:

1. Introduce soft roots for UCO methods residing in types subclassed from NSObject. Soft roots are UCO methods that wrap methods with the Export attribute, and they are treated differently from regular UCO methods with an EntryPoint defined (hard roots). The reason for this is to have a mechanism that wraps/exports only those methods with the Export attribute that survived the trimming phase. To handle them in ILCompiler and prevent rooting before the dependency analysis starts, there would be 2 options:
   - Introduce a new internal-use-only SoftRootAttribute for this purpose.
   - Special-case all UCO methods defined in types subclassed from NSObject (this could cause other problems, for example with "regular" UCO methods).
     - If we go with this approach, the SoftRootAttribute above is unnecessary.
2. Introduce an implicit dependency graph edge (possibly a conditional dependency, see https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/ilc-architecture.md#dependency-types) between a method with the Export attribute and its UCO wrapper (e.g., DoSomething -> __DoSomething_wrapper__). This would make sure that soft roots (UCO wrappers) are not trimmed out. I am not sure what would be the right way to express this relationship other than with yet another attribute (above I used MyUCOWrapperAttribute for simplicity) and the name of the wrapper it references. It would probably also make sense to utilize DynamicallyAccessedMembersAttribute (this is more of an implementation detail).

Having said all this, if user code references MyObject::DoSomething and does not reference MyObject::DoSomethingElse, there should be a relationship Program:Main-[static]->MyObject:DoSomething-(conditional)->MyObject:__DoSomething_wrapper__, which would keep all the methods required for our use case and remove DoSomethingElse and its soft root wrapper __DoSomethingElse_wrapper__.
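To make the intended marking behaviour concrete, here is a toy model of it. This is only a sketch: `ToyTrimmer`, `Mark` and the string node names are made up for illustration and have nothing to do with ILCompiler's actual dependency-analysis API. Hard roots are marked unconditionally, static edges follow calls, and a conditional edge keeps a wrapper only once the method it wraps has been marked.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: not ILCompiler's real dependency graph.
public static class ToyTrimmer
{
    // Returns every node that survives "trimming".
    // staticEdges:      caller -> callees (always followed once the caller is kept).
    // conditionalEdges: exported method -> soft-root wrappers that are kept
    //                   only if that exported method itself is kept.
    public static HashSet<string> Mark(
        IEnumerable<string> hardRoots,
        Dictionary<string, string[]> staticEdges,
        Dictionary<string, string[]> conditionalEdges)
    {
        var kept = new HashSet<string>();
        var work = new Stack<string>(hardRoots);
        while (work.Count > 0)
        {
            var node = work.Pop();
            if (!kept.Add(node))
                continue; // already marked

            if (staticEdges.TryGetValue(node, out var callees))
                foreach (var callee in callees)
                    work.Push(callee);

            // The wrapper is never a root by itself; it is only pulled in here.
            if (conditionalEdges.TryGetValue(node, out var wrappers))
                foreach (var wrapper in wrappers)
                    work.Push(wrapper);
        }
        return kept;
    }

    // The scenario from the comment: Main calls DoSomething but not DoSomethingElse.
    public static HashSet<string> Demo()
    {
        var staticEdges = new Dictionary<string, string[]>
        {
            ["Program.Main"] = new[] { "MyObject.DoSomething" },
        };
        var conditionalEdges = new Dictionary<string, string[]>
        {
            ["MyObject.DoSomething"] = new[] { "MyObject.__DoSomething_wrapper__" },
            ["MyObject.DoSomethingElse"] = new[] { "MyObject.__DoSomethingElse_wrapper__" },
        };
        return Mark(new[] { "Program.Main" }, staticEdges, conditionalEdges);
    }
}
```

In this model `Demo()` keeps Program.Main, MyObject.DoSomething and MyObject.__DoSomething_wrapper__, while DoSomethingElse and its wrapper are dropped. If the wrappers were hard roots instead (added to `hardRoots`), both would always survive, and each wrapper's call into its exported method would keep DoSomethingElse alive as well.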

> It'd be very nice if the native linker would remove it but I guess it could not and that's why we have the extra logic.

I think we would still have a problem with the managed code referenced from the UCO itself: e.g., if the code references some managed type, the compiler would have to pack metadata info into internal data structures, which would still take up space but wouldn't be needed at runtime.

rolfbjarne commented 1 year ago

This has been implemented in xamarin-macios now.
