dotnet / android

.NET for Android provides open-source bindings of the Android SDK for use with .NET managed languages such as C#
MIT License
1.92k stars 526 forks source link

Lazily initialize JCW delegates #9306

Open jpobst opened 4 days ago

jpobst commented 4 days ago

Note: all performance numbers mentioned are run on Android Emulator on a DevBox, so they are somewhat inflated.

Imagine we have a Java type that contains a lot of virtual methods:

package foo;

public class DelegateTester 
{
  public long ReturnLongValue1 () { return 0; }
  public long ReturnLongValue2 () { return 0; }
  public long ReturnLongValue3 () { return 0; }
  public long ReturnLongValue4 () { return 0; }
  public long ReturnLongValue5 () { return 0; }
  ... to 100

  public long AddValues () {
    return ReturnLongValue1 () + ReturnLongValue2 () + ReturnLongValue3 () + ReturnLongValue4 () + ReturnLongValue5 () ...;
  }
}

Now imagine we bind it, and override all of those methods:

public class MyDelegateTester : MY.Testpackage.DelegateTester
{
    public override long ReturnLongValue1 () => 1;
    public override long ReturnLongValue2 () => 1;
    public override long ReturnLongValue3 () => 1;
    public override long ReturnLongValue4 () => 1;
    public override long ReturnLongValue5 () => 1;
   ... to 100
}

Then we create an instance of this object in C#: new MyDelegateTester().

For a Release config with marshal methods turned off, this object creation takes ~694ms. (As an aside, this case really shines with marshal methods.)

Scenario First Object Creation Second Object Creation First AddValues Invoke Second AddValues Invoke
Marshal Methods Off 694ms 0ms 15ms 0ms
Marshal Methods On 55ms 0ms 18ms 0ms

The issue is that we call mono.android.Runtime.register from the static constructor of the generated Java peer type, and this calls the C# generated Get*Handler immediately for every overridden method (in this case 100) even if they aren't immediately used. If we could make these calls lazy instead it would likely improve app startup and navigating to a new activity, since these are places where objects are often created but are not used until later.

Ideally, this could be done in the Java(?) layer as it could then improve all existing bindings. However it could also be done at the binding level.

Instead of:

static Delegate? cb_ReturnLongValue1_ReturnLongValue1_J;

static Delegate GetReturnLongValue1Handler ()
{
  if (cb_ReturnLongValue1_ReturnLongValue1_J == null)
    cb_ReturnLongValue1_ReturnLongValue1_J = JNINativeWrapper.CreateDelegate (new _JniMarshal_PP_J (n_ReturnLongValue1));
  return cb_ReturnLongValue1_ReturnLongValue1_J;
}

We could add an extra level of indirection:

static _JniMarshal_PP_J? cb_ReturnLongValue1_ReturnLongValue1_J;

static Delegate GetReturnLongValue1Handler () => _GetReturnLongValue1Handler;

static long _GetReturnLongValue1Handler(IntPtr jnienv, IntPtr native__this)
{
  if (cb_ReturnLongValue1_ReturnLongValue1_J == null)
    cb_ReturnLongValue1_ReturnLongValue1_J = System.Runtime.CompilerServices.Unsafe.As<_JniMarshal_PP_J>(JNINativeWrapper.CreateDelegate(new _JniMarshal_PP_J(n_ReturnLongValue1)));
  return cb_ReturnLongValue1_ReturnLongValue1_J(jnienv, native__this);
}

Switching to a test case of calling a single method instead of 100, we can see that CreateDelegate cost has moved from the constructor to the first invoke of the method:

Scenario First Object Creation Second Object Creation First AddValues Invoke Second AddValues Invoke
Original 219ms 0ms 14ms 0ms
Lazy Delegate 42ms 0ms 169ms 0ms

Note: The cost is abnormally high in this example because it is the first call that uses S.R.E, causing the overhead of initializing those APIs.

grendello commented 3 days ago

I like the idea in general, however there is one thing that has to be measured in a larger example is the impact of adding another static field per each overridden method. Static state is cost that cannot be avoided by end user and I generally try to avoid it. Perhaps we could wrap the entire state in a struct instead?

grendello commented 3 days ago

It would be interesting to see results of those tests on a real device. Marshal methods are, by design, 100% lazy - there is no overhead other than Java VM doing a symbol lookup the first time the method is called. 55ms sounds a bit much, but I guess it's Java VM calling dlsym 100 times on libxamarin-app.so, I wonder if this could be somehow improved (probably not).