andywilsonuk / StringTokenFormatter

Provides string extension methods for replacing tokens within strings (using the format '{name}') with their specified lookup value.
Apache License 2.0
37 stars 1 forks source link

Pull Request Head's Up! #14

Closed TonyValenti closed 5 years ago

TonyValenti commented 5 years ago

Hi @andywilsonuk - I wanted to let you know that in the next few days I'll have a somewhat advanced but highly performant pull request coming through.

Here's a summary of the reason why and what I'm doing: In our usage scenario, we're using StringTokenFormatter in an extremely tight loop with millions of items that we're using as variables. We've noticed a handful of things happening:

  1. When working with simple properties (ie. public string foo {get; set;};), we actually lose memory and CPI performance by caching the results. This happens because (a) the property access is proxied through the lazy and (b) the lazy is a capturing delegate which basically creates a ton of unnecessary memory garbage. This is especially true since in one of our scenarios, we only use 3 out of 12 properties on the object.

  2. Lots of time is being spent fetching the property value through the TypeDescriptor.

The upcoming pull request will do the following:

  1. Allow you to specify whether ObjectPropertiesTokenValueContainer<T> wraps property accesses inside a lazy.

  2. Allows you to specify whether ''''ObjectPropertiesTokenValueContainer'''' retrieves properties using TypeDescriptor or Reflection. When reflection is specified, the list of available properties is created using reflection and that list is then used to create a high performance compiled expression for each member. Property access is not actually done through reflection and I'd love a better name besides reflection. Do you have a suggestion?

Here are some raw performance numbers on property access: Iterations: 100,000,000

var V = propertyDescriptor.GetValue(foo); //Type Descriptor - 20.40 seconds
var V = propertyInfo.GetValue(foo); //Reflection - 11.34 seconds
var F = Expression.Compile(); var V =F(foo); //Compiled expression - 0.87 seconds
var V = foo.Property; //Direct Access - 0.31 seconds

While I don't think we'll ever be able to get performance as fast as a native property invocation, we're able to move it from 20.40 seconds down to 0.87 seconds. That's a 24 times faster!

Performance code below.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Linq.Expressions;
using System.Text;
using System.Threading.Tasks;

namespace ReflectionPer {
    class Program {
        static void Main(string[] args) {
            var Instance = new Foo();
            var Times = 100_000_000;

            var SW1 = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < Times; i++) {
                var V = Instance.Name;
            }
            SW1.Stop();

            var Prop1 = typeof(Foo).GetProperty("Name");
            Prop1.GetValue(Instance);
            var SW2 = System.Diagnostics.Stopwatch.StartNew();

            for (int i = 0; i < Times; i++) {
                var V = Prop1.GetValue(Instance);
            }

            SW2.Stop();

            var Prop2 = TypeDescriptor.GetProperties(typeof(Foo)).OfType<PropertyDescriptor>().Where(x => x.Name == "Name").FirstOrDefault();
            Prop2.GetValue(Instance);

            var SW3 = System.Diagnostics.Stopwatch.StartNew();

            for (int i = 0; i < Times; i++) {
                var V = Prop2.GetValue(Instance);
            }

            SW3.Stop();

            var Prop3 = Prop1.GetGetMethod();
            var obj = Expression.Parameter(typeof(Foo), "instance");
            var property = Expression.Call(obj, Prop3);
            var Accessor = Expression.Lambda<Func<Foo, Object>>(property, obj);
            var Action = Accessor.Compile();

            var VVV = Action(Instance);
            var SW4 = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < Times; i++) {
                var V = Action(Instance);
            }
            SW4.Stop();

            var Prop5 = Prop1;
            var obj5 = Expression.Parameter(typeof(Foo), "instance");
            var property5 = Expression.Property(obj5, Prop5);
            var Accessor4 = Expression.Lambda<Func<Foo, Object>>(property5, obj5);
            var Action4 = Accessor.Compile();

            var WarmUp4 = Action(Instance);
            var SW5 = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < Times; i++) {
                var V = Action4(Instance);
            }
            SW5.Stop();

            var Members = typeof(Foo).GetProperties(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.GetProperty);
        }
    }

    public class Foo {
        public string Name { get; set; } = "asdf";
        private string Other { get; set; }

        public string Something { set { } }

        public static String StaticProperty { get; set; }

    }

}
andywilsonuk commented 5 years ago

Hi TonyValenti,

I was expecting a PR from you but looking at what you've outlined I was wondering why someone would ever want the Type Descriptor if the Reflection is always faster? I'd be reluctant to remove the Lazy from the ObjectPropertiesTokenValueContainer<T> implementation because that would change the classes behaviour (i.e. from read-once to read each time) although it's not impossible that there could be a cached and non-cached implementation. I suspect, also, that the compiled expression version only provides performance benefits after a certain number of iterations and therefore that would definitely be worth spinning out into another container class.

So my view is: existing behaviour should be retained (unless it really doesn't make sense any more) and specific optimised versions should be spun off into new containers.

Really interesting work by the way, good job.

TonyValenti commented 5 years ago

Hi @andywilsonuk - Thanks for the message! I've been thinking over some of the code and I'd like your input:

  1. The only reason I included the type descriptor was because that is what the original code did. Technically speaking, using the type descriptor allows some crazy stuff similar to WPF attached properties, however, I'd wager money that no users are using that feature. --Are you comfortable with me removing the type descriptor code? I am. I just want your approval first.

  2. I'll run the numbers on what the compiled expression 1-time cost is and get back to you.

  3. Right now there is a FormatToken(object) extension method. I'd like to add a FormatToken(T) extension method that calls into the new code. This will preserve binary compatibility and will give huge benefits to new consumers. --Any objections?

  4. Also, I have a few enhancements to NonLockingLazy. The current implementation uses a capturing lambda which essentially exists for the life of the container. This can be problematic because it can cause the app to hold onto a value that should be GCed and will never be used again. I null it out after invoking the value.

TonyValenti commented 5 years ago

@andywilsonuk ?

andywilsonuk commented 5 years ago

Hi Tony,

Yes happy not to use TypeDescriptor it was just a means to an end. The new extension to add the generic version of FormatToken makes sense, could you add a test for it as well? I'm also happy for you to improve NonLockingLazy as you see fit, the behaviour of the class is really well defined.