MadsKirkFoged / EngineeringUnits

Working with units made easy with automatic unit-check and converting between units
MIT License

Optimize for performance #11

Open · MadsKirkFoged opened this issue 2 years ago

MadsKirkFoged commented 2 years ago

I would like to start a discussion about how we can optimize for better performance. Feel free to chip in!

  1. Analyse where the CPU spends most of its time. Can we write better code for the most CPU-heavy functions?

  2. Could we implement new strategies? For example, if both units are set as SI, we could skip unit checks and other parts of the system.
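One possible reading of idea 2, as a minimal sketch: if both operands are already stored in SI, an addition can skip the conversion step entirely. `Quantity`, `ToSIFactor`, `Dimension` and `SlowPathCount` are hypothetical names chosen for illustration, not EngineeringUnits' API, and the unit check is reduced to comparing a dimension string.

```csharp
using System;

// Hypothetical sketch, NOT the EngineeringUnits API: a quantity stores its value
// in some unit plus the factor that converts that value to SI.
public sealed class Quantity
{
    public double Value { get; }
    public double ToSIFactor { get; }
    public string Dimension { get; }   // e.g. "Length"; stands in for the real unit check

    public Quantity(double value, double toSIFactor, string dimension)
    {
        Value = value;
        ToSIFactor = toSIFactor;
        Dimension = dimension;
    }

    public bool IsSI => ToSIFactor == 1.0;
    public double SI => Value * ToSIFactor;

    // Counts how often the general (converting) path is taken.
    public static int SlowPathCount;

    public static Quantity operator +(Quantity a, Quantity b)
    {
        // The cheap dimension check is kept in both paths.
        if (a.Dimension != b.Dimension)
            throw new InvalidOperationException("Cannot add quantities of different dimensions");

        // Fast path: both operands are already in SI, so no conversion is needed.
        if (a.IsSI && b.IsSI)
            return new Quantity(a.Value + b.Value, 1.0, a.Dimension);

        // General path: convert both operands to SI first.
        SlowPathCount++;
        return new Quantity(a.SI + b.SI, 1.0, a.Dimension);
    }
}
```

For example, adding 2 m and 3 m takes the fast path, while adding 2 m and 0.5 km (`ToSIFactor` 1000) goes through the conversion. Whether the real library could skip its fuller unit bookkeeping in the SI case is the open question.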

ikijano commented 2 years ago

I haven't investigated much yet, but I think most of the CPU time is actually spent on memory allocations, which put pressure on the garbage collector. This is especially noticeable in calculations: simply summing two temperatures allocates over 5.5 kB of memory! Even constructing a basic Temperature object from SI allocates over half a kilobyte, which seems like quite a lot.

(Edit: fixed a wrong benchmark implementation.)


BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-4600U CPU 2.10GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host] : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT

Job=InProcess  Toolchain=InProcessEmitToolchain  

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| BaseUnit_new | 5.475 ns | 0.1129 ns | 0.0881 ns | 0.0229 | 48 B |
| Temperature_FromSI | 2,310.795 ns | 43.8149 ns | 40.9845 ns | 0.2708 | 584 B |
| Temperature_SI | 1,871.114 ns | 32.6237 ns | 25.4705 ns | 0.0763 | 160 B |
| Temperature_Plus_Temperature | 4,886.971 ns | 18.9505 ns | 15.8245 ns | 2.7390 | 5,768 B |

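using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using EngineeringUnits;

// Runs in-process and reports allocations per operation (Gen 0 / Allocated columns).
[InProcess]
[MemoryDiagnoser]
public class Benchy
{
    // Shared operands, created once outside the measured methods.
    readonly Temperature _T1 = Temperature.FromSI(1);
    readonly Temperature _T2 = Temperature.FromSI(2);

    [Benchmark]
    public BaseUnit BaseUnit_new()
    {
        return new BaseUnit();
    }

    [Benchmark]
    public Temperature Temperature_FromSI()
    {
        return Temperature.FromSI(293.15);
    }

    [Benchmark]
    public double Temperature_SI()
    {
        // Conversion back to the SI value.
        return _T1.SI;
    }

    [Benchmark]
    public UnknownUnit Temperature_Plus_Temperature()
    {
        // Adding two quantities yields an UnknownUnit.
        return _T1 + _T2;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<Benchy>();
}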
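The Gen 0 and Allocated columns above come from BenchmarkDotNet's `MemoryDiagnoser`. For a quick check outside a benchmark project, `GC.GetAllocatedBytesForCurrentThread()` (available in the .NET Core 3.1 runtime used above) gives a similar per-thread byte count. The sketch below uses it to compare a reference-type `Tuple<string, int>` with a value tuple; `AllocationProbe` and its methods are hypothetical helpers, and the numbers it reports are indicative only.

```csharp
using System;

// Hypothetical helper, not part of EngineeringUnits or BenchmarkDotNet.
public static class AllocationProbe
{
    // Returns the bytes allocated on the current thread while running `action`.
    // A warm-up call runs first so one-time initialization is not counted.
    public static long MeasureAllocatedBytes(Action action)
    {
        action();
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Fills preallocated arrays with n tuples and reports the bytes allocated by
    // each kind. Storing into arrays keeps the JIT from optimizing the work away.
    public static (long RefTupleBytes, long ValueTupleBytes) CompareTuples(int n)
    {
        var refTuples = new Tuple<string, int>[n];
        var valueTuples = new (string Key, int Value)[n];

        // Reference type: every element is a separate heap object.
        long refBytes = MeasureAllocatedBytes(() =>
        {
            for (int i = 0; i < n; i++)
                refTuples[i] = new Tuple<string, int>("Length", i);
        });

        // Value type: elements are stored inline in the array, no per-element object.
        long valueBytes = MeasureAllocatedBytes(() =>
        {
            for (int i = 0; i < n; i++)
                valueTuples[i] = ("Length", i);
        });

        return (refBytes, valueBytes);
    }
}
```

Calling `AllocationProbe.CompareTuples(1000)` should show thousands of bytes for the reference tuples and little or nothing for the value tuples.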
ikijano commented 2 years ago

I already have some progress. It's not much, only a few percent improvement, but memory pressure has also decreased quite a bit, and conversion back to SI units no longer allocates any memory (at least for the Temperature unit).

In UnitSystem.cs, I replaced the reference-type Tuple with a value tuple in the UnitsCount() method, and in the equality operator UnitsCount() is now evaluated only once per unit system. The lengths of the two UnitsCount lists are compared separately, and the method returns early if they don't match. I'm not sure whether that has any effect at all.

-        public List<Tuple<string,int>> UnitsCount()
+        public List<(string Key, int Value)> UnitsCount()
         {
             //This returns <typeOfUnit,Unit Count of the specifig type>

             //var test = ListOfUnits
             //        .Where(x => x.TypeOfUnit != "CombinedUnit")
@@ -58,55 +58,79 @@ namespace EngineeringUnits

             return ListOfUnits
                     .Where(x => x.TypeOfUnit != "CombinedUnit")
                     .GroupBy(x => x.TypeOfUnit)
-                    .Select(x => new Tuple<string, int>(x.Key, x.Sum(x => x.Count)))
-                    .Where(x=> x.Item2 != 0)
+                    .Select(x => (x.Key, x.Sum(x => x.Count)))
+                    .Where(x => x.Item2 != 0)
                     .ToList();
         }

         public static bool operator ==(UnitSystem a, UnitSystem b)
         {

-            return a.UnitsCount().All(b.UnitsCount().Contains) && 
-                   a.UnitsCount().Count == b.UnitsCount().Count;         
+            var aUnitsCount = a.UnitsCount();
+            var bUnitsCount = b.UnitsCount();
+
+            if (aUnitsCount.Count != bUnitsCount.Count)
+                return false;
+
+            return aUnitsCount.All(bUnitsCount.Contains);
         }

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-4600U CPU 2.10GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host]     : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
  DefaultJob : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| BaseUnit_new | 4.561 ns | 0.1848 ns | 0.2590 ns | 0.0229 | 48 B |
| Temperature_FromSI | 2,100.589 ns | 41.9153 ns | 43.0440 ns | 0.2022 | 424 B |
| Temperature_SI | 1,586.193 ns | 31.1969 ns | 34.6753 ns | - | - |
| Temperature_Plus_Temperature | 3,040.028 ns | 59.3769 ns | 81.2758 ns | 1.4687 | 3,080 B |
| Temperature_From_Temperature_Plus_Temperature | 3,289.322 ns | 62.8528 ns | 58.7926 ns | 1.5793 | 3,304 B |

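The equality logic from the diff can be tried in isolation with a small sketch. `UnitsCountComparison.UnitsCountEqual` is a hypothetical helper that works on plain lists of value tuples standing in for two `UnitsCount()` results; it assumes, as the `GroupBy` in `UnitsCount()` guarantees, that each key appears at most once per list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical standalone version of the early-exit comparison, not library code.
public static class UnitsCountComparison
{
    public static bool UnitsCountEqual(
        List<(string Key, int Value)> a,
        List<(string Key, int Value)> b)
    {
        // Cheap check first: lists of different length can never match.
        if (a.Count != b.Count)
            return false;

        // Value tuples compare element-wise, so Contains needs no Tuple objects.
        // With unique keys and equal counts, "every a is in b" implies equality.
        // This is O(n * m), fine for the few unit types a quantity usually has.
        return a.All(b.Contains);
    }
}
```

Two lists holding the same `(Key, Value)` pairs in a different order compare as equal, while lists of different length are rejected before any element is inspected.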
MadsKirkFoged commented 2 years ago

I have optimized some code and included your changes. Can you run the benchmarks again? Maybe use a unit other than temperature, because temperature is a special case and takes more time to calculate.

My list of future optimizations so far:

Edit: Here are the same tests, run after the upgrades I just made (on a faster PC):

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| BaseUnit_new | 3.679 ns | 0.1400 ns | 0.2983 ns | 0.0111 | 48 B |
| Temperature_FromSI | 751.549 ns | 11.7679 ns | 9.8267 ns | 0.1163 | 504 B |
| Temperature_SI | 494.250 ns | 8.2973 ns | 7.3553 ns | 0.0181 | 80 B |
| Temperature_Plus_Temperature | 224.435 ns | 3.7084 ns | 3.4689 ns | 0.0167 | 72 B |

I had never used this benchmarking system before, but it looks awesome! I will try to write some more benchmarks.

ikijano commented 2 years ago

I can give it a try, but I need to use the same environment as before.

Do you have specific tests in mind that we should try? Micro-benchmarking is a slow process, so it would be nice to have a plan for which operations to focus on. You know the anatomy of the library and how its internals work best :)

I was worried about how fast the Fraction type is, or whether it is a memory hog, but it doesn't seem to be a bottleneck. It's implemented as a value type, so there are no unnecessary memory allocations, and its calculations seem to be quite fast.
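Why a value type avoids allocations can be seen in a minimal fraction struct. `Frac` is a hypothetical type written for illustration, not the library's `Fraction`; it assumes `long` numerators and denominators and does not guard against overflow.

```csharp
using System;

// Hypothetical value-type fraction, NOT the EngineeringUnits Fraction type.
// As a readonly struct it is copied by value, so arithmetic results are not
// separate heap objects.
public readonly struct Frac : IEquatable<Frac>
{
    public long Num { get; }
    public long Den { get; }

    public Frac(long num, long den)
    {
        if (den == 0) throw new DivideByZeroException();
        // Keep the sign in the numerator and reduce by the greatest common divisor.
        if (den < 0) { num = -num; den = -den; }
        long g = Gcd(Math.Abs(num), den);
        Num = num / g;
        Den = den / g;
    }

    // Euclid's algorithm; returns den when num is 0, so 0/x normalizes to 0/1.
    private static long Gcd(long a, long b)
    {
        while (b != 0) { (a, b) = (b, a % b); }
        return a;
    }

    public static Frac operator +(Frac a, Frac b) =>
        new Frac(a.Num * b.Den + b.Num * a.Den, a.Den * b.Den);

    public static Frac operator *(Frac a, Frac b) =>
        new Frac(a.Num * b.Num, a.Den * b.Den);

    public bool Equals(Frac other) => Num == other.Num && Den == other.Den;
    public override bool Equals(object obj) => obj is Frac f && Equals(f);
    public override int GetHashCode() => HashCode.Combine(Num, Den);
    public override string ToString() => $"{Num}/{Den}";
}
```

For example, `new Frac(1, 2) + new Frac(1, 3)` gives `5/6`. Implementing `IEquatable<Frac>` also lets collections compare values without boxing.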

ikijano commented 2 years ago

I just fetched your latest version, and both speed and memory allocations look much better than yesterday. Memory allocations in particular, which don't depend on the CPU, have been reduced dramatically. Nice!

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| BaseUnit_new | 4.587 ns | 0.0869 ns | 0.0726 ns | 0.0229 | 48 B |
| Temperature_FromSI | 1,709.116 ns | 14.0924 ns | 13.1821 ns | 0.2384 | 504 B |
| Temperature_SI | 1,326.764 ns | 12.1094 ns | 10.7346 ns | 0.0381 | 80 B |
| Temperature_Plus_Temperature | 495.060 ns | 7.2759 ns | 6.8059 ns | 0.0343 | 72 B |
| Temperature_From_Temperature_Plus_Temperature | 516.884 ns | 6.2110 ns | 5.8098 ns | 0.0792 | 168 B |

I forgot to show the last test I used, which constructs a Temperature from the UnknownUnit after the addition and causes some extra memory allocations:

    [Benchmark]
    public Temperature Temperature_From_Temperature_Plus_Temperature()
    {
        // Same addition, but the UnknownUnit result is converted back into a Temperature.
        return _T1 + _T2;
    }
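A hypothetical sketch of why this extra step can cost more: when arithmetic produces a general result type, turning it back into a specific quantity needs a conversion operator that re-checks the unit and creates a second object. `Result` and `Kelvin` are illustrative names, not EngineeringUnits types, and the real library's conversion may do more or less work than this.

```csharp
using System;

// Hypothetical types, NOT the EngineeringUnits API. Result stands in for an
// "unknown unit" produced by arithmetic; Kelvin stands in for Temperature.
public sealed class Result
{
    public double SI { get; }
    public string Dimension { get; }
    public Result(double si, string dimension) { SI = si; Dimension = dimension; }
}

public sealed class Kelvin
{
    public double SI { get; }
    private Kelvin(double si) => SI = si;

    public static Kelvin FromSI(double si) => new Kelvin(si);

    // Arithmetic returns the general Result type (first allocation).
    public static Result operator +(Kelvin a, Kelvin b) =>
        new Result(a.SI + b.SI, "Temperature");

    // In this sketch, converting back performs a unit check and allocates a
    // second object on top of the Result.
    public static implicit operator Kelvin(Result r)
    {
        if (r.Dimension != "Temperature")
            throw new InvalidCastException($"Cannot convert {r.Dimension} to Temperature");
        return new Kelvin(r.SI);
    }
}
```

With these types, `Kelvin t = Kelvin.FromSI(1) + Kelvin.FromSI(2);` compiles through the implicit conversion, and assigning a `Result` whose dimension is not a temperature throws.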

This is the first time I have used any kind of benchmarking, and this was a good way to learn BenchmarkDotNet. It's a very nice tool, and I should start using it more.

MadsKirkFoged commented 2 years ago

I just did a speed test:


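    // Requires: using EngineeringUnits;
    Power P2 = Power.FromSI(10);
    Length L2 = Length.FromSI(2);
    Temperature T2 = Temperature.FromSI(4);

    UnknownUnit test = 0;
    var watch = System.Diagnostics.Stopwatch.StartNew();

    // One million evaluations of P / (L * T) with full unit handling
    for (int i = 0; i < 1000000; i++)
    {
        test = P2 / (L2 * T2);
    }
    watch.Stop();
    var elapsedMs = watch.ElapsedMilliseconds;
    System.Console.WriteLine($"1 million calculations took {elapsedMs} ms");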

1 million calculations in 0.240 s (on my computer).
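Converted to time per loop iteration (each iteration is one multiplication and one division of quantities):

```latex
\frac{0.240\ \text{s}}{10^{6}\ \text{iterations}} = 2.4 \times 10^{-7}\ \text{s} = 240\ \text{ns per iteration}
```

Because this Stopwatch loop includes JIT warm-up and performs two operations per iteration, it is not directly comparable to the single-operation BenchmarkDotNet results earlier in the thread.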

Of course, if you do your calculations with plain doubles, you can get much faster than this.
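For comparison, a minimal sketch of the same formula on plain doubles, using the values from the speed test (P = 10, L = 2, T = 4, all in SI). `DoubleBaseline` is a hypothetical helper. The JIT may hoist or eliminate loop-invariant arithmetic like this, so the elapsed time can be misleadingly small; a BenchmarkDotNet benchmark would be more reliable.

```csharp
using System;
using System.Diagnostics;

// Hypothetical baseline: P / (L * T) on raw doubles, with no unit bookkeeping.
public static class DoubleBaseline
{
    public static (double Result, long ElapsedMs) Run(int iterations)
    {
        double p = 10.0, l = 2.0, t = 4.0;   // same SI values as the speed test
        double result = 0.0;

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            result = p / (l * t);
        }
        watch.Stop();

        return (result, watch.ElapsedMilliseconds);
    }
}
```

`DoubleBaseline.Run(1_000_000)` returns a result of 1.25 (10 / 8), the same number the unit-aware version computes, but without any unit checks or allocations.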