Open MadsKirkFoged opened 2 years ago
I haven't investigate much yet but I think most of CPU time is actually used by memory allocations and this causes pressure for garbage collector. Specially when doing calculations where simple summing of two temperatures takes over 5,5 kB memory! Even constructing basic Temperature object from SI takes over half kilobytes of of memory which seems quite lot.
(edit, was wrong benchmark implementation)
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-4600U CPU 2.10GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-rc.2.21505.57
[Host] : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
Job=InProcess Toolchain=InProcessEmitToolchain
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
BaseUnit_new | 5.475 ns | 0.1129 ns | 0.0881 ns | 0.0229 | 48 B |
Temperature_FromSI | 2,310.795 ns | 43.8149 ns | 40.9845 ns | 0.2708 | 584 B |
Temperature_SI | 1,871.114 ns | 32.6237 ns | 25.4705 ns | 0.0763 | 160 B |
Temperature_Plus_Temperature | 4,886.971 ns | 18.9505 ns | 15.8245 ns | 2.7390 | 5,768 B |
[InProcess()]
[MemoryDiagnoser]
public class Benchy
{
[Benchmark]
public BaseUnit BaseUnit_new()
{
return new BaseUnit();
}
readonly Temperature _T1 = Temperature.FromSI(1);
readonly Temperature _T2 = Temperature.FromSI(2);
[Benchmark]
public Temperature Temperature_FromSI()
{
return Temperature.FromSI(293.15);
}
[Benchmark]
public double Temperature_SI()
{
return _T1.SI;
}
[Benchmark]
public UnknownUnit Temperature_Plus_Temperature()
{
return _T1 + _T2;
}
}
I already have some progress, its not much but a few percent improvement. Memory pressure has also degreased quite a bit and conversion back to SI units doesn't allocate any more memory (in case of Temperature unit).
I replaced in UnitSystem.cs Tuple reference object with Tuple value object in UnitsCount()
method and in equality operator UnitsCount()
method call is now evaluated only once per unit-system. UnitsCount list length is checked seperetly and method exits early if count doesn't match. I'm not sure does that affect at all.
- public List<Tuple<string,int>> UnitsCount()
+ public List<(string Key, int Value)> UnitsCount()
{
//This returns <typeOfUnit,Unit Count of the specifig type>
//var test = ListOfUnits
// .Where(x => x.TypeOfUnit != "CombinedUnit")
@@ -58,55 +58,79 @@ namespace EngineeringUnits
return ListOfUnits
.Where(x => x.TypeOfUnit != "CombinedUnit")
.GroupBy(x => x.TypeOfUnit)
- .Select(x => new Tuple<string, int>(x.Key, x.Sum(x => x.Count)))
- .Where(x=> x.Item2 != 0)
+ .Select(x => (x.Key, x.Sum(x => x.Count)))
+ .Where(x => x.Item2 != 0)
.ToList();
}
public static bool operator ==(UnitSystem a, UnitSystem b)
{
- return a.UnitsCount().All(b.UnitsCount().Contains) &&
- a.UnitsCount().Count == b.UnitsCount().Count;
+ var aUnitsCount = a.UnitsCount();
+ var bUnitsCount = b.UnitsCount();
+
+ if (aUnitsCount.Count != bUnitsCount.Count)
+ return false;
+
+ return aUnitsCount.All(bUnitsCount.Contains);
}
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-4600U CPU 2.10GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-rc.2.21505.57
[Host] : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
DefaultJob : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
BaseUnit_new | 4.561 ns | 0.1848 ns | 0.2590 ns | 0.0229 | 48 B |
Temperature_FromSI | 2,100.589 ns | 41.9153 ns | 43.0440 ns | 0.2022 | 424 B |
Temperature_SI | 1,586.193 ns | 31.1969 ns | 34.6753 ns | - | - |
Temperature_Plus_Temperature | 3,040.028 ns | 59.3769 ns | 81.2758 ns | 1.4687 | 3,080 B |
Temperature_From_Temperature_Plus_Temperature | 3,289.322 ns | 62.8528 ns | 58.7926 ns | 1.5793 | 3,304 B |
I have optimized some code, included with your code. Can you run it again? Maybe use any other unit then temperature, because temperature is a special case and takes more to calculate.
The list I have so fare of future optimizations:
Edit: Running the same tests after the upgrades I just did (On a faster PC..)
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
BaseUnit_new | 3.679 ns | 0.1400 ns | 0.2983 ns | 0.0111 | 48 B |
Temperature_FromSI | 751.549 ns | 11.7679 ns | 9.8267 ns | 0.1163 | 504 B |
Temperature_SI | 494.250 ns | 8.2973 ns | 7.3553 ns | 0.0181 | 80 B |
Temperature_Plus_Temperature | 224.435 ns | 3.7084 ns | 3.4689 ns | 0.0167 | 72 B |
I have never used this benchmark system but it looks awesome! I will try to make some more benchmarks
I can give a try, but I need to use same environment than earlier.
Have you some kind tests in mind what we should try to test. Micro benchmarking is slow process so it would be nice to if we have some plan what operations we should focus. You know best anatomy of library and internals how they works :)
I had worries about how fast or memory hog Fraction type is but it seems that it's not a bottle neck. It's implemented as value type, no unnecessary memory allocations and calculations seems to be quite fast.
Just fetched your latest version and speed and memory allocations looks much better than yesterday. Specially memory allocations, which doesn't depend CPU, has been reduced dramatically, nice!
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
BaseUnit_new | 4.587 ns | 0.0869 ns | 0.0726 ns | 0.0229 | 48 B |
Temperature_FromSI | 1,709.116 ns | 14.0924 ns | 13.1821 ns | 0.2384 | 504 B |
Temperature_SI | 1,326.764 ns | 12.1094 ns | 10.7346 ns | 0.0381 | 80 B |
Temperature_Plus_Temperature | 495.060 ns | 7.2759 ns | 6.8059 ns | 0.0343 | 72 B |
Temperature_From_Temperature_Plus_Temperature | 516.884 ns | 6.2110 ns | 5.8098 ns | 0.0792 | 168 B |
I forget show last test what I used which constructs Temperature
from UnknownUnit
after addition and causes some extra memory allocations.
[Benchmark]
public Temperature Temperature_From_Temperature_Plus_Temperature()
{
return _T1 + _T2;
}
This is first time when I used any type of benchmarking and this was good way to learn how to use BenchmarkDotNet. It's very nice tool and I should start to use it more.
I just did speed test:
Power P2 = Power.FromSI(10);
Length L2 = Length.FromSI(2);
Temperature T2 = Temperature.FromSI(4);
UnknownUnit test = 0;
var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
test = P2 / (L2 * T2);
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
1mio calculations in 0.240 sec (on my computer).
Of cause if you do your calculation in double you can get much faster then this
I would like to start a discussion about how we can optimize for better performance. Feel free to chip in!
Analyse where the CPU is spending much of the time? --> Can we write better code that fixes the heavy CPU using functions?
Could we implement new strategies? --> fx. If two units are Set as SI we could ship Unitchecks and other parts of the system