Import benchmark tool - Githubissues

Oceania2018 commented 6 years ago

Performance is critical for numerical computation. Import the benchmark tool BenchmarkDotNet to measure and visualize the performance.

fdncred commented 6 years ago

I've been thinking the same thing. I've used BenchmarkDotNet before but I'm not fluent in how it's used. I'll explore it when I get time.

Oceania2018 commented 6 years ago

@fdncred Appriciate, @dotChris90 Any idea on it?

dotChris90 commented 6 years ago

@Oceania2018 never used it before but you are right. we need some benchmark framework since we need to make critical decisions like your idea with the 1D array. If I did not see with my own eyes that matrix multiplication with 2 1D arrays is faster then 2 2D arrays I would not believe it.

fdncred commented 6 years ago

@Oceania2018, @dotChris90, BenchmarkDotNet is definitely one of the best to use. We'll just have to figure out the details on how to use it properly.

fdncred commented 6 years ago

Just pushed first benchmark changes with 343c1bba2422c272e2fc87d2c35492aec5131cd4. I suggest starting here for writing further benchmarks.


BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17763.55 (1809/October2018Update/Redstone5)
Intel Xeon CPU E5-1650 v2 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.403
  [Host]     : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
  Job-BSWZJZ : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT

IterationCount=5  RunStrategy=ColdStart

Method	N	Mean	Error	StdDev	Median	Min	Max
Access1	10	117.4 us	998.6 us	259.3 us	0.7000 us	0.7000 us	581.3 us
Access2	10	120.5 us	1,024.8 us	266.1 us	1.0000 us	0.8000 us	596.6 us
Access3	10	244.3 us	2,053.4 us	533.3 us	5.0000 us	4.1000 us	1,198.2 us
Access4	10	264.6 us	2,231.2 us	579.4 us	4.3000 us	4.1000 us	1,301.1 us
Access5	10	842.4 us	7,153.5 us	1,857.7 us	6.6000 us	6.1000 us	4,165.6 us
CheckPlusOperation1	10	1,542.2 us	13,039.4 us	3,386.3 us	26.9000 us	22.8000 us	7,599.8 us
CheckPlusOperation2	10	1,184.6 us	10,102.3 us	2,623.5 us	10.9000 us	10.5000 us	5,877.7 us
CheckMatrixMultiplication1	10	1,766.1 us	15,008.1 us	3,897.6 us	24.1000 us	20.4000 us	8,738.3 us
CheckMatrixMultiplication2	10	1,661.3 us	14,079.7 us	3,656.5 us	19.1000 us	18.7000 us	8,202.1 us
Access1	100	166.2 us	956.2 us	248.3 us	53.2000 us	53.1000 us	610.4 us
Access2	100	191.6 us	1,043.2 us	270.9 us	66.2000 us	64.3000 us	675.9 us
Access3	100	289.7 us	2,013.1 us	522.8 us	56.9000 us	54.1000 us	1,224.9 us
Access4	100	313.7 us	2,166.1 us	562.5 us	62.4000 us	57.1000 us	1,319.9 us
Access5	100	1,063.6 us	7,178.6 us	1,864.3 us	226.4000 us	208.3000 us	4,398.1 us
CheckPlusOperation1	100	2,257.8 us	13,340.9 us	3,464.6 us	698.9000 us	680.7000 us	8,455.3 us
CheckPlusOperation2	100	1,798.1 us	10,819.1 us	2,809.7 us	551.7000 us	530.2000 us	6,824.2 us
CheckMatrixMultiplication1	100	4,851.9 us	14,712.4 us	3,820.8 us	3,145.9000 us	3,112.1000 us	11,686.6 us
CheckMatrixMultiplication2	100	4,815.8 us	14,219.3 us	3,692.7 us	3,169.2000 us	3,125.2000 us	11,421.3 us

dotChris90 commented 6 years ago

@fdncred thanks for sharing. The link is a good "evening book" to read. ;) It is great to see that Numsharp now has testing and benchmark - feels slowly more and more professional.

fdncred commented 6 years ago

Here's a test that took more than an hour to run. It ran the tests with 100, 1000, and 3000 items in the arrays.


BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17763.55 (1809/October2018Update/Redstone5)
Intel Xeon CPU E5-1650 v2 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.403
  [Host]     : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
  Job-JWZVYD : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT

IterationCount=5  RunStrategy=ColdStart

Method	N	Mean	Error	StdDev	Median	Min	Max	Rank
Access1	100	165.4 us	943.3 us	245.0 us	55.10 us	51.70 us	603.5 us	1
Access2	100	192.3 us	1,014.5 us	263.4 us	66.30 us	64.30 us	662.6 us	1
Access3	100	296.4 us	1,989.8 us	516.7 us	71.30 us	54.20 us	1,220.6 us	1
Access4	100	317.4 us	2,180.8 us	566.3 us	62.40 us	57.30 us	1,330.4 us	1
Access5	100	1,092.0 us	7,449.5 us	1,934.6 us	224.50 us	218.80 us	4,552.7 us	2
CheckPlusOperation1	100	2,450.4 us	13,141.8 us	3,412.9 us	1,077.80 us	676.80 us	8,540.6 us	4
CheckPlusOperation2	100	1,736.3 us	10,195.6 us	2,647.8 us	543.50 us	540.30 us	6,472.7 us	3
CheckMatrixMultiplication1	100	4,787.4 us	14,548.1 us	3,778.1 us	3,102.60 us	3,070.40 us	11,545.7 us	5
CheckMatrixMultiplication2	100	4,816.5 us	14,327.0 us	3,720.7 us	3,108.20 us	3,097.70 us	11,470.4 us	5
Access1	1000	5,816.7 us	3,411.3 us	885.9 us	5,440.70 us	5,316.90 us	7,394.9 us	5
Access2	1000	7,012.3 us	3,665.4 us	951.9 us	6,638.50 us	6,493.40 us	8,711.7 us	6
Access3	1000	5,681.9 us	5,532.3 us	1,436.7 us	5,064.90 us	4,980.90 us	8,250.9 us	5
Access4	1000	6,797.0 us	4,760.9 us	1,236.4 us	6,615.30 us	5,683.30 us	8,640.0 us	6
Access5	1000	29,696.4 us	15,619.8 us	4,056.4 us	28,195.90 us	26,114.70 us	35,100.1 us	7
CheckPlusOperation1	1000	77,846.7 us	16,120.4 us	4,186.4 us	77,195.10 us	72,908.80 us	84,288.6 us	10
CheckPlusOperation2	1000	81,742.8 us	27,940.8 us	7,256.1 us	79,597.10 us	75,809.10 us	93,948.7 us	10
CheckMatrixMultiplication1	1000	21,272,555.0 us	160,448.1 us	41,667.9 us	21,269,672.10 us	21,216,600.90 us	21,326,757.3 us	15
CheckMatrixMultiplication2	1000	12,194,916.3 us	331,263.2 us	86,028.0 us	12,165,985.10 us	12,109,752.30 us	12,307,438.7 us	14
Access1	3000	51,640.1 us	23,709.5 us	6,157.3 us	48,995.50 us	48,641.20 us	62,649.6 us	8
Access2	3000	63,069.5 us	20,244.8 us	5,257.5 us	61,673.40 us	59,025.70 us	71,955.2 us	9
Access3	3000	63,213.8 us	16,639.7 us	4,321.3 us	60,809.80 us	60,278.50 us	70,328.9 us	9
Access4	3000	63,362.4 us	10,016.1 us	2,601.2 us	62,831.70 us	61,477.60 us	67,848.0 us	9
Access5	3000	227,584.3 us	5,332.8 us	1,384.9 us	226,867.30 us	226,233.20 us	229,426.1 us	11
CheckPlusOperation1	3000	635,782.2 us	40,001.7 us	10,388.3 us	632,562.90 us	628,092.30 us	653,942.5 us	12
CheckPlusOperation2	3000	696,463.0 us	19,601.5 us	5,090.4 us	695,063.70 us	692,161.50 us	705,305.3 us	13
CheckMatrixMultiplication1	3000	792,708,661.9 us	14,478,874.2 us	3,760,117.6 us	795,124,884.80 us	786,975,421.80 us	795,414,398.9 us	17
CheckMatrixMultiplication2	3000	425,625,221.2 us	5,820,745.9 us	1,511,629.2 us	425,790,985.10 us	423,823,215.60 us	427,567,933.3 us	16

dotChris90 commented 6 years ago

@fdncred just watch the benchmark test. Extremely nice. Looks easy to use, clean and still very mature. Nice suggestion.

fdncred commented 6 years ago

@dotChris90, Thanks. We've just scratched the surface. There are sooooo many things it can do. We're not using it exactly right but it's a start. I forced the coldstart instead of warm ups because it just took so long.

Oceania2018 commented 6 years ago

@dotChris90 I just found this code extremly slow down NDArray.ARange:

public NDArray<T> ARange(int stop, int start = 0, int step = 1)
        {
            dynamic npDyn = this;

            dynamic npResult = NumSharp.Extensions.NDArrayExtensions.ARange(npDyn, stop,start,step);

            return npResult;
        }

20x slower. We need to fix it. I'll push benchmark code np.arrange.cs.

Acutall, ndarray doesn't need arange extension, only NumPy.arange make sense. I'll delete NDArray<T>.ARange.

dotChris90 commented 6 years ago

@Oceania2018 I was expecting this - but not that huge.

please rise an issue "Remove Extension Linking". This methods tries to make extension methods accessible from F#, Php, Powershell and even Python.NET - see #4 .

But I will remove them. And open maybe 1 Side Project in src folder "NumSharp.NonCSharp" for making it better accessible or "NumSharp.Powershell", "NumSharp.FSharp", and so on. The original Core C# must have the best performance. The other "language accessibility" is just a nice to have.

But! If you do not mind - I would like to have a look closely to the extension methods again and check which methods are better as extension methods and which can used as normal. For a better accessibility from other languages I would like to try to use as few extension methods as possible.

Oceania2018 commented 6 years ago

@dotChris90 I've pushed the code. The thing is ndarray doesn't need arange extension, there is no ndarray.arange in python numpy neither. I think you have to remove it and all the other extensions which is not consist with python numpy. I added the NumPy class to adapt the high level API which is mocking numpy. We should use like

var np = new NumPy<int>();
var n = np.arange(3).reshape(3, 1);

User doesn't need to interact with NDArray<T>

@fdncred BenchmarkDoNet really helps us to find performance issue.

dotChris90 commented 6 years ago

@Oceania2018 ah ok got it. I will remove the extension linking class tomorrow - or you do. ;)

Oceania2018 commented 6 years ago

@dotChris90 It looks more sense now, ahh. After I remove the dynamic type. like:

dynamic npDyn = this;
return NumSharp.Extensions.NDArrayExtensions.ArgMax(npDyn);

I think we should not use dynamic type anymore. Performance killer.

dotChris90 commented 6 years ago

@Oceania2018 I was expecting this - same performance killer like DynamicObject but honestly I had no idea what it is such critical. 10 or 20 times ... wow --> thats why compilers are better than interpreter lol

I will rethink about a strategy for operators. Operators in C# can not! be! extensions - they must be static nonextensions method. example The input and output for a NDArray + operation is clear. NDArray always. but inside the code I use dynamic. So need to rewrite them again.

Oceania2018 commented 6 years ago

@dotChris90 before you rewrite it, give them a benchmark first. remove dynamic that will be fine.

Oceania2018 commented 6 years ago

@dotChris90 @fdncred Continue to speed up 3x more.

Can we improve further?

NEW GOAL: `150 us`.

fdncred commented 6 years ago

Nice! Great job.

fdncred commented 6 years ago

Is there any way to use Span because it's super fast?

fdncred commented 6 years ago

Here's another idea. I have not tested this yet but any Enumerable has .AsParallel() method. I wonder if some of the methods would benefit from that?

So it'd be something like this:

public static NDArray<double> ARange(this NDArray<double> np,int stop, int start = 0, int step = 1)
        {
            int index = 0;

            np.Data = Enumerable.Range(start,stop - start)
                                .Where(x => index++ % step == 0)
                                .Select(x => (double)x)
                                .ToArray();
            np.Shape = new Shape(new int[] { stop });
            return np;
        }

Changed to this:

public static NDArray<double> ARange(this NDArray<double> np,int stop, int start = 0, int step = 1)
        {
            int index = 0;

            np.Data = Enumerable.Range(start,stop - start)
                                .Where(x => index++ % step == 0)
                                .Select(x => (double)x)
                                .AsParallel()
                                .ToArray();
            np.Shape = new Shape(new int[] { stop });
            return np;
        }

Oceania2018 commented 6 years ago

Try to avoid using Linq. I used the raw for.

        public static NDArray<int> ARange(this NDArray<int> np,int stop, int start = 0, int step = 1)
        {
            var list = new int[(int)Math.Ceiling((stop - start + 0.0) / step)];
            int index = 0;

            for (int i = start; i < stop; i += step)
                list[index++] = i;

            np.Data = list;
            np.Shape = new Shape(list.Length);

            return np;
        }

@fdncred Nothing to do with Span<T>.

fdncred commented 6 years ago

Yes, I realized that is what you did. I'm suggesting that Linq performance could be improved with .AsParallel(). And of course, just about any for loop could be improved with Parallel.For.

dotChris90 commented 6 years ago

So I suggest we should consider the following benchmark tests. I write them down no matter if already exists.

element wise operations */+- : for loop or linq identify which is best
matrix multiplication : jagged arrays, multidimensional, 1D storage, NDArray, and with multidimensional as base line / reference
casting : how much cost the pattern matching switch stuff a. K. a. Check generic type if is double, int, complex etc, with element wise operations and matrix multiplication. Base line is non casting.
using dynamic keyword, is it always a performance killer or just under conditions

This are at least the 4 performance checks I would like to see. Please feel free to add more and later can rise issues. 😉

dotChris90 commented 6 years ago

@Oceania2018 @fdncred by the way you 2 ;) - I did just some experiments with casts. I was really surprised that the generic T[] Data property (so a generic array) can not be directly cast to double[] or sth else. An alternative strategy is the boxing - so array.select(x => (double)x) and so on. But I am not a fan of boxing. The most interesting and maybe promising solution is :

double[] A = np.Data as double[];

With this we can avoid using dynamic keyword and cast the whole array without boxing (hope at least - I do not know what "as" keyword is doing in background).

We could also maybe try to bring this into some extension methods since the only alternative is to write every extension method for "NDArray< double >,NDArray< Complex >, ...." so for every possible type (except we have no numerical operation and so maybe can use NDArray< T >).

fdncred commented 6 years ago

Boxing kills performance. I've seen articles that talk about performance difference between "As casting" compared to "direct casting" or "is casting".

https://www.danielcrabtree.com/blog/164/c-sharp-7-micro-benchmarking-the-three-ways-to-cast-safely https://stackoverflow.com/questions/496096/casting-vs-using-the-as-keyword-in-the-clr https://buildstarted.com/2014/06/27/casting-in-csharp/

dotChris90 commented 6 years ago

@fdncred thanks for sharing. The articles explains a lot. I was often wondering about the different casting options.

Most import is that pattern matching is really the strategy to go with for us. :)

fdncred commented 6 years ago

it's probably safe to close this.

SciSharp / NumSharp

Import benchmark tool #63

NEW GOAL: `150 us`.

SciSharp / NumSharp

Import benchmark tool #63

NEW GOAL: 150 us.

NEW GOAL: `150 us`.