Quansight-Labs / numpy.net

A port of NumPy to .Net
BSD 3-Clause "New" or "Revised" License
128 stars 14 forks

Numpy dotnet Performance Issue #56

Open Sundarrajan06295 opened 11 months ago

Sundarrajan06295 commented 11 months ago

I tried to multiply 2 large arrays with the numpydotnet library.
Code: `np.multiply(Data1, Data2)` -> 300 ms
I clearly see there is a performance degradation when compared to the numpy library -> 150 ms.

KevinBaselinesw commented 11 months ago

Are the data types the same? For example, are they both doubles? Same-type data will perform much better.

Could you provide a more complete example? How big are the arrays? What is the data type? What is the shape of the arrays that you are testing?

Sundarrajan06295 commented 11 months ago

For the same data type the difference is minimal. When I do a quantile it takes more than 150 to 200 ms.
Code: `np.quantile(array1, array2)`
When I compute the variance of an image of size (3815 * 2800) after a transpose, it takes 250 ms.
Code: `np.var(data.Transpose(), axis: 0)`

KevinBaselinesw commented 11 months ago

One big difference between C# and C (python numpy is a C library) is that C allows easier/faster casting between data types. In C#, if you try to cast Int32 to UInt32 I think it will throw an exception, but C will allow it. This forces NumpyDotNet to follow a code path that ultimately uses the "dynamic" data type to allow different data types to be used together. It works great, but it is quite a bit slower. That is why carefully using the same data types allows the library to run much faster: it can follow templated code paths that don't use the dynamic data type. This also applies to constant values. Something like `doubleArray + 1` should be written `doubleArray + 1.0` to get maximum performance.
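The same effect is visible in numpy itself, which is a rough way to illustrate the point: mixing dtypes forces an implicit cast of one operand before the operation runs, while same-dtype operands go straight through a single typed loop. This is a small sketch, not NumpyDotNet code:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b_same = np.arange(1_000_000, dtype=np.float64)
b_mixed = np.arange(1_000_000, dtype=np.int32)

# float64 * float64: one typed inner loop, no conversion
fast = a * b_same

# int32 operand must be upcast to float64 before the multiply
slow = a * b_mixed
```

In NumpyDotNet the penalty for the mixed case is larger still, because the mixed-type path falls back to the `dynamic` dispatch described above.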

I am working on another issue that can cause slower performance. I use try/catch around most of the calculations. This allows me to catch calculation errors that throw exceptions (i.e. divide by zero, overflows, etc...) and set a default value instead, which is what python/numpy does. However, C# try/catch does add significant CPU overhead. If that is in the middle of 1 million calculations, it can add up to a lot of time. I am working on adding a feature to disable/reroute code to not use try/catch. If you are confident your application will not cause an exception (99% probably won't), it can speed up the calculations by about 20%.
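For reference, this is the python/numpy behavior being emulated: float divide-by-zero does not raise, it records a floating-point warning and stores a default value (`inf`, or `nan` for 0/0) in the result. A minimal sketch:

```python
import numpy as np

# Suppress the floating-point warnings so the division runs silently;
# numpy still fills in inf and nan rather than throwing.
with np.errstate(divide='ignore', invalid='ignore'):
    r = np.array([1.0, 0.0]) / np.array([0.0, 0.0])
```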

Would you be willing to demo this feature in your code?

KevinBaselinesw commented 11 months ago

I will look into the np.quantile and np.var performance issues too.

Sundarrajan06295 commented 11 months ago

We have the following interesting observations:

  1. Variance along axis 1 always throws an error. Data size: (3815 * 2800). Code: `var variance = np.var(data, axis: 1)`. Exception:

```
Unhandled exception. System.Exception: shape mismatch: objects cannot be broadcast to a single shape
   at NumpyLib.numpyinternal.GenerateBroadcastedDims(NpyArray leftArray, NpyArray rightArray)
   at NumpyLib.numpyinternal.NpyArray_NumericOpArraySelection(NpyArray srcArray, NpyArray operandArray, UFuncOperation operationType)
   at NumpyLib.numpyinternal.NpyArray_PerformNumericOperation(UFuncOperation operationType, NpyArray x1Array, NpyArray x2Array, NpyArray outArray, NpyArray whereFilter)
   at NumpyLib.numpyAPI.NpyArray_PerformNumericOperation(UFuncOperation operationType, NpyArray x1Array, NpyArray x2Array, NpyArray outArray, NpyArray whereFilter)
   at NumpyDotNet.NpyCoreApi.PerformNumericOp(ndarray a, UFuncOperation ops, ndarray b, Boolean UseSrcAsDest)
   at NumpyDotNet.ndarray.op_Subtraction(ndarray a, ndarray b)
   at NumpyDotNet.np.var(Object a, Nullable`1 axis, dtype dtype, Int32 ddof, Boolean keep_dims)
```

  2. To work around this, we transpose the array and compute the variance on the result. However, with an array of size (3815 * 2800), `np.var(data.Transpose(), axis: 0)` takes around 250-300 milliseconds, compared to ~60 ms for `np.var(data, axis: 0)`.
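The workaround is mathematically sound, which numpy itself confirms: variance along axis 1 and variance of the transpose along axis 0 are the same reduction. A tiny sketch with a hypothetical small array standing in for the (3815 * 2800) image:

```python
import numpy as np

data = np.arange(12, dtype=np.float64).reshape(3, 4)

# Variance along axis 1 reduces each row to one value...
v1 = np.var(data, axis=1)

# ...and is numerically identical to the transpose workaround:
# transpose, then reduce along axis 0.
v0 = np.var(data.T, axis=0)
```

So the two calls should return identical results; only the memory-access pattern (and, in NumpyDotNet, the speed) differs.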

Sundarrajan06295 commented 11 months ago

When I tried to debug the numpydotnet source, I found the error is in this line: `(object)(ndarray1.astype(np.Float32) - ndarray2)`

KevinBaselinesw commented 11 months ago

I have a bug fix coming for this today.

Sundarrajan06295 commented 11 months ago

Okay, how can I use it?

KevinBaselinesw commented 11 months ago

If you send me an email at kmckenna at baselinesw.com, I can send you a new DLL that should work for you.

KevinBaselinesw commented 11 months ago

I have researched why np.quantile takes much longer than the python version does. The root cause is that np.quantile ultimately calls the np.partition code to do the heavy work. This code is much slower in C# than in the python C code. The reason is that the python code uses a lot of complex C macros to do the work, which effectively inlines all of the processing. C# does not support macros, so I had to turn the macros into functions. These functions are called very frequently, which greatly adds to the processing overhead. I can't think of any way to speed this up while still keeping the code readable and debuggable.
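To make the quantile/partition relationship concrete, here is a small numpy sketch (toy data chosen so the median needs no interpolation): a quantile is just an order statistic, and numpy selects it with a partition (introselect) rather than a full sort, which is the hot path described above.

```python
import numpy as np

a = np.array([7.0, 1.0, 5.0, 3.0, 9.0])

# Median of the unsorted array via the quantile API.
q = np.quantile(a, 0.5)

# After np.partition, the k-th element sits in its sorted
# position, with smaller values to its left and larger to
# its right; this selection is the core of np.quantile.
k = a.size // 2
p = np.partition(a, k)
```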

KevinBaselinesw commented 10 months ago

I have researched why np.var takes much longer than the python version does. The root cause is that np.var actually consists of (at least) 7 math operations on the array. If each one takes a little longer than the python/C version does, it adds up to a significant difference. I have made a few small tweaks to the code to make it a little faster.
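A rough numpy sketch of that decomposition (the exact operation count and order inside NumpyDotNet may differ; this just shows how a variance reduction breaks down into several whole-array passes, each carrying its own per-element overhead):

```python
import numpy as np

data = np.arange(12, dtype=np.float64).reshape(3, 4)
n = data.shape[0]

# Each line is a separate pass over the array.
mean = data.sum(axis=0) / n           # sum, then divide
dev = data - mean                     # broadcast subtract
sq = dev * dev                        # elementwise multiply
var = sq.sum(axis=0) / n              # sum, then divide
```

Chaining these passes matches `np.var(data, axis=0)` (with the default `ddof=0`), so any per-operation slowdown is multiplied by the number of passes.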