SciSharp / NumSharp

High Performance Computation for N-D Tensors in .NET, similar API to NumPy.
https://github.com/SciSharp
Apache License 2.0

np.save throws Exception on large datasets #409

Closed pavlexander closed 3 years ago

pavlexander commented 4 years ago

I am trying to save a 3D array of doubles of size double[719118, 61, 7].

Since it's a 64-bit application and I have plenty of RAM, I can work with arrays of this size without problems. But NumSharp throws the following error when saving to a file:

System.OverflowException: 'Arithmetic operation resulted in an overflow.'

    System.OverflowException
      HResult=0x80131516
      Message=Arithmetic operation resulted in an overflow.
      Source=NumSharp.Core
      StackTrace:
       at NumSharp.np.writeValueMatrix(BinaryWriter reader, Array matrix, Int32 bytes, Int32[] shape)
       at NumSharp.np.Save(Array array, Stream stream)
       at NumSharp.np.Save(Array array, String path)
       at NumSharp.np.save(String filepath, Array arr)
       at Processor.DataPersister.SaveArrayAsNumpyArray(String filePath, String fileName, Array numpyArray) in C:\Users\Lucky\source\repos\ChromedriverPlot\Processor\DataPersister.cs:line 55
       at Processor.DataPersister.PreproccessExtractedSymbols(DateTime dt, String symbolName, Int32 stepMinutePeriod) in C:\Users\Lucky\source\repos\ChromedriverPlot\Processor\DataPersister.cs:line 523
       at Processor.DataPersister.<>cDisplayClass9_0.b0(String symbolName) in C:\Users\Lucky\source\repos\ChromedriverPlot\Processor\DataPersister.cs:line 454
       at System.Threading.Tasks.Parallel.<>cDisplayClass33_0`2.b0(Int32 i)
       at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)

Is there a way to bypass/fix this issue?

It seems NumSharp has no problem converting a standard 3D array into a numpy array, but saving it is impossible: both save method overloads throw the exception.

Environment: NumSharp 0.20.5, .NET Core 3.1, x64 app, Windows 10 x64, VS 2019

pavlexander commented 4 years ago

The issue is that a byte array cannot hold more than 2_147_483_591 elements (int byteMaxSize = 2_147_483_591; tested on .NET Framework 4.7.2), so the allocation on the first line below fails:

            var buffer = new byte[bytes * total]; // here
            Buffer.BlockCopy(matrix, 0, buffer, 0, buffer.Length);
            reader.Write(buffer, 0, buffer.Length);
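To make the numbers concrete, here is a minimal sketch of the size arithmetic for the array in this issue. The stack trace shows `bytes` as an Int32. The sketch assumes `total` is an Int32 as well (the NumSharp source is not reproduced here), in which case `bytes * total` would wrap to a negative value in an unchecked context, and requesting an array of negative length raises an OverflowException in .NET. `BufferSizeCheck`, `RequiredBytes` and `WrappedInt32Size` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper for illustration; not part of NumSharp.
public static class BufferSizeCheck
{
    // Byte count computed in 64-bit arithmetic, so it cannot silently wrap.
    public static long RequiredBytes(int bytesPerElement, int[] shape)
    {
        long total = bytesPerElement;
        foreach (int dim in shape)
        {
            total = checked(total * dim);
        }
        return total;
    }

    // The same product computed in 32-bit arithmetic (assumed to be what
    // happens when both operands are Int32); the result wraps on overflow.
    public static int WrappedInt32Size(int bytesPerElement, int elementCount)
    {
        return unchecked(bytesPerElement * elementCount);
    }

    public static void Main()
    {
        int[] shape = { 719118, 61, 7 };
        int elements = 719118 * 61 * 7;                     // 307063386, still fits in Int32

        long needed = RequiredBytes(sizeof(double), shape);
        Console.WriteLine(needed);                          // 2456507088
        Console.WriteLine(needed > int.MaxValue);           // True
        Console.WriteLine(WrappedInt32Size(sizeof(double), elements)); // -1838460208
    }
}
```

Either way, the required 2,456,507,088 bytes exceed both Int32.MaxValue (2,147,483,647) and the byte array limit quoted above, so a single buffer cannot hold the data.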

To fix it, the data needs to be written in chunks whenever bytes * total exceeds the maximum number of elements allowed in a byte array.

There might be other places that require a fix; I haven't checked.
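A chunked write along those lines could look roughly like the sketch below. It is not NumSharp's actual fix: `ChunkedArrayWriter`, `WriteInChunks` and the `chunkBytes` parameter are hypothetical, and the array is assumed to hold a blittable primitive type such as double. Because `Buffer.BlockCopy` takes Int32 offsets and so cannot address data past the 2 GB mark, the sketch pins the array and copies from its address with `Marshal.Copy` instead.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not part of NumSharp.
public static class ChunkedArrayWriter
{
    // Writes the raw bytes of a primitive-typed array (any rank) to the
    // writer through a small reusable buffer instead of one huge byte[].
    public static void WriteInChunks(BinaryWriter writer, Array matrix,
                                     int bytesPerElement,
                                     int chunkBytes = 64 * 1024 * 1024)
    {
        // 64-bit arithmetic: LongLength * int cannot wrap like Int32 * Int32.
        long totalBytes = matrix.LongLength * bytesPerElement;
        if (totalBytes == 0)
        {
            return;
        }

        var buffer = new byte[(int)Math.Min(chunkBytes, totalBytes)];

        // Pin the array so the GC cannot move it while we read from its address.
        GCHandle handle = GCHandle.Alloc(matrix, GCHandleType.Pinned);
        try
        {
            long baseAddress = handle.AddrOfPinnedObject().ToInt64();
            for (long offset = 0; offset < totalBytes; offset += buffer.Length)
            {
                // The last chunk may be shorter than the buffer.
                int count = (int)Math.Min(buffer.Length, totalBytes - offset);
                Marshal.Copy(new IntPtr(baseAddress + offset), buffer, 0, count);
                writer.Write(buffer, 0, count);
            }
        }
        finally
        {
            handle.Free();
        }
    }
}
```

A call such as `ChunkedArrayWriter.WriteInChunks(writer, testArray, sizeof(double))` would then stand in for the three buffer lines quoted above. The 64 MB default chunk size is an arbitrary choice; any value below the byte array limit works.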

some info here: https://stackoverflow.com/questions/1391672/what-is-the-maximum-size-that-an-array-can-hold

code to reproduce:

        static void Main(string[] args)
        {
            // for .Net framework I set App.config: gcAllowVeryLargeObjects = True
            // for .Net Core I set env: COMPlus_gcAllowVeryLargeObjects = 1

            Console.WriteLine("Hello World!");

            var testArray = new double[719118, 61, 7];
            // LongLength avoids Int32 arithmetic when counting the elements.
            long doubleArrayTotalSize = testArray.LongLength; // 307_063_386
            long byteArrayTotalSize = sizeof(double) * doubleArrayTotalSize; // 2_456_507_088

            // Largest byte[] length that could be allocated (0x7FFFFFC7).
            long byteArrayMaxSize = 2_147_483_591;
            var noError = new byte[byteArrayMaxSize];

            // Fails: 2_456_507_088 exceeds byteArrayMaxSize.
            var error = new byte[byteArrayTotalSize];
            //Buffer.BlockCopy(matrix, 0, buffer, 0, buffer.Length);
            //reader.Write(buffer, 0, buffer.Length);

            Console.WriteLine("done!");
        }