Quansight-Labs / numpy.net

A port of NumPy to .Net
BSD 3-Clause "New" or "Revised" License

np.load, np.save missing #48

Open Fokatu opened 1 year ago

Fokatu commented 1 year ago

Is it possible to add these two functions to load and save .npy files? Thanks!

KevinBaselinesw commented 1 year ago

I think I looked at np.load and np.save before. It is a very big task to implement these and I didn't have time to do it. Nobody has asked for them in 5 years so they are not the most commonly used functions.

In Python/NumPy, saving and loading arrays with those functions is much faster because they are implemented in C rather than Python. My view is that an application using numpydotnet should probably write its own load/save functions. Since the code would be .NET either way, there is no performance to be gained by replicating these functions in the library.

Can you implement your own load and save functions?
Does the format need to be 100% compatible with python/numpy? Are you exchanging files with python/numpy?

FYI, we do support np.tofile and np.fromfile.

If you really need to be 100% compatible with the python file format, let me know. I can also offer tips on accessing any arrays you want to save to file.

Also, we recently added support for serializing ndarrays. See ndarray.ToSerializable(). It will return a class object that can be serialized to a string and written to a disk file.
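
On the question of full compatibility with the Python file format: for simple dtypes the .npy layout (version 1.0) is a magic string, a two-byte version, a little-endian header length, an ASCII dictionary describing dtype and shape, and then the raw data. The following is only a minimal sketch using the .NET base class library, not numpy.net's API. The class name NpySketch and the method SaveInt32 are hypothetical, and it assumes a C-ordered int32 array passed as a flat int[] plus its shape.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of numpy.net.
public static class NpySketch
{
    // Writes a C-ordered int32 array as a .npy (format version 1.0) file.
    public static void SaveInt32(string path, int[] data, params int[] shape)
    {
        if (shape.Aggregate(1, (a, b) => a * b) != data.Length)
            throw new ArgumentException("shape does not match data length");

        // Python tuple syntax: a one-element tuple needs a trailing comma.
        string shapeText = shape.Length == 1
            ? $"({shape[0]},)"
            : "(" + string.Join(", ", shape) + ")";
        string dict = $"{{'descr': '<i4', 'fortran_order': False, 'shape': {shapeText}, }}";

        // Pad with spaces so that preamble + header is a multiple of 64 bytes;
        // the header must end with a newline.
        const int preamble = 10; // 6 magic bytes + 2 version bytes + 2 length bytes
        int unpadded = preamble + dict.Length + 1;
        int padding = (64 - unpadded % 64) % 64;
        string header = dict + new string(' ', padding) + "\n";

        // BinaryWriter always writes little-endian, which matches '<i4'.
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(new byte[] { 0x93, (byte)'N', (byte)'U', (byte)'M', (byte)'P', (byte)'Y' });
            writer.Write((byte)1);               // major version
            writer.Write((byte)0);               // minor version
            writer.Write((ushort)header.Length); // header length
            writer.Write(Encoding.ASCII.GetBytes(header));
            foreach (int value in data)
                writer.Write(value);
        }
    }
}
```

A call such as NpySketch.SaveInt32("a.npy", new[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }, 3, 3) should produce a file that Python can read with np.load("a.npy"). Other dtypes, Fortran order, structured arrays and pickled objects are exactly the parts that make a full port a large task, and this sketch does not attempt them.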

rainyl commented 1 year ago

I don't think it is necessary to add .npy support to numpy.net. Even as a Python NumPy user, I hardly ever use np.save or np.load. I prefer text files such as .csv, and if higher performance is needed, I think Feather is a better choice.

rainyl commented 1 year ago

@KevinBaselinesw I noticed that you are working on np.load for binary files. One piece of advice: text files and binary files can use different APIs. For example, NumPy uses np.loadtxt for text files and np.load for binary files, which is clearer and more direct.

PS: the current np.fromfile in this project is also incomplete. For example, options like np.loadtxt()'s skiprows and comments, which are useful when the input file contains extra information, are not included.
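
To make the suggestion concrete, here is a rough sketch of what a separate text loader with skiprows and comments options could look like. It uses only the .NET base class library and returns a jagged double array. The names TextLoaderSketch and LoadTxt are hypothetical and are not part of numpy.net; the parameter names simply mirror np.loadtxt.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of numpy.net.
public static class TextLoaderSketch
{
    // Reads a delimited text file into rows of doubles.
    // skiprows: number of leading lines to ignore.
    // comments: everything from this marker to the end of a line is dropped.
    // delimiter: field separator; null means "split on any whitespace".
    public static double[][] LoadTxt(string path, string comments = "#",
                                     char? delimiter = null, int skiprows = 0)
    {
        var rows = new List<double[]>();
        foreach (string raw in File.ReadLines(path).Skip(skiprows))
        {
            string line = raw;
            if (!string.IsNullOrEmpty(comments))
            {
                int cut = line.IndexOf(comments, StringComparison.Ordinal);
                if (cut >= 0) line = line.Substring(0, cut);
            }
            line = line.Trim();
            if (line.Length == 0) continue; // blank or comment-only line

            string[] fields = delimiter.HasValue
                ? line.Split(delimiter.Value)
                : line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Invariant culture so "1.5" parses the same on every machine.
            rows.Add(fields
                .Select(f => double.Parse(f.Trim(), CultureInfo.InvariantCulture))
                .ToArray());
        }
        return rows.ToArray();
    }
}
```

For a CSV file with one header line, a call like TextLoaderSketch.LoadTxt("data.csv", delimiter: ',', skiprows: 1) would skip the header and parse the remaining rows. The result could then be passed to np.array to build an ndarray.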

Finally, I REALLY appreciate your work. I explored many multi-dimensional array libraries for C#, such as NumSharp, Numpy.NET and Tensor.NET from SciSharp. I am grateful for their efforts, but they are not as convenient or complete as this project. TorchSharp is great, but I don't want to install such a huge library just for multi-dimensional array calculations.

Thanks for your work again!!! 😄

KevinBaselinesw commented 1 year ago

@rainyl I have tried to port np.load() but it is very complicated and too much work.

I recommend that people use the newly added .ToSerializable() method. This is probably a more modern way to save, share and restore data structures in a .NET application. The serializable data structures can then be converted to JSON or XML for writing to a file, database or network API. But of course the result is not binary compatible with the Python files written by np.save() and read by np.load().

        [TestMethod]
        public void test_ndarray_serialization_newtonsoft()
        {
            var a = np.array(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(3, 3);
            AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

            // These two calls are equivalent.
            var A_ArraySerializedFormat = a.ToSerializable();
            A_ArraySerializedFormat = np.ToSerializable(a);

            var A_Serialized = SerializationHelper.SerializeNewtonsoftJSON(A_ArraySerializedFormat);
            var A_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(A_Serialized);

            Console.WriteLine("AA");
            print(A_Serialized);

            // Restores the serialized ndarray.
            var b = new ndarray(A_Deserialized);

            var B_ArraySerializedFormat = b.ToSerializable();
            var B_Serialized = SerializationHelper.SerializeNewtonsoftJSON(B_ArraySerializedFormat);
            var B_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(B_Serialized);
            Console.WriteLine("\n\nBB");
            print(B_Serialized);

            // The round trip must preserve both the JSON text and the dtype details.
            Assert.AreEqual(0, string.Compare(A_Serialized, B_Serialized));
            Assert.AreEqual(a.Dtype.TypeNum, b.Dtype.TypeNum);
            Assert.AreEqual(a.Dtype.str, b.Dtype.str);
            Assert.AreEqual(a.Dtype.alignment, b.Dtype.alignment);
            Assert.AreEqual(a.Dtype.ElementSize, b.Dtype.ElementSize);
            Assert.AreEqual(a.Dtype.Kind, b.Dtype.Kind);
        }

    public static class SerializationHelper
    {
        public static T DeserializeXml<T>(this string toDeserialize)
        {
            System.Xml.Serialization.XmlSerializer xmlSerializer = new System.Xml.Serialization.XmlSerializer(typeof(T));
            using (System.IO.StringReader textReader = new System.IO.StringReader(toDeserialize))
            {
                return (T)xmlSerializer.Deserialize(textReader);
            }
        }

        public static string SerializeXml<T>(this T toSerialize)
        {
            System.Xml.Serialization.XmlSerializer xmlSerializer = new System.Xml.Serialization.XmlSerializer(typeof(T));
            using (System.IO.StringWriter textWriter = new System.IO.StringWriter())
            {
                xmlSerializer.Serialize(textWriter, toSerialize);
                return textWriter.ToString();
            }
        }

        public static string SerializeNewtonsoftJSON<T>(this T toSerialize)
        {
            return Newtonsoft.Json.JsonConvert.SerializeObject(toSerialize);
        }

        public static T DeSerializeNewtonsoftJSON<T>(this string toDeserialize)
        {
            return Newtonsoft.Json.JsonConvert.DeserializeObject<T>(toDeserialize);
        }

    }

rainyl commented 1 year ago

Yes, you are right: for saving files, .ToSerializable() is a better choice. Considering the complexity, I don't think it's necessary to support np.load, or at least it should have lower priority. Anyway, it's only an immature suggestion from a new user. Thanks for your reply :)

GeorgeS2019 commented 1 year ago

@KevinBaselinesw

Could you evaluate whether this code is a useful starting point for addressing the missing np.load and np.save functions?

https://github.com/SciSharp/NumSharp/blob/master/src/NumSharp.Core/APIs/np.load.cs

Less relevant, but FYI https://github.com/SciSharp/Numpy.NET/blob/main/src/Numpy/np.io.gen.cs