Quansight-Labs / numpy.net

A port of NumPy to .Net
BSD 3-Clause "New" or "Revised" License

[Requirement]Support serialization #46

Closed ChengYen-Tang closed 1 year ago

ChengYen-Tang commented 1 year ago

I would like ndarray, shape, np.random, and dtype to be serializable with System.Text.Json: https://learn.microsoft.com/en-us/dotnet/api/system.text.json?view=net-7.0

Thanks

KevinBaselinesw commented 1 year ago

I like the idea of being able to serialize the ndarray objects, but it is going to take some work. I have attached a quick sample application that serializes the shape class. I am testing it with both System.Text.Json and Newtonsoft.Json.

Note: System.Text.Json is only supported on later versions of .NET. I build numpydotnet for .NET Standard 2.0, which does not support it. I think there are users of the library who can't move to a newer version of .NET, so it can't be built into the library.

What I propose is modifying the various objects so that they can be serialized successfully by at least these two most commonly used serializers. We should probably ensure that XML serialization works properly as well.

Does this seem like a good approach for your needs?

NumpSerializationTest.zip

ChengYen-Tang commented 1 year ago

Let me think about it some more, because another framework I use relies on System.Text.Json for serialization.

Set the language version to latest in the csproj: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/configure-language-version

ChengYen-Tang commented 1 year ago

@KevinBaselinesw This seems to be a solution. https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/migrate-from-newtonsoft?pivots=dotnet-7-0

KevinBaselinesw commented 1 year ago

I pushed up a new release with support for serialization. It seems to work well for Newtonsoft.Json, XML, and your choice of System.Text.Json, as indicated in the sample below.

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text;
using System.Linq;
using Newtonsoft;
using npy_intp = System.Int64;
using NumpyDotNet;

namespace ConsoleApp2
{
    internal class Program
    {
        static void Main(string[] args)
        {
            shape a = new shape(2, 3, 4, 5);

            System.Text.Json.JsonSerializerOptions options = new System.Text.Json.JsonSerializerOptions();
            options.IncludeFields = true;
            options.DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull;

            string jsonString = System.Text.Json.JsonSerializer.Serialize(a);
            Console.WriteLine(jsonString);

            jsonString = System.Text.Json.JsonSerializer.Serialize(a, options);
            Console.WriteLine(jsonString);

            string jsonString2 = Newtonsoft.Json.JsonConvert.SerializeObject(a);
            Console.WriteLine(jsonString2);

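            // jsonString was produced with options (IncludeFields), so deserialize with the same options.
            shape b = System.Text.Json.JsonSerializer.Deserialize<shape>(jsonString, options);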
            shape c = Newtonsoft.Json.JsonConvert.DeserializeObject<shape>(jsonString2);

            ndarray aa = np.array(new int[] {0,1,2,3,4,5,6,7,8}).reshape(3, 3);
            var x = System.Text.Json.JsonSerializer.Serialize(aa.ToSerializable(), options);
            Console.WriteLine("AA");
            Console.WriteLine(x);

            var x1 = System.Text.Json.JsonSerializer.Deserialize<ndarray_serializable>(x, options);

            ndarray bb = np.FromSerializable(x1);
            var y = System.Text.Json.JsonSerializer.Serialize(bb.ToSerializable(), options);
            Console.WriteLine("\n\nBB");
            Console.WriteLine(y);

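            // Ordinal comparison: the two JSON strings should match character for character.
            if (!string.Equals(x, y, StringComparison.Ordinal))
            {
                Console.WriteLine("Bad conversion");

                // Print each differing index; stop at the shorter length to stay in range.
                for (int i = 0; i < Math.Min(x.Length, y.Length); i++)
                {
                    if (x[i] != y[i])
                    {
                        Console.WriteLine(i.ToString());
                    }
                }

            }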

            Console.Read();

        }
    }

}
KevinBaselinesw commented 1 year ago

Here are some unit tests showing serialization via Newtonsoft and XML.

[TestMethod]
    public void test_dtype_serialization_newtonsoft()
    {
        var a = np.arange(9).reshape(3, 3);
        AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

        var A_DtypeSerializedFormat = a.Dtype.ToSerializable();

        var A_Serialized = SerializationHelper.SerializeNewtonsoftJSON(A_DtypeSerializedFormat);
        var A_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<dtype_serializable>(A_Serialized);

        dtype b = new dtype(A_Deserialized);

        Assert.AreEqual(a.Dtype.TypeNum, b.TypeNum);
        Assert.AreEqual(a.Dtype.str, b.str);
        Assert.AreEqual(a.Dtype.alignment, b.alignment);
        Assert.AreEqual(a.Dtype.ElementSize, b.ElementSize);
        Assert.AreEqual(a.Dtype.Kind, b.Kind);

    }

    [TestMethod]
    public void test_dtype_serialization_XML()
    {
        var a = np.arange(9).reshape(3, 3);
        AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

        dtype_serializable A_DtypeSerializedFormat = np.ToSerializable(a.Dtype);

        var A_Serialized = SerializationHelper.SerializeXml(A_DtypeSerializedFormat);
        var A_Deserialized = SerializationHelper.DeserializeXml<dtype_serializable>(A_Serialized);

        dtype b = np.FromSerializable(A_Deserialized);

        Assert.AreEqual(a.Dtype.TypeNum, b.TypeNum);
        Assert.AreEqual(a.Dtype.str, b.str);
        Assert.AreEqual(a.Dtype.alignment, b.alignment);
        Assert.AreEqual(a.Dtype.ElementSize, b.ElementSize);
        Assert.AreEqual(a.Dtype.Kind, b.Kind);

    }

    [TestMethod]
    public void test_ndarray_serialization_newtonsoft()
    {
        var a = np.array(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(3,3);
        AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

        var A_ArraySerializedFormat = a.ToSerializable();
        var A_Serialized = SerializationHelper.SerializeNewtonsoftJSON(A_ArraySerializedFormat);
        var A_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(A_Serialized);

        Console.WriteLine("AA");
        print(A_Serialized);

        var b = new ndarray(A_Deserialized);

        var B_ArraySerializedFormat = b.ToSerializable();
        var B_Serialized = SerializationHelper.SerializeNewtonsoftJSON(B_ArraySerializedFormat);
        var B_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(B_Serialized);
        Console.WriteLine("\n\nBB");
        print(B_Serialized);

        Assert.AreEqual(0, string.Compare(A_Serialized, B_Serialized));
        Assert.AreEqual(a.Dtype.TypeNum, b.Dtype.TypeNum);
        Assert.AreEqual(a.Dtype.str, b.Dtype.str);
        Assert.AreEqual(a.Dtype.alignment, b.Dtype.alignment);
        Assert.AreEqual(a.Dtype.ElementSize, b.Dtype.ElementSize);
        Assert.AreEqual(a.Dtype.Kind, b.Dtype.Kind);

    }

    [TestMethod]
    public void test_ndarray_serialization_XML()
    {
        var a = np.array(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(3, 3);
        AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

        var A_ArraySerializedFormat = a.ToSerializable();
        var A_Serialized = SerializationHelper.SerializeXml(A_ArraySerializedFormat);
        var A_Deserialized = SerializationHelper.DeserializeXml<ndarray_serializable>(A_Serialized);

        Console.WriteLine("AA");
        print(A_Serialized);

        var b = new ndarray(A_Deserialized);

        var B_ArraySerializedFormat = b.ToSerializable();
        var B_Serialized = SerializationHelper.SerializeXml(B_ArraySerializedFormat);
        var B_Deserialized = SerializationHelper.DeserializeXml<ndarray_serializable>(B_Serialized);
        Console.WriteLine("\n\nBB");
        print(B_Serialized);

        //Assert.AreEqual(0, string.Compare(A_Serialized, B_Serialized));
        Assert.AreEqual(a.Dtype.TypeNum, b.Dtype.TypeNum);
        Assert.AreEqual(a.Dtype.str, b.Dtype.str);
        Assert.AreEqual(a.Dtype.alignment, b.Dtype.alignment);
        Assert.AreEqual(a.Dtype.ElementSize, b.Dtype.ElementSize);
        Assert.AreEqual(a.Dtype.Kind, b.Dtype.Kind);

    }
ChengYen-Tang commented 1 year ago

However, if an ndarray is wrapped as a property of an object and I want to serialize that object, how can I do it?

public class Test
{
    public ndarray Array { get; set; }
}

Test test = new();
string jsonString = System.Text.Json.JsonSerializer.Serialize(test, options);
ChengYen-Tang commented 1 year ago

I will try this method later, thanks https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/converters-how-to?pivots=dotnet-7-0
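
A minimal sketch of what that converter approach might look like, assuming the ToSerializable() and np.FromSerializable() calls shown in the sample above. NdarrayJsonConverter is a hypothetical name and is not part of numpydotnet; it writes the ndarray in its ndarray_serializable form and rebuilds it on read. I have not verified this against every dtype.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;
using NumpyDotNet;

// Hypothetical converter (not part of numpydotnet): serializes an ndarray
// through its ndarray_serializable form instead of serializing it directly.
public sealed class NdarrayJsonConverter : JsonConverter<ndarray>
{
    public override ndarray Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // Read the intermediate form, then rebuild the real ndarray from it.
        var dto = JsonSerializer.Deserialize<ndarray_serializable>(ref reader, options);
        return np.FromSerializable(dto);
    }

    public override void Write(Utf8JsonWriter writer, ndarray value, JsonSerializerOptions options)
    {
        // Write the intermediate form in place of the ndarray itself.
        JsonSerializer.Serialize(writer, value.ToSerializable(), options);
    }
}

// The wrapper class from the question above.
public class Test
{
    public ndarray Array { get; set; }
}

internal static class Program
{
    private static void Main()
    {
        var options = new JsonSerializerOptions
        {
            IncludeFields = true,
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
        };
        // Register the converter so every ndarray property goes through it.
        options.Converters.Add(new NdarrayJsonConverter());

        var test = new Test { Array = np.arange(9).reshape(3, 3) };

        string json = JsonSerializer.Serialize(test, options);
        Console.WriteLine(json);

        Test restored = JsonSerializer.Deserialize<Test>(json, options);
        Console.WriteLine(JsonSerializer.Serialize(restored, options));
    }
}
```

Registering the converter on the options object keeps the Test class free of serializer attributes, so the same class can still be used with other serializers.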

KevinBaselinesw commented 1 year ago

Unfortunately, I could not come up with a way to serialize the ndarray object directly that would allow me to recreate the ndarray on deserialization. The data structures inside the library are very complex and intertwined. There are lots of function pointers that get set up based on the data type. That is how NumPy is implemented in Python, and I ported that implementation.

ChengYen-Tang commented 1 year ago

Can np.random support serialization?

KevinBaselinesw commented 1 year ago

I just pushed up a new version that supports serialization for np.random. Note: it works for the built-in RandomState algorithm. If you are using a custom random generator, you will need to implement a couple of new APIs to enable serialization of that algorithm's state. The sample custom random generator in my unit tests does not support serialization because I don't have access to the internals of the .NET random generator.

Here are some sample unit tests with the new functionality:

    [TestMethod]
        public void test_nprandom_serialization_newtonsoft()
        {
            var Rand1 = new np.random();
            Rand1.seed(1234);

            var Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
            print(Rand1Serialized);

            double fr = Rand1.randn();
            print(fr);
            Assert.AreEqual(0.47143516373249306, fr);
            fr = Rand1.randn();
            print(fr);
            Assert.AreEqual(-1.1909756947064645, fr);

            var Rand1Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<np.random_serializable>(Rand1Serialized);
            var Rand2 = new np.random();
            Rand2.FromSerialization(Rand1Deserialized);
            fr = Rand2.randn();
            print(fr);
            Assert.AreEqual(0.47143516373249306, fr);
            fr = Rand2.randn();
            print(fr);
            Assert.AreEqual(-1.1909756947064645, fr);

            Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
            print(Rand1Serialized);

            var Rand2Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand2.ToSerialization());
            print(Rand2Serialized);

            Assert.AreEqual(0, string.Compare(Rand1Serialized, Rand2Serialized));

        }

        [TestMethod]
        public void test_nprandom_serialization_xml()
        {
            var Rand1 = new np.random();
            Rand1.seed(1234);

            var Rand1Serialized = SerializationHelper.SerializeXml(Rand1.ToSerialization());
            print(Rand1Serialized);

            double fr = Rand1.randn();
            print(fr);
            Assert.AreEqual(0.47143516373249306, fr);
            fr = Rand1.randn();
            print(fr);
            Assert.AreEqual(-1.1909756947064645, fr);

            var Rand1Deserialized = SerializationHelper.DeserializeXml<np.random_serializable>(Rand1Serialized);
            var Rand2 = new np.random();
            Rand2.FromSerialization(Rand1Deserialized);
            fr = Rand2.randn();
            print(fr);
            Assert.AreEqual(0.47143516373249306, fr);
            fr = Rand2.randn();
            print(fr);
            Assert.AreEqual(-1.1909756947064645, fr);

            Rand1Serialized = SerializationHelper.SerializeXml(Rand1.ToSerialization());
            print(Rand1Serialized);

            var Rand2Serialized = SerializationHelper.SerializeXml(Rand2.ToSerialization());
            print(Rand2Serialized);

            Assert.AreEqual(0, string.Compare(Rand1Serialized, Rand2Serialized));

        }

        [TestMethod]
        public void test_nprandom_serialization_newtonsoft_2()
        {
            var Rand1 = new np.random();
            Rand1.seed(701);
            ndarray arr1 = Rand1.randint(2, 3, new shape(4), dtype: np.Int32);

            var Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
            var Rand1Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<np.random_serializable>(Rand1Serialized);
            var Rand2 = new np.random();
            Rand2.FromSerialization(Rand1Deserialized);

            ndarray arr = Rand1.randint(9, 128000, new shape(5000000), dtype: np.Int32);
            Assert.AreEqual(arr.TypeNum, NPY_TYPES.NPY_INT32);
            var amax = np.amax(arr);
            Assert.AreEqual((Int32)127999, amax.GetItem(0));

            arr = Rand2.randint(9, 128000, new shape(5000000), dtype: np.Int32);
            Assert.AreEqual(arr.TypeNum, NPY_TYPES.NPY_INT32);
            amax = np.amax(arr);
            Assert.AreEqual((Int32)127999, amax.GetItem(0));

            Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
            print(Rand1Serialized);

            var Rand2Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand2.ToSerialization());
            print(Rand2Serialized);

            Assert.AreEqual(0, string.Compare(Rand1Serialized, Rand2Serialized));

        }
ChengYen-Tang commented 1 year ago

Ok, Thank you.

ChengYen-Tang commented 1 year ago

I get this error when I serialize an ndarray. My array has shape (2, 2) and every value is np.inf.

JsonSerializer.Serialize(value.ToSerializable(), options)

System.ArgumentException: '.NET number values such as positive and negative infinity cannot be written as valid JSON. To make it work when using 'JsonSerializer', consider specifying 'JsonNumberHandling.AllowNamedFloatingPointLiterals' (see https://docs.microsoft.com/dotnet/api/
KevinBaselinesw commented 1 year ago

This is not a numpydotnet issue. It is a serialization issue that can be resolved by enabling AllowNamedFloatingPointLiterals, as below. Note that this seems to work without any changes in Newtonsoft. Also note that other JSON implementations may not support infinity values.

```
System.Text.Json.JsonSerializerOptions options = new System.Text.Json.JsonSerializerOptions();
options.IncludeFields = true;
options.DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull;
options.NumberHandling = System.Text.Json.Serialization.JsonNumberHandling.AllowNamedFloatingPointLiterals;

ndarray aa = np.array(new double[] { double.NegativeInfinity, double.PositiveInfinity, double.NegativeInfinity,
                                     double.PositiveInfinity, double.NegativeInfinity, double.PositiveInfinity,
                                     double.NegativeInfinity, double.PositiveInfinity, double.NegativeInfinity }).reshape(3, 3);
var x = System.Text.Json.JsonSerializer.Serialize(aa.ToSerializable(), options);
```
ChengYen-Tang commented 1 year ago

Ooh!! Thanks for helping me out