SciSharp / Numpy.NET

C#/F# bindings for NumPy - a fundamental library for scientific computing, machine learning and AI

Suggestions regarding np.linalg.norm, np.random.normal and GetData #48

Closed chriss2401 closed 2 years ago

chriss2401 commented 4 years ago

Hello,

First of all, congrats on the great job of binding numpy so well. This is by far the best c# version of numpy I have encountered, and it was relatively easy to get things going (in combination with the documentation). After using this for a project I am working on, I have a few minor comments that I thought would make sense to write here:

np.linalg.norm

Works as expected, but it has only one overload. If I want to do something like np.linalg.norm(c, axis=1), with Numpy.NET I currently have to write np.linalg.norm(c, null, new int[]{1}), which is not ideal. Multiple overloads would be nice, so that axis could be either an int[] or just an int.

np.random.normal

Input arguments are also not the most robust. If I want to do something like:

np.random.normal(3, 2.5, c.shape), with Numpy.NET I currently have to write np.random.normal(new NDarray<float>(new[] { 3 }), new NDarray<float>(new[] { 2.5 }), new[]{c.shape[0], c.shape[1]});. Ideally we could just pass floats and a shape object. From my understanding, the variety of overloads is also limited here.

GetData

Regarding GetData, unfortunately if I have a numpy array with shape of NxM where M>1 then I always just get a one dim vector. Would be nice if we could return an array based on a given shape. So my current workaround is to get the .repr of the element from the NDarray and then convert it to Single/Double/Int32.

Thanks again for this nice binding package!

henon commented 4 years ago

Hi Chris, thanks for the high praise, I do appreciate it.

The reason for the first two problems (missing overloads) is that the whole API is automatically generated by parsing the numpy documentation which doesn't specify exactly what kinds of values can be passed in. So as users find such missing overloads I gradually add them. For instance, it would say a parameter is "array_like" and I am now slowly finding out that that doesn't only mean ndarrays but also tuples of ndarrays in many cases.

I'll add overloads for np.linalg.norm and np.random.normal, no problem.

As for the problem with GetData, I haven't fully understood it yet. Could you open a separate issue about it and provide a code sample so I can reproduce it?

Thanks for contributing back!

PS: What is your project and can I add it to the list of projects using Numpy.NET?

henon commented 4 years ago

Oh, I get your GetData point now. GetData returns the underlying data as a 1D array always regardless of the shape of the NDarray. Numpy's data representation at the low level is always a 1D array. Copying that data all in one piece is just the most performant way of getting data from Python to C#. But we could of course implement a function to copy data into a multi-dimensional C# array using slicing and copying slices one by one. Would you like to implement it?
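The multi-dimensional copy suggested above can be sketched in plain C# without touching Python at all, once the flat buffer is on the C# side. This is only a minimal illustration: `FlatArrayReshaper` and `To2D` are hypothetical names, not part of Numpy.NET, and the sketch assumes the flat buffer is in row-major (C) order, which is NumPy's default layout.

```csharp
using System;

public static class FlatArrayReshaper
{
    // Hypothetical helper, not part of Numpy.NET: copies a flat, row-major
    // buffer (such as the 1D array GetData returns) into a rows x cols array.
    public static T[,] To2D<T>(T[] flat, int rows, int cols)
    {
        if (flat == null) throw new ArgumentNullException(nameof(flat));
        if (rows < 0 || cols < 0 || (long)rows * cols != flat.Length)
            throw new ArgumentException(
                $"Cannot reshape {flat.Length} elements into {rows}x{cols}.");

        var result = new T[rows, cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                // Row-major layout: element (r, c) sits at offset r * cols + c.
                result[r, c] = flat[r * cols + c];
            }
        }
        return result;
    }
}
```

For example, `FlatArrayReshaper.To2D(new[] { 1, 2, 3, -1, 1, 4 }, 2, 3)` yields a 2x3 array whose element `[1, 2]` is 4. The element-by-element loop keeps the sketch generic over `T`; a block copy would likely be faster for primitive types.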

henon commented 4 years ago

Or you could just use my library SliceAndDice to access the data multi-dimensionally.

henon commented 4 years ago

I added more overloads for linalg. Check it out, this passes now:

        using LA = Numpy.np.linalg;

        [TestMethod]
        public void normTest()
        {
            // >>> from numpy import linalg as LA
            // >>> a = np.arange(9) - 4
            // >>> a
            // array([-4, -3, -2, -1,  0,  1,  2,  3,  4])
            // >>> b = a.reshape((3, 3))
            // >>> b
            // array([[-4, -3, -2],
            //        [-1,  0,  1],
            //        [ 2,  3,  4]])
            // 
            var a = np.arange(9) - 4;
            NDarray given = a;
            var expected =
                "array([-4, -3, -2, -1,  0,  1,  2,  3,  4])";
            Assert.AreEqual(expected, given.repr);
            var b = a.reshape(3, 3);
            given = b;
            expected =
                "array([[-4, -3, -2],\n" +
                "       [-1,  0,  1],\n" +
                "       [ 2,  3,  4]])";
            Assert.AreEqual(expected, given.repr);

            // >>> LA.norm(a)
            // 7.745966692414834
            // >>> LA.norm(b)
            // 7.745966692414834
            // >>> LA.norm(b, 'fro')
            // 7.745966692414834
            // >>> LA.norm(a, np.inf)
            // 4.0
            // >>> LA.norm(b, np.inf)
            // 9.0
            // >>> LA.norm(a, -np.inf)
            // 0.0
            // >>> LA.norm(b, -np.inf)
            // 2.0
            // 

            Assert.GreaterOrEqual(7.74596669f, (float)LA.norm(a));
            Assert.GreaterOrEqual(7.74596669f, (float)LA.norm(b));
            Assert.GreaterOrEqual(7.74596669f, LA.norm(b, "fro"));
            Assert.AreEqual(4, LA.norm(a, Constants.inf));
            Assert.AreEqual(9, LA.norm(b, Constants.inf));
            Assert.AreEqual(0, LA.norm(a, Constants.neg_inf));
            Assert.AreEqual(2, LA.norm(b, Constants.neg_inf));

            // >>> LA.norm(a, 1)
            // 20.0
            // >>> LA.norm(b, 1)
            // 7.0
            // >>> LA.norm(a, -1)
            // -4.6566128774142013e-010
            // >>> LA.norm(b, -1)
            // 6.0
            // >>> LA.norm(a, 2)
            // 7.745966692414834
            // >>> LA.norm(b, 2)
            // 7.3484692283495345
            // 

            Assert.AreEqual(20f, (float)LA.norm(a, 1));
            Assert.AreEqual(7f, (float)LA.norm(b, 1));
            Assert.GreaterOrEqual(0f, (float)LA.norm(a, -1));
            Assert.GreaterOrEqual(6, (float)LA.norm(b, -1));
            Assert.GreaterOrEqual(7.74596669f, (float)LA.norm(a, 2));
            Assert.GreaterOrEqual(7.34846922f, (float)LA.norm(b, 2));

            // >>> LA.norm(a, -2)
            // 0.0
            // >>> LA.norm(b, -2)
            // 1.8570331885190563e-016
            // >>> LA.norm(a, 3)
            // 5.8480354764257312
            // >>> LA.norm(a, -3)
            // 0.0
            // 
            Assert.AreEqual(0f, (float)LA.norm(a, -2));
            Assert.AreEqual(1.8570331885190563e-016f, (float)LA.norm(b, -2));
            Assert.AreEqual(5.8480354764257312f, (float)LA.norm(a, 3));
            Assert.AreEqual(0f, (float)LA.norm(a, -3));

            // Using the axis argument to compute vector norms:

            // >>> c = np.array([[ 1, 2, 3],
            // ...               [-1, 1, 4]])
            // >>> LA.norm(c, axis=0)
            // array([ 1.41421356,  2.23606798,  5.        ])
            // >>> LA.norm(c, axis=1)
            // array([ 3.74165739,  4.24264069])
            // >>> LA.norm(c, ord=1, axis=1)
            // array([ 6.,  6.])
            // 
            var c = np.array(new[,]{{ 1, 2, 3},{-1, 1, 4}});
            given=  LA.norm(c, axis:0);
            expected=
                "array([1.41421356, 2.23606798, 5.        ])";
            Assert.AreEqual(expected, given.repr);
            given=  LA.norm(c, axis:1);
            expected=
                "array([3.74165739, 4.24264069])";
            Assert.AreEqual(expected, given.repr);
            given=  LA.norm(c, ord:1, axis:1);
            expected=
                "array([6., 6.])";
            Assert.AreEqual(expected, given.repr);

            // Using the axis argument to compute matrix norms:

            // >>> m = np.arange(8).reshape(2,2,2)
            // >>> LA.norm(m, axis=(1,2))
            // array([  3.74165739,  11.22497216])
            // >>> LA.norm(m[0, :, :]), LA.norm(m[1, :, :])
            // (3.7416573867739413, 11.224972160321824)
            // 
            var m = np.arange(8).reshape(2,2,2);
            given=  LA.norm(m, axis: new[]{1,2});
            expected=
                "array([ 3.74165739, 11.22497216])";
            Assert.AreEqual(expected, given.repr);
            var given1= new[]{ LA.norm(m["0, :, :"]), LA.norm(m["1, :, :"])};
            expected=
                "(3.7416573867739413, 11.224972160321824)";
            Assert.AreEqual(expected, given1.repr());

        }

henon commented 4 years ago

I also added overloads for np.random.normal. This test case passes:

        [TestMethod]
        public void normalTest()
        {
            // Draw samples from the distribution:

            // >>> mu, sigma = 0, 0.1 # mean and standard deviation
            // >>> s = np.random.normal(mu, sigma, 1000)
            // 
            var (mu, sigma) = (0.0f, 0.1f); // mean and standard deviation
            var s = np.random.normal(mu, sigma, 1000);

            // Verify the mean and the variance:

            // >>> abs(mu - np.mean(s)) < 0.01
            // True
            // 
            Assert.IsTrue(Math.Abs(mu - np.mean(s)) < 0.01);

            // >>> abs(sigma - np.std(s, ddof=1)) < 0.01
            // True
            // 
            Assert.IsTrue(Math.Abs(sigma - np.std(s, ddof: 1)) < 0.01);

            // Two-by-four array of samples from N(3, 6.25):

            // >>> np.random.normal(3, 2.5, size = (2, 4))
            // array([[-4.49401501, 4.00950034, -1.81814867, 7.29718677],   # random
            //        [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random

            Assert.AreEqual(new Shape(2, 4), np.random.normal(3, 2.5f, new[] { 2, 4 }).shape);
        }

chriss2401 commented 4 years ago

Hey @henon ,

Thanks for adding the overloads, that is great! Just FYI, I also noticed similar behavior with np.mean (I have to write np.mean(x, new[] { 0 }) instead of np.mean(x, axis:0)).

Regarding GetData, I will try to find some time this or next week to make a branch and implement it.

Unfortunately the project I am working on is one related to a private company, so I am not sure whether I can disclose any information about it. If I can though (for this or future projects) I will let you know so you can add it to the list :)

C.

henon commented 4 years ago

This same problem affects almost all statistics functions and many others. I solved it for all of them by changing the type of the axis parameter to Axis. Thanks to implicit cast operators, axis can be assigned null, an int, or an int[], which covers all possibilities without additional overloads.
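The implicit-cast idea described above can be sketched in plain C#. This is only an illustration of the technique, not the actual Numpy.NET source: the `Axis` class below is a hypothetical stand-in, and `AxisDemo.Describe` is an invented helper that simply reports what it received.

```csharp
using System;

// Hypothetical stand-in for the Axis parameter type described above; the real
// Numpy.NET class may differ in its members and details.
public sealed class Axis
{
    public int[] Axes { get; }

    private Axis(int[] axes) { Axes = axes; }

    // Lets callers pass a single int, e.g. axis: 1.
    public static implicit operator Axis(int axis) => new Axis(new[] { axis });

    // Lets callers pass several axes, e.g. axis: new[] { 1, 2 }.
    public static implicit operator Axis(int[] axes) =>
        axes == null ? null : new Axis(axes);
}

public static class AxisDemo
{
    // Illustrative function only: one signature accepts null, int, or int[].
    public static string Describe(Axis axis = null) =>
        axis == null
            ? "all axes"
            : "axes [" + string.Join(", ", axis.Axes) + "]";
}
```

With this in place, `AxisDemo.Describe()`, `AxisDemo.Describe(1)` and `AxisDemo.Describe(new[] { 1, 2 })` all compile against the single `Describe` signature, because the compiler applies the implicit conversions at the call site.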

This test passes now:

        [TestMethod]
        public void meanTest()
        {
            // >>> a = np.array([[1, 2], [3, 4]])
            // >>> np.mean(a)
            // 2.5
            // >>> np.mean(a, axis=0)
            // array([ 2.,  3.])
            // >>> np.mean(a, axis=1)
            // array([ 1.5,  3.5])
            // 

            NDarray a = np.array(new [,]{{1, 2}, {3, 4}});
            var given_scalar=  np.mean(a);
            Assert.AreEqual(2.5, given_scalar);
            var given=  np.mean(a, axis:0);
            var expected=
                "array([2., 3.])";
            Assert.AreEqual(expected, given.repr);
            given=  np.mean(a, axis:1);
            expected=
                "array([1.5, 3.5])";
            Assert.AreEqual(expected, given.repr);

            // In single precision, mean can be inaccurate:

            // >>> a = np.zeros((2, 512*512), dtype=np.float32)
            // >>> a[0, :] = 1.0
            // >>> a[1, :] = 0.1
            // >>> np.mean(a)
            // 0.54999924
            // 

            a = np.zeros(new Shape(2, 512 * 512), dtype: np.float32);
            a["0, :"] = (NDarray)1.0;
            a["1, :"] = (NDarray)0.1;
            given_scalar = Math.Round(np.mean(a), 8);
            var expected_scalar = 0.54999924;
            Assert.AreEqual(expected_scalar, given_scalar);

            // Computing the mean in float64 is more accurate:

            // >>> np.mean(a, dtype=np.float64)
            // 0.55000000074505806
            // 

            given_scalar = np.mean(a, dtype: np.float64);
            expected_scalar = 0.55000000074505806;
            Assert.AreEqual(expected_scalar, given_scalar);
        }

henon commented 4 years ago

Released as v19 on NuGet.