Quansight-Labs / numpy.net

A port of NumPy to .Net
BSD 3-Clause "New" or "Revised" License

Matmul error - shapes (4,) and (4, 4) not aligned! #11

Closed AsiaMartini closed 3 years ago

AsiaMartini commented 3 years ago

Sorry to bother you, but I found another difference from the Numpy.NET results, maybe a bug. Your library throws an error, while the other one returns a result.

Code with your library:

public static double _get_lambda_next(ndarray am, ndarray bs, ndarray bm, ndarray cs, ndarray cm, ndarray rq)
{
    Console.WriteLine($"\n rq :\n{rq}");
    Console.WriteLine($"\n rq.T :\n{rq.T}");
    Console.WriteLine($"\n am :\n{am}");
    var temp1 = np.matmul(rq.T, am);
    Console.WriteLine($"\n np.matmul(rq.T, am) :\n{temp1}");

    var expr_1 = np.matmul(np.matmul(rq.T, am), rq);
    var expr_2 = (1 / cs) * np.matmul(np.matmul(np.matmul(rq.T, bm.T), cm), rq);
    var expr_3 = (1 / cs) * np.matmul(np.matmul(np.matmul(rq.T, cm.T), cm), rq);
    var lambda_next = (expr_1 - expr_2) / (bs - expr_3);
    return getFloatValue((ndarray)lambda_next);
}

Code with Numpy.NET:

public static float _get_lambda_next(NDarray am, NDarray bs, NDarray bm, NDarray cs, NDarray cm, NDarray rq)
{
    Console.WriteLine($"\n rq :\n{rq}");
    Console.WriteLine($"\n rq.T :\n{rq.T}");
    Console.WriteLine($"\n am :\n{am}");
    var temp1 = np.matmul(rq.T, am);
    Console.WriteLine($"\n np.matmul(rq.T, am) :\n{temp1}");

    var expr_1 = np.matmul(np.matmul(rq.T, am), rq);
    var expr_2 = (1 / cs) * np.matmul(np.matmul(np.matmul(rq.T, bm.T), cm), rq);
    var expr_3 = (1 / cs) * np.matmul(np.matmul(np.matmul(rq.T, cm.T), cm), rq);
    var lambda_next = (expr_1 - expr_2) / (bs - expr_3);
    return getFloatValue(lambda_next);
}

Your library's results are on the left, the Numpy.NET ones on the right: image

Here is the error your library gives: image

KevinBaselinesw commented 3 years ago

I fixed this bug in a new release 9.72.

AsiaMartini commented 3 years ago

Works like a charm, thanks!
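
For context on the first report: NumPy documents that when the first argument of matmul is 1-D, it is promoted to a matrix by prepending a 1 to its shape, and that extra dimension is removed from the result. Transposing a 1-D array is a no-op, so rq.T still has shape (4,), and shapes (4,) and (4, 4) should produce a result of shape (4,). The sketch below reproduces that rule in plain C# without either library; VecMat is a hypothetical helper written for illustration, not part of NumpyDotNet or Numpy.NET.

```csharp
using System;

public static class MatmulSketch
{
    // Hypothetical helper: multiplies a 1-D vector of length n by an n x m matrix,
    // treating the vector as a 1 x n row, which is how NumPy's matmul handles it.
    public static double[] VecMat(double[] v, double[,] a)
    {
        int n = a.GetLength(0), m = a.GetLength(1);
        if (v.Length != n)
            throw new ArgumentException($"shapes ({v.Length},) and ({n}, {m}) not aligned");

        var result = new double[m];
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                result[j] += v[i] * a[i, j];
        return result; // shape (m,): the prepended dimension is dropped again
    }

    public static void Main()
    {
        var rq = new double[] { 1, 2, 3, 4 };
        var am = new double[,] { { 1, 0, 0, 0 }, { 0, 2, 0, 0 }, { 0, 0, 3, 0 }, { 0, 0, 0, 4 } };
        Console.WriteLine(string.Join(", ", VecMat(rq, am))); // 1, 4, 9, 16
    }
}
```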

AsiaMartini commented 3 years ago

Here we are again.

This time matmul is throwing a NullReferenceException:

image image

Same project, function _get_abc_matrices.

var trans = np.transpose(m1, new long[] { 0, 2, 1 });
var temp = np.matmul(trans, m2);

This is the dataset: image image

Sorry for the screenshots, but I'm working on HoloLens and the glasses don't want to cooperate today... no log file was output -.-'

KevinBaselinesw commented 3 years ago

What are the dimensions/shapes of the trans and m2 ndarrays?

It is hard to tell from the pictures.

AsiaMartini commented 3 years ago

 m2.shape :
(3, 4, 4)

 trans.shape :
(3, 4, 4)

 m2.ndim :
3

 trans.ndim :
3

KevinBaselinesw commented 3 years ago

So far, I can't repro the crash using shapes of (3,4,4).

What is the shape of m1 when it is passed to np.transpose? Maybe something weird is happening in transpose that causes problems in np.matmul.

KevinBaselinesw commented 3 years ago

Just curious, where do you work that you are working with HoloLens?

AsiaMartini commented 3 years ago

m1 :
FLOAT
{ { { 0.0f, 0,3246065.0f, -1,870714.0f, 11,06388.0f }, 
    { -0,3246065.0f, 0.0f, 11,06388.0f, 1,870714.0f }, 
    { 1,870714.0f, -11,06388.0f, 0.0f, 0,3246065.0f }, 
    { -11,06388.0f, -1.870714.0f, -0,3246065.0f, 0.0f } }, 

  { { 0.0f, 0,2652016.0f, -0,4539047.0f, 11,58219.0f }, 
    { -0,2652016.0f, 0.0f, 11,58219.0f, 0,4539047.0f }, 
    { 0,4539047.0f, -11,58219.0f, 0.0f, 0,2652016.0f }, 
    { -11,58219.0f, -0,4539047.0f, -0,2652016.0f, 0.0f } }, 

  { { 0.0f, 0,1204867.0f, -4,630829.0f, 11,70588.0f }, 
    { -0,1204867.0f, 0.0f, 11,70588.0f, 4,630829.0f }, 
    { 4,630829.0f, -11,70588.0f, 0.0f, 0,1204867.0f }, 
    { -11,70588.0f, -4,630829.0f, -0,1204867.0f, 0.0f } } } 

I work for an Italian company, in the geolocalization business.

KevinBaselinesw commented 3 years ago

Something is not right here. Is this generated from numpy.net or numpydotnet?

This line is wrong. It does not compile because -1.870714.0f is not a valid number. The row needs 7 numbers, not 6.

-11,06388.0f, -1.870714.0f, -0,3246065.0f, 0.0f.

AsiaMartini commented 3 years ago

I can't understand what you mean...

KevinBaselinesw commented 3 years ago

See if this compiles:

float x = -1.870714.0f;

Or this:

var y = new float[,,] { { { 0.0f, 0,3246065.0f, -1,870714.0f, 11,06388.0f }, { -0,3246065.0f, 0.0f, 11,06388.0f, 1,870714.0f }, { 1,870714.0f, -11,06388.0f, 0.0f, 0,3246065.0f }, { -11,06388.0f, -1.870714.0f, -0,3246065.0f, 0.0f } },

{ { 0.0f, 0,2652016.0f, -0,4539047.0f, 11,58219.0f }, { -0,2652016.0f, 0.0f, 11,58219.0f, 0,4539047.0f }, { 0,4539047.0f, -11,58219.0f, 0.0f, 0,2652016.0f }, { -11,58219.0f, -0,4539047.0f, -0,2652016.0f, 0.0f } },

{ { 0.0f, 0,1204867.0f, -4,630829.0f, 11,70588.0f }, { -0,1204867.0f, 0.0f, 11,70588.0f, 4,630829.0f }, { 4,630829.0f, -11,70588.0f, 0.0f, 0,1204867.0f }, { -11,70588.0f, -4,630829.0f, -0,1204867.0f, 0.0f } } } ;

AsiaMartini commented 3 years ago

Sorry, could you explain the error in that number? It is generated using your library, but when I print it in Unity it looks different than in the console. I think the comma/dot is only a matter of display.

What do you mean by six or seven numbers?

AsiaMartini commented 3 years ago

When I run the code in the .NET console, I usually get 7 decimals... when I run it in UWP... a lot more... Does this count??

An example: with same input, I got:

In the console: 2.0971791454745086

In UWP: 2,0971791454745086.0

KevinBaselinesw commented 3 years ago

I guess the problem is that my compiler (set to US English) is confused by the mix of ',' and '.' characters.

It assumes that ',' characters separate numbers. The first line:

0.0f, 0,3246065.0f, -1,870714.0f, 11,06388.0f has 6 ',' characters, so the compiler thinks that dimension of the array has 7 elements.

This line (#4) -11,06388.0f, -1.870714.0f, -0,3246065.0f, 0.0f has 5 ',' characters, so the compiler throws an error because it only sees 6 elements.

I assume your Visual Studio is configured for Italian. Does Italian use ',' both to separate numbers and as part of a number?

AsiaMartini commented 3 years ago

Yes. I'll try forcing the English separator (dot) in Unity and let you know. But I can't understand where the problem is in matmul... the separator only matters in a string... Does the library convert the matrix to a string and count the commas??

EDIT: When I print a "normal" number I see only one separator. When I print an ndarray, Unity prints something like this: 0,3246065.0f With point AND comma. Do you think this can be a problem? Also, the decimal count is different when I print the ndarray in the log, and I use the same numbers, same print function, same everything. -.-'
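
The comma in the logged values is consistent with .NET's culture-sensitive number formatting: an implicit ToString() on a double uses the current culture, and Italian uses ',' as the decimal separator. A minimal sketch using only standard .NET APIs (nothing here comes from NumpyDotNet) shows the difference and one way to force the invariant culture. How the library's ndarray printer builds its output is not shown in this thread; a printer that appends ".0" whenever it finds no '.' in the formatted text would produce strings like 0,3246065.0f, but that is only a guess.

```csharp
using System;
using System.Globalization;

public static class CultureSketch
{
    // Formats with an explicitly named culture, e.g. "it-IT".
    public static string FormatWith(double value, string cultureName) =>
        value.ToString("0.0######", CultureInfo.GetCultureInfo(cultureName));

    // Formats with the culture-independent invariant culture ('.' as separator).
    public static string FormatInvariant(double value) =>
        value.ToString("0.0######", CultureInfo.InvariantCulture);

    public static void Main()
    {
        double x = 0.3246065;
        Console.WriteLine(FormatWith(x, "it-IT")); // comma as decimal separator when Italian culture data is available
        Console.WriteLine(FormatInvariant(x));     // 0.3246065

        // Forcing the invariant culture for the current thread affects implicit ToString() too.
        CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
        Console.WriteLine(x.ToString());
    }
}
```

Setting CultureInfo.CurrentCulture only changes how numbers are rendered as text; the stored double values are the same either way.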

KevinBaselinesw commented 3 years ago

0,3246065.0f

With point AND comma. Do you think this can be a problem?

Yes. If we can fix that, I will try and reproduce the problem.

AsiaMartini commented 3 years ago

I'm doing some other tests... It seems that in Unity the minus sign before zero is removed from the ndarrays:

Console

q0_w_matrix :
DOUBLE
{ { { 0.0, 0.0, -0.0, 0.0 },
    { -0.0, 0.0, 0.0, 0.0 },
    { 0.0, -0.0, 0.0, 0.0 },
    { -0.0, -0.0, -0.0, 0.0 } },
  { { 0.0, 2.0, -2.0, 0.0 },
    { -2.0, 0.0, 0.0, 2.0 },
    { 2.0, -0.0, 0.0, 2.0 },
    { -0.0, -2.0, -2.0, 0.0 } },
  { { 0.0, 1.0, -3.0, 2.0 },
    { -1.0, 0.0, 2.0, 3.0 },
    { 3.0, -2.0, 0.0, 1.0 },
    { -2.0, -3.0, -1.0, 0.0 } } }

Hololens

    q0_w_matrix :
DOUBLE
{ { { 0.0, 0.0, 0.0, 0.0 },
    { 0.0, 0.0, 0.0, 0.0 },
    { 0.0, 0.0, 0.0, 0.0 },
    { 0.0, 0.0, 0.0, 0.0 } },
  { { 0.0, 2.0, -2.0, 0.0 },
    { -2.0, 0.0, 0.0, 2.0 },
    { 2.0, 0.0, 0.0, 2.0 },
    { 0.0, -2.0, -2.0, 0.0 } },
  { { 0.0, 1.0, -3.0, 2.0 },
    { -1.0, 0.0, 2.0, 3.0 },
    { 3.0, -2.0, 0.0, 1.0 },
    { -2.0, -3.0, -1.0, 0.0 } } }

I think this can be the problem... but how can I avoid it? Could this cause an exception in the matmul?
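
On the negative-zero question: under IEEE 754, -0.0 and 0.0 compare equal and behave the same in sums and products of the kind matmul performs, so a missing minus sign in a printout should not by itself change a result or cause an exception. Runtimes can also differ in how they print negative zero (.NET Core 3.0 and later print the sign, older .NET Framework versions do not); whether the Unity/HoloLens backend follows the older behavior is an assumption. A minimal sketch with standard .NET APIs only:

```csharp
using System;

public static class NegativeZeroSketch
{
    // True only for IEEE 754 negative zero: equal to zero, but with the sign bit set.
    public static bool IsNegativeZero(double d) =>
        d == 0.0 && BitConverter.DoubleToInt64Bits(d) < 0;

    public static void Main()
    {
        // Build negative zero from its bit pattern (only the sign bit set).
        double negZero = BitConverter.Int64BitsToDouble(long.MinValue);

        Console.WriteLine(negZero == 0.0);          // True
        Console.WriteLine(IsNegativeZero(negZero)); // True
        Console.WriteLine(IsNegativeZero(0.0));     // False
        Console.WriteLine(2.0 * negZero + 3.0);     // 3
    }
}
```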

AsiaMartini commented 3 years ago

0,3246065.0f

With point AND comma. Do you think this can be a problem? Yes. If we can fix that, I will try and reproduce the problem.

Just to be clear, in the editor I write the numbers just like you, with point:

var a = 1.5;

I see the commas only when I log the values (implicit ToString).

But the only differences I see between the console and Unity (in which I use .NET too) are:

KevinBaselinesw commented 3 years ago

Here is the unit test that I am trying to reproduce the problem in.
I get a "shape mismatch" error instead of a null pointer dereference. Can you run this unit test and compare what we are putting into np.matmul with what your code is doing?
I must have something different than you.

    [TestMethod]
    public void test_matmul_asiamartini_bugreport2()
    {

        var m1data = new float[,,]

{ { { 0.0f, 0,3246065.0f, -1,870714.0f, 11,06388.0f }, { -0,3246065.0f, 0.0f, 11,06388.0f, 1,870714.0f }, { 1.0f, 870714.0f, -11, 06388.0f, 0.0f, 0, 3246065.0f }, {-11, 06388.0f, -1, 870714.0f, -0,3246065.0f, 0.0f } },

{ { 0.0f, 0,2652016.0f, -0,4539047.0f, 11,58219.0f }, { -0,2652016.0f, 0.0f, 11,58219.0f, 0,4539047.0f }, { 0,4539047.0f, -11,58219.0f, 0.0f, 0,2652016.0f }, { -11,58219.0f, -0,4539047.0f, -0,2652016.0f, 0.0f } },

{ { 0.0f, 0,1204867.0f, -4,630829.0f, 11,70588.0f }, { -0,1204867.0f, 0.0f, 11,70588.0f, 4,630829.0f }, { 4,630829.0f, -11,70588.0f, 0.0f, 0,1204867.0f }, { -11,70588.0f, -4,630829.0f, -0,1204867.0f, 0.0f } } };

        var m1 = np.array(m1data);

        var trans = np.transpose(m1, new long[] { 0, 2, 1 });
        print(trans);

        var m2 = np.array(new float[48]).reshape(3,4,4);
        print(m2);

        var temp1 = np.matmul(trans, m2);

      //  AssertArray(temp1, new double[] { 8.5, 42.5, -3.5, 12.5 });
       print(temp1);
    }

AsiaMartini commented 3 years ago

If I write 1,0004.0 the editor recognizes 2 different numbers;

So I don't think this is the problem... I'm trying to debug the code to see the variable values at runtime... I really can't understand... the error seems to indicate that the GetItem() function points to something that is null... inside the matrix...

AsiaMartini commented 3 years ago

I tried running the same project directly in Unity, and everything worked. The problem seems to occur only when running on HoloLens (and in the Emulator, too). Behind it there is a C++ scripting backend, and I know that it cannot support any kind of reflection, because it's an AOT platform.

Do you know if there is something in the library that could conflict with this type of platform, like runtime code generation or JIT compilation?

I'll close the issue, because it seems not to be related to the library. Many thanks for the help.

KevinBaselinesw commented 3 years ago

There are a few cases where I use the keyword "dynamic" which I think delays compilation of code until run time because I don't know the data type yet. That could fall under the JIT compilation.

AsiaMartini commented 3 years ago

I think this is actually the problem... the dynamic keyword is not supported. Do you think I could work around this in some way? Maybe by downloading the source code and editing it? Or did you use it because it's absolutely mandatory?
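
One common way to avoid dynamic on AOT platforms is to replace the run-time binding with an explicit switch over the element types the code needs to handle, so every call is resolved at compile time. The sketch below is purely illustrative: GetItemAsDouble is a hypothetical helper, and the thread does not show how the library's matmul or ConvertSingleIndex actually use dynamic.

```csharp
using System;

public static class AotFriendlySketch
{
    // With `dynamic`, this could be written as `(double)((dynamic)buffer)[index]`,
    // which is bound at run time. The switch below is bound entirely at compile time.
    public static double GetItemAsDouble(Array buffer, int index)
    {
        switch (buffer)
        {
            case double[] d: return d[index];
            case float[] f: return f[index];
            case long[] l: return l[index];
            case int[] i: return i[index];
            default:
                throw new NotSupportedException(
                    $"Unsupported element type: {buffer.GetType().GetElementType()}");
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetItemAsDouble(new float[] { 1f, 2f }, 1));   // 2
        Console.WriteLine(GetItemAsDouble(new long[] { 7, 8, 9 }, 0));   // 7
    }
}
```

The trade-off is verbosity: each supported element type needs its own case, which is presumably why dynamic was attractive in the first place.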

AsiaMartini commented 3 years ago

I found it in matmul... Mystery solved!

image

image

KevinBaselinesw commented 3 years ago

I don't think that particular use of the dynamic keyword is what is causing this problem. dynamic is also used in ConvertSingleIndex, which the stack trace at the very beginning of this issue shows as the root cause.

Do you have an email address? I can make a quick change and send you a DLL to test with before I check the change in and make it an official release.

Contact me at kmckenna at baselinesw.com if you want a test DLL.