Mojo-Numerics-and-Algorithms-group / NuMojo

NuMojo is a library for numerical computing in Mojo 🔥 similar to numpy in Python.
Apache License 2.0
86 stars 15 forks source link

Add "order" as an argument when creating an NDArray. #37

Closed forFudan closed 2 months ago

forFudan commented 2 months ago

Add "order" as an argument when creating an NDArray. It can be "C" (row-majored) or "F" (col-majored). The strides will be generated according to this argument.

MadAlex1997 commented 2 months ago

Have you tested things that rely on the pointer's load and store function to access specific locations sequentially? For example matmul.

As long as striding still works I think most of the other functions still work, as they either just go equal index to index or use striding for on-axis functions.

Another potential problem would come from trying to do ops between C and F order arrays. We either need an order a parameter in order to get the compiler to prevent this or we need to implement checks in each function. I think both represent a time commitment and change to the entire codebase.

I wonder if having the order parameter bound to a trait for which we define structs COrder and FOrder then define casting.

Overall I like it but I am not sure that this feature will be ready in the time before we release v0.1.

I am going to start a discussion about this topic.

shivasankarka commented 2 months ago

I think it's better to refrain from writing code within if else checks inside NDArray to switch between C and F order as it'll make our code difficult to maintain in long run once we start adding more features. It'll be nice if we could have struct inheritance in Mojo.

forFudan commented 2 months ago

Have you tested things that rely on the pointer's load and store function to access specific locations sequentially? For example matmul.

Yes, the results are the same when I use the naive matmul. Actually the current load and store function only process one item at once. So any index (x,y) of a matrix will be translated to a single index of the data buffer, by using the strides information.

fn load[width: Int = 1](self, *indices: Int) -> SIMD[dtype, width]:
        """
        Loads a SIMD element of size `width` at given variadic indices argument.
        """
        var index: Int = _get_index(indices, self.coefficients)
        return self.data.load[width=width](index)

As long as striding still works I think most of the other functions still work, as they either just go equal index to index or use striding for on-axis functions.

Yes, if the stride works, then order also works. Because order impact the strides.

Another potential problem would come from trying to do ops between C and F order arrays. We either need an order a parameter in order to get the compiler to prevent this or we need to implement checks in each function. I think both represent a time commitment and change to the entire codebase.

I do not think so. The order is related to strides. One the order is determined, e.g., C or F, the strides can be calculated immediately. Then the order is not needed anymore. The indices will be translated using only strides information.

I wonder if having the order parameter bound to a trait for which we define structs COrder and FOrder then define casting.

Same as above. I do not think they are needed as the the only impacted property is strides, which can be determined at the init of the ndarray.

Overall I like it but I am not sure that this feature will be ready in the time before we release v0.1.

I am going to start a discussion about this topic.

I agree. This is not very urgent.

forFudan commented 2 months ago

I think it's better to refrain from writing code within if else checks inside NDArray to switch between C and F order as it'll make our code difficult to maintain in long run once we start adding more features. It'll be nice if we could have struct inheritance in Mojo.

The "c" and 'f" argument are used at the initialisation of the array and translated into strides. It will not give too much burden. If there are two structs, then users cannot easily alter the order by just change the strides information.

shivasankarka commented 2 months ago

The "c" and 'f" argument are used at the initialisation of the array and translated into strides. It will not give too much burden. If there are two structs, then users cannot easily alter the order by just change the strides information.

Yes, you are right, I misinterpreted your & alex's discussions above.

In the numpy style arrays, the difference between row and column major is only stride information. We had already worked with column major array before we switched to row major (I only had to change strides calculation since it's like a transpose). The only function where there might be a change is the str() one. Have you tested the str() function?

Right now I have moved the shape and stride calculation to NDArrayShape & NDArrayStride's init method instead of cluttering NDArray init. I am also making the init method similar to numpy and cleaning it up. I will push the changes and let you know, so that we are working with same version. Please reopen this PR after that.