Mojo-Numerics-and-Algorithms-group / NuMojo

NuMojo is a library for numerical computing in Mojo 🔥 similar to numpy in Python.
Apache License 2.0

Make dimension of an ndarray a parameter and known at compile time #58

Open forFudan opened 1 month ago

forFudan commented 1 month ago

Current situation

Currently, the number of dimensions (ndim) of an ndarray is inferred from its shape at run time. This incurs extra cost.

Proposal

As the number of dimensions is known at initialisation, it might be nice to make ndim a parameter that is known at compile time. Example:

var A = NDArray[2, DType.float64](1000, 1000, random=True)  # Proposed syntax: creates a 1000x1000 2-D array (matrix)

Additional benefits

Making the dimension a compile-time parameter may have additional benefits. For example, we can define aliases for NDArray types more easily.

alias Vector = NDArray[1, _]
alias Matrix = NDArray[2, _]

MadAlex1997 commented 1 month ago

That would make NDArray[1,_] and NDArray[2,_] not the same type and add a requirement for casting rules between differently ranked arrays. A good example of this is that a linear-arranged array from 0 to 100 (100) would be a different type than after that array has been reshaped into a square (10x10). Also, there are a good number of operations that need to be able to take arguments of different ranks and then return an array that is not the same rank as the first two.
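To make the reshape point concrete, here is a minimal sketch in plain Python rather than Mojo, since Mojo's syntax is still changing. `DynamicArray` is a hypothetical toy class, not NuMojo's actual `NDArray`. It only shows that when rank is a run-time property, reshaping a 100-element array into 10x10 changes `ndim` but not the type. With a compile-time rank parameter, the two values would instead be `NDArray[1, _]` and `NDArray[2, _]`, so `reshape` would need to return a different type.

```python
from math import prod


class DynamicArray:
    """Toy array whose rank is a run-time property derived from its shape."""

    def __init__(self, data, shape):
        data = list(data)
        if prod(shape) != len(data):
            raise ValueError("shape does not match the number of elements")
        self.data = data
        self.shape = tuple(shape)

    @property
    def ndim(self):
        # Rank is inferred from the shape each time it is requested.
        return len(self.shape)

    def reshape(self, *shape):
        # The same class comes back; only the run-time shape (and rank) changes.
        return DynamicArray(self.data, shape)


line = DynamicArray(range(100), (100,))   # 100 elements, rank 1
square = line.reshape(10, 10)             # same elements, rank 2

print(line.ndim, square.ndim)             # 1 2
print(type(line) is type(square))         # True
```

Any function written against `DynamicArray` accepts both `line` and `square`, which is the ease-of-use argument for keeping `NDArray` dynamic.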

We want NDArray to be as easy to use as is reasonable to still get the benefits from Mojo. I feel that the better optimization would be to build a Statically Dimensioned/Shaped array separate from NDArray, and allow NDArray to be more dynamic. A statically dimensioned/shaped array will require more skill and math knowledge to use correctly but it would do much of its work at compile time. Once Modular adds field declarations to traits we could have a trait that our array types conform to that would allow the parts of the library where the rank doesn't matter to work the same for both, and then write overloads where the difference would be meaningful (statically shaped array matmul would probably be way faster).

Either way, it feels like something we want pre-v1.0 but not necessarily something we want to do in the next few months.

forFudan commented 1 month ago

> A good example of this is that a linear-arranged array from 0 to 100 (100) would be a different type than after that array has been reshaped into a square (10x10). Also, there are a good number of operations that need to be able to take arguments of different ranks and then return an array that is not the same rank as the first two.

I fully agree. This is a good example of the limitation of the ndim parameter.

> I feel that the better optimization would be to build a Statically Dimensioned/Shaped array separate from NDArray, and allow NDArray to be more dynamic.

Yes, that seems to be a better way. We can even define an ArrayLike trait in the future so that functions can be applied to NDArray, Vector, and Matrix.
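As a rough illustration of the shared-trait idea, the sketch below uses Python's `typing.Protocol` as a stand-in for a Mojo trait with field declarations. Everything here is hypothetical: the `ArrayLike` members, the toy `NDArray` and `Matrix` classes, and the `total` function are assumptions made for the example, not NuMojo's API.

```python
from typing import Protocol, Sequence, Tuple


class ArrayLike(Protocol):
    """Hypothetical interface shared by dynamic and static array types."""

    shape: Tuple[int, ...]
    data: Sequence[float]


class NDArray:
    """Dynamic stand-in: any rank, known only at run time."""

    def __init__(self, data, shape):
        self.data = list(data)
        self.shape = tuple(shape)


class Matrix:
    """Static stand-in: the rank is always 2 by construction."""

    def __init__(self, data, rows, cols):
        self.data = list(data)
        self.shape = (rows, cols)


def total(a: ArrayLike) -> float:
    # Rank does not matter here, so one function serves every array type.
    return sum(a.data)


print(total(NDArray([1.0, 2.0, 3.0, 4.0], (4,))))   # 10.0
print(total(Matrix([1.0, 2.0, 3.0, 4.0], 2, 2)))    # 10.0
```

Operations where rank does matter, such as matmul, could then be overloaded per type, as suggested earlier in the thread.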

shivasankarka commented 3 weeks ago

I agree that NDArray and something like StaticNDArray should be separate, to keep NuMojo accessible to everyone. This could also be a standout feature. Although it is not important for v0.2, we can start porting things slowly and release once we feel it's good enough. I will make an initial version and create a branch.