fre-hu / mdarray

Multidimensional array for Rust
Apache License 2.0
11 stars 0 forks source link

Support for row- and column-major order #3

Open grothesque opened 1 month ago

grothesque commented 1 month ago

I reference a discussion from https://github.com/rust-ndarray/ndarray/issues/1272#issuecomment-2325747608:

@grothesque wrote:

Is your choice motivated by BLAS/LAPACK being (marginally) more efficient for column-major data?

Do I understand correctly that mdarray is column major in the sense that the restricted layouts are column major? But the fully strided layout can accept any (fixed rank) strided array, right? Right now in Rust we cannot have a fully generic ndspan like in C++, but it should be possible to have a set of useful layouts for both column-major and row-major within a single library, or do you see a problem with this?

@fre-hu replied:

The choice is only to have a convention, and then column major is common for linear algebra. It is used both for memory layout and to give the order of dimensions in iteration.

Using strided layout with row major data will work, but operations that depend on iteration order will have worse access pattern. It works fine for interfacing though, and internally one could make a copy or reverse indices.

To have full support for both row and column major would require one more generic parameter for the order. I had it in an earlier version, both removed it as it made both the library and interface more complex. C++ mdspan gets around this since it is quite thin.

From my point of view, row-major is arguably more relevant for a Rust array library than column major:

So, if there can be only one, I'd vote for row major :innocent:...

However, as far as memory layout goes, it should be possible to have both without an additional generic parameter, right? Just like C++ mdspan has layout_right, layout_left, and layout_stride.

The problem seems to be more about ensuring efficient order of dimensions when iterating. One possibility would be to have both ("iterate_left_to_right", and "iterate_right_to_left"), and then only one (or none) would be efficient for a given array.

To treat the general case efficiently, there could be a function to (statically or dynamically) reorder dimensions into either layout (if possible).

All of this would not require an additional generic parameter (I believe).

fre-hu commented 1 month ago

I have switched to row-major order as it makes more sense, see the latest commit. I have also reduced the layout types to dense and strided, since these are the most important. Better to keep it simple and add back if it is really needed.

Regarding having both row and column-major order, yes the order could be merged into the layout type. The complexity is still there though, and there must be rules about iteration order, how to derive types for subarrays and how to do broadcasting etc. I will think some more about it.

grothesque commented 1 month ago

Sounds great. Looking forward to your ideas about merging order into layout.

I fully agree that an array crate for Rust should strive to be minimalist. I think that the right approach is to think of it as providing common concepts and glue, while actual numerical algorithms are to be provided by other crates.

This is the approach that has been successful with Fortran, and that C++ is finally pursuing with their mdspan. I consider the monolithic approach of Numpy/SciPy, despite their success, as a technical liability rooted in the shortcomings of Python's packaging.