Closed grothesque closed 2 weeks ago
Thanks for the suggestion and comments! I have added design notes in the latest commit as a start. The layout types are omitted for now since I'm looking into if they can be improved.
Very useful, many thanks.
Here's one thing that is still unclear to me: Would it be possible to modify the example program such that a
or b
could be created using the array!
macro, i.e. with static size? Or in other words, is there some way to cast a static stack-allocated array into a DSpan
?
If this is not possible, how would one write a (generic?) function that accepts input created with either expr!
or array!
?
I think I found a solution myself. The following generic function accepts references to arrays as well:
pub fn matmul<SA, SB, SC>(a: &Span<f64, SA>, b: &Span<f64, SB>, c: &mut Span<f64, SC>)
where
SA: Shape,
SB: Shape,
SC: Shape,
{
for (mut cj, bj) in c.cols_mut().zip(b.cols()) {
for (ak, bkj) in a.cols().zip(bj) {
for (cij, aik) in cj.expr_mut().zip(ak) {
*cij = aik.mul_add(*bkj, *cij);
}
}
}
}
Here is my understanding why the original non-generic matmul
function did not accept references to stack-allocated arrays: these arrays have constant sizes, so that there is nowhere an appropriate RawSpan
that could be pointed to. Is this correct?
Still, is there a way to make an array!
-allocated array work with the original matmul
? That would probably require creating a view of it somehow (that would contain a RawSpan
instance).
Yes, right. One way is to do a.reshape([2, 2])
or so which will change the shape type to have dynamic dimensions.
One piece of information that I feel should be added somewhere is while there are both layouts and mappings. The way I understand it is that while types that implement the Layout
trait (e.g. Dense
) do not store any data, types that implement the Mapping
trait (e.g. DenseMapping
) do store data, i.e. the shape and the necessary strides.
So, if my understanding is correct, an Array
(stack-allocated, static-sized array) instance does not keep a Mapping
struct, while a Grid
or Expr
does.
Yes, right. One way is to do
a.reshape([2, 2])
or so which will change the shape type to have dynamic dimensions.
Thanks, this solution was not obvious to me. But it requires redundantly restating the shape, and panics if it doesn't match. The following seems to be more robust: a.reshape(a.shape().dims())
, however it doesn't work with reshape_mut
due to the borrow checker complaining.
Perhaps this should be a method, how about to_expr
?
Or perhaps there should be even an equivalent impl of Borrow
or AsRef
? (This might be a bad idea, I don't have much experience with this.) In this way, the call to matmul
would look the same no matter whether the arguments are borrowed as DSpan
or as Span
.
I added more conversions for Array to be consistent with normal arrays. It should work now with &Expr::from(&a)
. Let's see if it is sufficient.
Here is an improved generic version of matmul
that uses the type system to ensure that the static sizes (if used) do match:
pub fn matmul_generic<D0: Dim, D1: Dim, D2: Dim>(
a: &Span<f64, (D0, D1)>,
b: &Span<f64, (D1, D2)>,
c: &mut Span<f64, (D0, D2)>,
) {
for (mut cj, bj) in c.cols_mut().zip(b.cols()) {
for (ak, bkj) in a.cols().zip(bj) {
for (cij, aik) in cj.expr_mut().zip(ak) {
*cij = aik.mul_add(*bkj, *cij);
}
}
}
}
Is this use of mdarray "idiomatic"?
The above function has also the advantage of allowing the compiler to optimize the inner loop for a fixed number of iterations. For example, the following non-generic function
pub fn matmul_opt(a: &DSpan<f64, 2>, b: &DSpan<f64, 2>, c: &mut DSpan<f64, 2>) {
let a = a.reshape((Const::<4>, Dyn(2)));
let mut c = c.reshape_mut((Const::<4>, Dyn(2)));
matmul_generic(&a, &b, &mut c);
}
is compiled to assembly that uses the fact that the inner loop has exactly four iterations. (I verified looking at the assembly generated by cargo rustc --release -- -Ctarget-feature=+fma --emit asm
.) However, it only uses the FMA instruction vfmadd213sd
and not its packed variant vfmadd213pd
- I guess that's because the compiler cannot be sure about sufficient alignment. (I saw that there are ways to allocate Grid
arrays with a specific alignment, so there seem to be ways.)
Yes having Dim parameters per dimension is the way to do it.
I did some testing, and it seems that the missed optimization is due to limitations in alias analysis. I got it to generate packed FMAs by breaking out the inner loop in a separate function. Somehow the Span references must be visible in function arguments for it to optimize.
If I increase the column length to 64 or make it dynamic, the compiler generates different code paths and checks for aliasing at runtime with the original function.
pub fn mul_col<D: Dim>(cj: &mut Span<f64, D>, ak: &Span<f64, D>, bkj: &f64)
{
for (cij, aik) in cj.expr_mut().zip(ak) {
*cij = aik.mul_add(*bkj, *cij);
}
}
pub fn matmul_generic<D0: Dim, D1: Dim, D2: Dim>(
a: &Span<f64, (D0, D1)>,
b: &Span<f64, (D1, D2)>,
c: &mut Span<f64, (D0, D2)>,
) {
for (mut cj, bj) in c.cols_mut().zip(b.cols()) {
for (ak, bkj) in a.cols().zip(bj) {
mul_col(&mut cj, &ak, bkj);
}
}
}
pub fn matmul_opt(a: &DSpan<f64, 2>, b: &DSpan<f64, 2>, c: &mut DSpan<f64, 2>) {
let a = a.reshape((Const::<4>, Dyn(2)));
let mut c = c.reshape_mut((Const::<4>, Dyn(2)));
matmul_generic(&a, &b, &mut c);
}
Thanks for sharing this very interesting library! I've been trying to understand its inner workings, but this is difficult since some central aspect seem both very clever (in a positive sense), and are not documented at all. I think that adding a "DESIGN.md" file (even a small one) may greatly increase the usefulness of this project to other people (be it possible contributors or members of the wider numeric Rust community).
For example, would you mind to describe how
Span
works? I note that it is a zero-sized struct, and only&Span
are passed around. But what do these actually point to? And what is the role ofRawSpan
?Or let's take
Expr
/Expression
. I first thought that it's a view with a somewhat badly chosen name, but it seems to be much more than that. Are types that implement the traitExpression
"expression templates" in C++ speak?What is the relation of
Expr
andSpan
and why are both needed? You write that spans are like Rust slices, but slices are fat pointers, while spans seem to be just pointers that point to... something (that is owned by whom?).It would be also interesting to have a brief discussion of the design aspects of C++'s mdspan/mdarray that you took over, and those that you didn't/couldn't and the reasons for that.