Reference-LAPACK / lapack

LAPACK development repository
Other
1.46k stars 430 forks source link

in-place (scaled) matrix transposition: imatcopy #1017

Open rileyjmurray opened 2 months ago

rileyjmurray commented 2 months ago

Intel MKL has a useful function for (scaled) in-place transposition:

https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2024-1/mkl-imatcopy.html

I raised the issue of having this function as a utility in LAPACK proper, and Julian (@langou) expressed support for that. I'm making a note of it here so we don't lose track of this.

thijssteel commented 2 months ago

I had a look at in-place matrix transposes at some point, but they are far from trivial to implement.

If the matrix is square, it is just a question of swapping A(i,j) and A(j,i). A simple recursive implementation is even reasonably efficient.

If the matrix is not square, the cycle lengths can be longer, so we almost certainly require some memory (not just for an efficient implementation, but for a reference too).

ilayn commented 2 months ago

Indeed cache-oblivious transpose is really hard to get it right and quite challenging in terms of low-level decisions. But I would be really happy if we get it. I do think it won't be as performant as the copy-transpose but there is still a lot to reap compared to a double loop.

mgates3 commented 1 month ago

copy is an odd name and description for an in-place operation. From Intel's docs, "A transposition operation can be a normal matrix copy, a transposition, a conjugate transposition, or just a conjugation. The operation is defined as follows:

AB := alpha*op(AB)."

(emphasis mine)

Since the source and destination are both AB, I gather that "a normal matrix copy" (trans=N) is basically changing the stride in place, from lda to ldb? (And applying scaling.)

Gustavson has several papers on in-place transpose, e.g., https://doi.org/10.1145/2168773.2168775. Something akin to this in implemented in PLASMA.

rileyjmurray commented 1 month ago

@mgates3, while I agree that imatcopy is not a great name, it's a fairly common one:

In situations like these I think it's preferable to go with the crowd, and just have good documentation about what can be done with the function.

@thijssteel and @ilayn: it'd be fine with me to use some workspace. When I said in-place, I really just meant "don't force the user to make a full copy."

mgates3 commented 1 month ago

@rileyjmurray Respectfully, I disagree. If Reference BLAS / LAPACK is a (de facto) standard, it should be reasonably self-consistent. imatcopy and omatcopy don't follow the BLAS naming scheme in any way, though the arguments seem conforming. If we propose a standard name, it should be trivial for libraries that have imatcopy to add a new name.

This goes to my point on gemmt and batch gemm, that we as a community need a mechanism to propose and adopt new standard routines.

rileyjmurray commented 1 month ago

@mgates3, fair enough! I don't actually have a strong opinion on what the function should be called.