In Julia 1.1.1, I have discovered a few problems with ldiv! and rdiv! when the arguments are triangular matrices. The problems are in both triangular.jl and bidiag.jl. The former source file does not cover all the combinations given by (left, right) x (upper, lower) x (plain, transpose, adjoint), while bidiag.jl fills in the holes with more general but unnecessarily slow solutions. I will sketch some proposed improvements.
[edit: I have since discovered another problem (aliasing). There are also similar issues with lmul! and rmul!. I also have some new ideas on how to fix these problems. Please see this Discourse post.]
I discovered these problems when I coded a method:
try_ldiv!(L,R) = applicable(ldiv!,L,R) ? ldiv!(L,R) : L \ R
The idea was to improve on the efficiency of L\R for cases where inplace solutions are available. Unfortunately this does not always work out as intended: whenever triangular arguments end up in the ldiv! methods implemented by bidiag.jl, the call is much slower than the non-inplace alternative L\R. Moreover, the bidiag.jl methods sometimes crash for cases where L\R would have worked. For these reasons, I propose that the method signatures in bidiag.jl that currently accept triangular arguments be modified to exclude them. (At first glance, this seems doable by removing AbstractTriangular from the type unions in the relevant method signatures.)
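As a minimal sketch of how try_ldiv! behaves, the snippet below applies it once to a lower triangular L with a dense right-hand side, where an inplace ldiv! method exists, and once to a plain dense matrix, where the fallback L \ R is expected to be taken (an assumption about the installed methods, not something the code relies on). The matrices are made-up illustration values. Note that applicable only reports whether a method is declared, not whether it is fast or even succeeds, which is exactly why the bidiag.jl signatures matter.

```julia
using LinearAlgebra

# The method from the post: use the inplace solver when one is declared,
# otherwise fall back to the allocating L \ R.
try_ldiv!(L, R) = applicable(ldiv!, L, R) ? ldiv!(L, R) : L \ R

# Case 1: triangular L, dense R. An inplace method exists, so R is overwritten.
L = LowerTriangular([2.0 0.0; 1.0 4.0])
R = [2.0 4.0; 5.0 6.0]
expected = L \ R              # reference result, computed before R is modified
X = try_ldiv!(L, R)
@assert X ≈ expected
@assert R ≈ expected          # R itself now holds the solution

# Case 2: plain dense matrix. Assumption: no two-argument ldiv! method is
# declared for this pair, so the fallback M \ b is used.
M = [4.0 1.0; 2.0 3.0]
b = [1.0, 2.0]
Y = try_ldiv!(M, copy(b))
@assert Y ≈ M \ b
```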
Instead, the holes in triangular.jl can be filled in with efficient BLAS-based solutions, while those cases that cannot be (efficiently) done inplace should be omitted entirely, so that applicable(ldiv!,L,R) returns false for such cases. An example is ldiv!(L,U), where L and U are lower and upper triangular matrices. The result of L\U is in general not triangular, so an inplace update of U is not possible. Therefore bidiag.jl should not declare its ldiv! methods to include such cases.
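The claim about mixed orientations can be checked numerically. The sketch below uses small made-up matrices and solves with dense copies, so that the check does not depend on which \ or ldiv! method happens to be dispatched for triangular wrappers.

```julia
using LinearAlgebra

L = LowerTriangular([1.0 0.0; 2.0 3.0])
U = UpperTriangular([1.0 2.0; 0.0 3.0])

# Solve L * X = U on dense copies of the two triangular matrices.
X = Matrix(L) \ Matrix(U)

# X has nonzeros both above and below the diagonal, so it cannot be
# stored back into an UpperTriangular (or LowerTriangular) wrapper.
@assert !istriu(X)
@assert !istril(X)
```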
First, some background:
The function ldiv!(L,R) has the same effect as L \ R, except that matrix or vector R is updated inplace. This inplace \ operation cannot be done for general L, but when L is a scalar, multiple of the identity matrix, diagonal, or triangular, etc., efficient inplace solutions are provided by BLAS and other algorithms elsewhere in Julia. The same applies to rdiv!(L,R), where L is updated inplace, for suitable types of R.
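To make the inplace semantics concrete, here is a small sketch with made-up values: ldiv! overwrites its second argument, while rdiv! overwrites its first.

```julia
using LinearAlgebra

# ldiv!(L, B): B is overwritten with L \ B.
L = LowerTriangular([2.0 0.0; 1.0 4.0])
B = [2.0 4.0; 5.0 6.0]
expected_left = L \ B         # reference, computed before B is modified
ldiv!(L, B)
@assert B ≈ expected_left

# rdiv!(A, U): A is overwritten with A / U.
U = UpperTriangular([2.0 1.0; 0.0 4.0])
A = [2.0 4.0; 5.0 6.0]
expected_right = A / U        # reference, computed before A is modified
rdiv!(A, U)
@assert A ≈ expected_right
```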
When L and R are both triangular and they agree in orientation (uplo), i.e. both are upper triangular or both are lower triangular, then the solution L \ R is also triangular, with the same uplo. Such updates can be implemented by calling almost directly into BLAS with minimal overhead, and this is indeed exploited nicely in triangular.jl for some (but unfortunately not all) of the triangular type combinations.
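For illustration, this is roughly what "calling almost directly into BLAS" amounts to for two lower triangular matrices stored as dense Float64 arrays. It is a sketch with made-up values, not the code in triangular.jl; the numerator is stored with explicit zeros above the diagonal, which the dense trsm! call needs in order to return a triangular result.

```julia
using LinearAlgebra

A = [2.0 0.0; 1.0 4.0]        # lower triangular data for L
B = [4.0 0.0; 3.0 8.0]        # lower triangular numerator, zeros stored explicitly
expected = A \ B              # dense reference, computed before B is modified

# Arguments: side = 'L' (A is on the left), uplo = 'L' (A is lower triangular),
# transA = 'N' (no transpose), diag = 'N' (non-unit diagonal), alpha = 1.0.
# Overwrites B with the solution X of A * X = alpha * B.
BLAS.trsm!('L', 'L', 'N', 'N', 1.0, A, B)

@assert B ≈ expected
@assert istril(B)             # same orientation in, same orientation out
```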
A combination that is covered by triangular.jl is fast, because it uses a single BLAS.trsm! call, while one that is not covered falls through to bidiag.jl, which ends up doing some to-and-fro copying and a separate trsm! call for every column.
Each such case that triangular.jl does not currently implement can typically be added as a two-line method.
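A sketch of what such a short method might look like for the lower/lower case follows. The name ldiv_lower_lower! is hypothetical and is used here only to avoid overwriting any existing ldiv! method; this is one possible approach, not necessarily how triangular.jl would do it. The tril! call matters because the storage behind a LowerTriangular wrapper may hold arbitrary values above the diagonal, which a dense solve would otherwise pick up.

```julia
using LinearAlgebra

# Hypothetical helper, not part of LinearAlgebra.
function ldiv_lower_lower!(A::LowerTriangular, B::LowerTriangular)
    data = tril!(parent(B))                  # zero the storage above the diagonal
    return LowerTriangular(ldiv!(A, data))   # dense inplace solve, then rewrap
end

A = LowerTriangular([2.0 0.0; 1.0 4.0])
B = LowerTriangular([4.0 9.0; 3.0 8.0])      # the stored 9.0 is not part of B
expected = Matrix(A) \ Matrix(B)             # dense reference, computed first

X = ldiv_lower_lower!(A, B)
@assert Matrix(X) ≈ expected
@assert Matrix(B) ≈ expected                 # B was updated inplace
```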
While adding implementations for cases like ldiv!(L,L2) and rdiv!(U,U2) should be relatively straightforward, an extra trick is required for cases where the to-be-updated numerator is a triangular matrix wrapped in Transpose or Adjoint. These cases are solved by noting that, for example (purely mathematically):
L \ R' = (R / L')'
where the numerator on the right-hand side, R, is now plain triangular and easy to update. I have tentatively started coding some methods to implement this trick here.
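The identity can be checked numerically. In the sketch below (made-up values), R is upper triangular, so R' has the same orientation as the lower triangular L. The inplace update is done on a dense copy of R's data with rdiv!, which sidesteps the question of which triangular-on-triangular methods are available in a given Julia version; this is an illustration of the identity, not the methods mentioned in the post.

```julia
using LinearAlgebra

L = LowerTriangular([2.0 0.0; 1.0 4.0])
R = UpperTriangular([1.0 3.0; 0.0 5.0])

expected = Matrix(L) \ Matrix(R')   # dense reference for L \ R'

A = triu!(Matrix(R))                # plain storage of R, zeros below the diagonal
rdiv!(A, L')                        # A is overwritten with R / L'
result = A'                         # (R / L')', a lazy adjoint of the updated storage

@assert result ≈ expected
@assert istril(result)              # lower triangular, like L and R'
```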