ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License

Clarify the behavioral differences between a dense matrix and a multivector #796

Open MarcelKoch opened 3 years ago

MarcelKoch commented 3 years ago

The current Dense matrix can be used simultaneously as a regular dense matrix and as multiple column vectors combined (multivector/MV). In the class documentation, the case of multiple column vectors is even highlighted as a common use case. For most methods, there is no difference between the behavior of a dense matrix and that of a multivector. However, I think for the following methods some differences exist (impl marks the currently implemented version):

| method | Dense behavior | MV behavior |
| --- | --- | --- |
| apply | matrix * vector product (impl) | - |
| scale & add_scaled | apply the scaling row-wise (corresponds to diag(alpha) * A; could also support alpha as a column vector) | apply the scaling column-wise (impl) |
| dot | compute sum_ij a_ij * b_ij, i.e. the dot product of the flattened matrices | compute a column-wise dot product (impl) |
| norm | compute the appropriate matrix norm (Frobenius, l1, l_inf, ...) | compute a column-wise norm (impl) |
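To make the distinction concrete, the two dot semantics could be sketched like this (plain C++ with `std::vector`, not the Ginkgo API; `dot_dense` and `dot_mv` are made-up names for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x m matrix stored as a flat vector: v[i * m + j] = a_ij.
struct DenseMat {
    int n, m;
    std::vector<double> v;
};

// Dense semantics: treat A and B as flat vectors and reduce everything
// to a single scalar, sum_ij a_ij * b_ij.
double dot_dense(const DenseMat& a, const DenseMat& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.v.size(); ++k) s += a.v[k] * b.v[k];
    return s;
}

// Multivector semantics (the currently implemented behavior): one dot
// product per column, i.e. the j-th result is sum_i a_ij * b_ij.
std::vector<double> dot_mv(const DenseMat& a, const DenseMat& b) {
    std::vector<double> s(a.m, 0.0);
    for (int i = 0; i < a.n; ++i)
        for (int j = 0; j < a.m; ++j)
            s[j] += a.v[i * a.m + j] * b.v[i * b.m + j];
    return s;
}
```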

Sidenote: The documentation of each method already specifies which behavior is used, but I don't think that is sufficient. I guess most users won't consult the documentation for these methods, which could cause some confusion if the dense behavior is assumed.

IMO, as the name already suggests, the Dense class should model a dense matrix, and its methods should have the expected behavior. If the MV behavior is wanted, e.g. for computing the norms of multiple vectors simultaneously, the method's name should reflect that behavior. Since this change would break the interface, it needs to be put on hold until Ginkgo 2.0. Nevertheless, I propose the following changes:

  1. add new methods with more descriptive names
  2. add a deprecation warning to the current methods stating that their behavior will change with the next major release and that the methods from 1. should be used.
  3. for the 2.0 release, the behavior of the current methods is changed to the one corresponding to a dense matrix.
upsj commented 3 years ago

Stealing a bit of MATLAB's notation, we could also add a (defaulted) parameter to compute_norm2 that for a nxm matrix specifies whether we want column norms (output 1xm), row norms (output nx1) or frobenius norm (output 1x1). With the Diagonal matrix, we already have a representation of row and column scaling we could use there. Do you have any use cases in mind for dot products of flattened matrices?
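That defaulted-parameter idea could look roughly like this (a plain C++ sketch of the semantics only, not the actual Ginkgo interface; the `reduction` enum and the free function are hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

enum class reduction { rowwise, colwise, full };

// 2-norm of a row-major n x m matrix with a selectable reduction:
// rowwise -> n results, colwise -> m results, full -> 1 result (Frobenius).
std::vector<double> norm2(const std::vector<double>& v, int n, int m,
                          reduction r) {
    std::vector<double> out(r == reduction::rowwise ? n
                            : r == reduction::colwise ? m : 1,
                            0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            double sq = v[i * m + j] * v[i * m + j];
            out[r == reduction::rowwise ? i
                : r == reduction::colwise ? j : 0] += sq;
        }
    for (auto& x : out) x = std::sqrt(x);
    return out;
}
```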

yhmtsai commented 3 years ago

compute_norm2 can rely on the output size, so we do not need to introduce another parameter.

MarcelKoch commented 3 years ago

I'm open to having such a parameter, although I'd personally prefer different function names. I would also change the signature (in the dense case) to

ValueType compute_norm2()

as I think that's more natural.

The dot product for matrices appears in some finite element computations; I last encountered it in an elasticity problem. It should be noted that in these cases the matrices are quite small, n ~ 10. Also, this operation has a proper mathematical name (which I can't remember right now) and is denoted A:B.

yhmtsai commented 3 years ago

I think scale and add_scaled should be considered as A * diag(alpha)? The diag(alpha) * A case is already represented by Diagonal, as @upsj said. dot should have the same behavior as the MV behavior if we consider dot(A, B) = diag(A^T B), no matter whether it is an MV or a Dense matrix. The operation you mention is more like a dot_multiple.

upsj commented 3 years ago

> compute_norm2 can rely on the output size, so we do not need to introduce another parameter

Hm... while that is true, I am not sure whether that makes for readable/understandable code. For example

A->compute_norm2(result);
x->add_scaled(result, y);

What does this do? Does it scale all values of y by the same value, or does it scale individual columns by different values? You cannot tell from this piece of code alone, which might be an issue. Of course, the same is true for add_scaled, but at least there it does not propagate.

I guess we could add a zero-parameter overload for compute_norm2, which would then be unambiguous, but since all other functions write into Dense, that would be a bit of a break with our current patterns. I would only add it as an alternative, not as the only option.

How about

A->compute_norm2(output, reduction::rowwise);
A->compute_dot(output, reduction::colwise);
A->compute_norm_inf(output, reduction::full); // or all or complete or whatever we like best
MarcelKoch commented 3 years ago

I'm not a big fan of keeping the current dot behavior as the default. The result of the dot operation, which is also called the scalar product, should be a scalar.

BTW, the A:B operation I mentioned is called the Frobenius inner product and can also be written as A:B = trace(B^T A), similar to what @yhmtsai mentioned, except with an additional reduction at the end.
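That identity is easy to check: entry (j, j) of B^T A is the column-wise dot product sum_i b_ij * a_ij, so the trace sums exactly the same terms as the flat dot product. A minimal sketch (plain C++, illustration only; both function names are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Frobenius inner product A:B = sum_ij a_ij * b_ij for row-major matrices.
double frob_inner(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) s += a[k] * b[k];
    return s;
}

// trace(B^T A) for row-major n x m matrices: sums the diagonal entries
// of B^T A, i.e. reduces the column-wise dot products to one scalar.
double trace_btA(const std::vector<double>& a, const std::vector<double>& b,
                 int n, int m) {
    double t = 0.0;
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < n; ++i) t += b[i * m + j] * a[i * m + j];
    return t;
}
```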

MarcelKoch commented 3 years ago

For scale and add_scaled, the scaling A * diag(alpha) doesn't make sense to me. Both operations are basically BLAS operations. In the scalar case, scale is defined as y <- alpha * y and add_scaled as y <- alpha * x + y. As these operations are not standardized for non-scalar alpha, I would prefer sticking to the ordering of the scalar operations.
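To make the two orderings concrete, here is a small sketch of scale for a column-vector alpha (plain C++, not the Ginkgo API; the column-wise variant corresponds to the currently implemented MV behavior):

```cpp
#include <cassert>
#include <vector>

// Column-wise scaling, A <- A * diag(alpha): column j is scaled by alpha[j].
// This matches the current multivector behavior of scale.
void scale_colwise(std::vector<double>& a, int n, int m,
                   const std::vector<double>& alpha) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) a[i * m + j] *= alpha[j];
}

// Row-wise scaling, A <- diag(alpha) * A: row i is scaled by alpha[i].
void scale_rowwise(std::vector<double>& a, int n, int m,
                   const std::vector<double>& alpha) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) a[i * m + j] *= alpha[i];
}
```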

upsj commented 3 years ago

All of these operations are heavily motivated by the solver design, and most of the uses of Dense right now are solver-related as multivectors, so I think we need to keep those in mind -- column-wise operations are most common in the solver setting. Related to the concept of dot products, for multivectors you would probably expect something like x'*y, of which we only compute the diagonal entries, since we don't have interaction between different columns in a solver run. I do agree though that we need a better balance between correct naming and practical considerations.

yhmtsai commented 3 years ago

BLAS provides the scalar version, but in the scalar case the operation is commutative, so the ordering does not matter there. It is different from apply: if we call A->apply(b, x), i.e. A * b -> x, the b is on the right, and the same holds for the advanced apply, alpha * A * b + beta * x. We definitely need to provide different behavior for them.

MarcelKoch commented 3 years ago

As I'm also in favor of keeping the current behavior, this boils down to the question: what should be the default behavior?


Pros and Cons of the default behaviors

Keeping Multivector

Pros:

Cons:

Changing to Dense

Pros:

Cons


Feel free to add or change pros and cons.

pratikvn commented 3 years ago

I agree that there is a semantic difference between Dense and Multivector. While we currently use these interchangeably, we definitely should not, and there are implications, for example in a distributed setting. I want to add some points which we should think about before adding the functionalities.

  1. Storage layouts. How would you store a Dense matrix, and how would you store a Multivector? Both row-major, or one of them column-major? I think we should take this opportunity to allow for different storage layouts in both.
  2. Creating sub-matrices. From a semantic perspective, does it make sense to create a sub-matrix of a multivector?
  3. In some cases, I think we need the capabilities of both Dense and Multivector. For example, the Hessenberg matrix is a combination of multiple Dense matrices (one for each right-hand side), and we need to perform matrix operations with it (e.g. triangular solves).
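On the storage-layout question, the choice mostly comes down to the index map; a sketch of the two options (if I recall correctly, Dense currently uses row-major storage with a stride, which also covers submatrix views):

```cpp
#include <cassert>
#include <cstddef>

// Row-major: consecutive elements of a row are adjacent in memory.
// A stride >= m allows views into a larger parent matrix.
std::size_t idx_row_major(std::size_t i, std::size_t j, std::size_t stride) {
    return i * stride + j;
}

// Column-major: consecutive elements of a column are adjacent, which
// favors operating on a single column of a multivector.
std::size_t idx_col_major(std::size_t i, std::size_t j, std::size_t stride) {
    return j * stride + i;
}
```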
Slaedr commented 2 years ago

Points considered for the meeting on multivector/dense:

Outcomes: