ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License

Clarify the behavioral differences between a dense matrix and a multivector #796

Open MarcelKoch opened 3 years ago

MarcelKoch commented 3 years ago

The current Dense matrix can be used simultaneously as a regular dense matrix and as multiple column vectors combined (multivector/MV). In the class documentation, the case of multiple column vectors is even highlighted as a common use case. For most methods, there is no difference between the behavior of a dense matrix and that of a multivector. However, I think for the following methods some differences exist (impl marks the currently implemented version):

| method | Dense behavior | MV behavior |
| --- | --- | --- |
| apply | matrix * vector product (impl) | - |
| scale & add_scaled | apply the scaling row-wise (corresponds to diag(alpha) * A; could also support alpha as a column vector) | apply the scaling column-wise (impl) |
| dot | compute sum_ij a_ij * b_ij, i.e. the dot product of the flattened matrices | compute a column-wise dot product (impl) |
| norm | compute the appropriate matrix norm (Frobenius, l1, l_inf, ...) | compute a column-wise norm (impl) |
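To make the distinction concrete, the two dot semantics could be sketched like this (plain C++ with `std::vector`, not the Ginkgo API; `dot_dense` and `dot_mv` are made-up names for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x m matrix stored as a flat vector: v[i * m + j] = a_ij.
struct DenseMat {
    int n, m;
    std::vector<double> v;
};

// Dense semantics: treat A and B as flat vectors and reduce everything
// to a single scalar, sum_ij a_ij * b_ij.
double dot_dense(const DenseMat& a, const DenseMat& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.v.size(); ++k) s += a.v[k] * b.v[k];
    return s;
}

// Multivector semantics (the currently implemented behavior): one dot
// product per column, i.e. the j-th result is sum_i a_ij * b_ij.
std::vector<double> dot_mv(const DenseMat& a, const DenseMat& b) {
    std::vector<double> s(a.m, 0.0);
    for (int i = 0; i < a.n; ++i)
        for (int j = 0; j < a.m; ++j)
            s[j] += a.v[i * a.m + j] * b.v[i * b.m + j];
    return s;
}
```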

Sidenote: The documentation of each method already specifies which behavior is used, but I don't think that is sufficient. I guess most users won't consult the documentation for these methods, which could cause some confusion if the dense behavior is assumed.

IMO, as the name already suggests, the Dense class should model a dense matrix, and its methods should have the expected behavior. If the MV behavior is wanted, e.g. for computing the norms of multiple vectors simultaneously, the method's name should reflect that behavior. Since this change would break the interface, it needs to be put on hold until Ginkgo 2.0. Nevertheless, I propose the following changes:

  1. add new methods with more descriptive names
  2. add a deprecation warning to the current methods stating that their behavior will change with the next major release and that the methods from 1. should be used.
  3. for the 2.0 release, the behavior of the current methods is changed to the one corresponding to a dense matrix.
upsj commented 3 years ago

Stealing a bit of MATLAB's notation, we could also add a (defaulted) parameter to compute_norm2 that for a nxm matrix specifies whether we want column norms (output 1xm), row norms (output nx1) or frobenius norm (output 1x1). With the Diagonal matrix, we already have a representation of row and column scaling we could use there. Do you have any use cases in mind for dot products of flattened matrices?
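That defaulted-parameter idea could look roughly like this (a plain C++ sketch of the semantics only, not the actual Ginkgo interface; the `reduction` enum and the free function are hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

enum class reduction { rowwise, colwise, full };

// 2-norm of a row-major n x m matrix with a selectable reduction:
// rowwise -> n results, colwise -> m results, full -> 1 result (Frobenius).
std::vector<double> norm2(const std::vector<double>& v, int n, int m,
                          reduction r) {
    std::vector<double> out(r == reduction::rowwise ? n
                            : r == reduction::colwise ? m : 1,
                            0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            double sq = v[i * m + j] * v[i * m + j];
            out[r == reduction::rowwise ? i
                : r == reduction::colwise ? j : 0] += sq;
        }
    for (auto& x : out) x = std::sqrt(x);
    return out;
}
```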

yhmtsai commented 3 years ago

compute_norm2 can rely on the output size, so we do not need to introduce another parameter.

MarcelKoch commented 3 years ago

I'm open to having such a parameter, although I'd personally prefer different function names. I would also change the signature (in the dense case) to

ValueType compute_norm2()

as I think that's more natural.

The dot product for matrices appears in some finite element computations; I last encountered it in an elasticity problem. It should be noted that in these cases the matrices are quite small, n ~ 10. Also, this operation has a proper mathematical name (which I can't remember right now) and is denoted A:B.

yhmtsai commented 3 years ago

I think scale and add_scaled should be considered as A * diag(alpha)? The diag(alpha) * A case is already represented by Diagonal, as @upsj said. dot should have the same behavior as the MV behavior if we consider dot(A, B) = diag(A^T B), no matter whether it is an MV or a Dense matrix. The operation you mention is more like a dot_multiple.

upsj commented 3 years ago

> compute_norm2 can rely on the output size, so we do not need to introduce another parameter

Hm... while that is true, I am not sure whether that makes for readable/understandable code. For example

A->compute_norm2(result);
x->add_scaled(result, y);

What does this do? Does it scale all values of y by the same value, or does it scale individual columns by different values? You cannot tell from this piece of code alone, which might be an issue. Of course, the same is true for add_scaled, but at least there it does not propagate.

I guess we could add a zero-parameter overload for compute_norm2, which would then be unambiguous, but since all other functions write into Dense, that would be a bit of a break with our current patterns. I would only add it as an alternative, not as the only option.

How about

A->compute_norm2(output, reduction::rowwise);
A->compute_dot(output, reduction::colwise);
A->compute_norm_inf(output, reduction::full); // or all or complete or whatever we like best
MarcelKoch commented 3 years ago

I'm not a big fan of keeping the current dot behavior as the default. The result of the dot operation, which is also called the scalar product, should be a scalar.

BTW, the A:B operation I mentioned is called the Frobenius inner product and can also be written as A:B = trace(B^T A), similar to what @yhmtsai mentioned, except with an additional reduction at the end.
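That identity is easy to check: entry (j, j) of B^T A is the column-wise dot product sum_i b_ij * a_ij, so the trace sums exactly the same terms as the flat dot product. A minimal sketch (plain C++, illustration only; both function names are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Frobenius inner product A:B = sum_ij a_ij * b_ij for row-major matrices.
double frob_inner(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) s += a[k] * b[k];
    return s;
}

// trace(B^T A) for row-major n x m matrices: sums the diagonal entries
// of B^T A, i.e. reduces the column-wise dot products to one scalar.
double trace_btA(const std::vector<double>& a, const std::vector<double>& b,
                 int n, int m) {
    double t = 0.0;
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < n; ++i) t += b[i * m + j] * a[i * m + j];
    return t;
}
```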

MarcelKoch commented 3 years ago

For scale and add_scaled, the scaling A * diag(alpha) doesn't make sense to me. Both operations are basically BLAS operations. In the scalar case, scale is defined as y <- alpha * y and add_scaled as y <- alpha * x + y. As these operations are not standardized for non-scalar alpha, I would prefer sticking to the ordering of the scalar operations.
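To make the two orderings concrete, here is a small sketch of scale for a column-vector alpha (plain C++, not the Ginkgo API; the column-wise variant corresponds to the currently implemented MV behavior):

```cpp
#include <cassert>
#include <vector>

// Column-wise scaling, A <- A * diag(alpha): column j is scaled by alpha[j].
// This matches the current multivector behavior of scale.
void scale_colwise(std::vector<double>& a, int n, int m,
                   const std::vector<double>& alpha) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) a[i * m + j] *= alpha[j];
}

// Row-wise scaling, A <- diag(alpha) * A: row i is scaled by alpha[i].
void scale_rowwise(std::vector<double>& a, int n, int m,
                   const std::vector<double>& alpha) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) a[i * m + j] *= alpha[i];
}
```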

upsj commented 3 years ago

All of these operations are heavily motivated by the solver design, and most of the uses of Dense right now are solver-related as multivectors, so I think we need to keep those in mind -- column-wise operations are most common in the solver setting. Related to the concept of dot products, for multivectors you would probably expect something like x'*y, of which we only compute the diagonal entries, since we don't have interaction between different columns in a solver run. I do agree though that we need a better balance between correct naming and practical considerations.

yhmtsai commented 3 years ago

BLAS provides the scalar version, but in the scalar case the operation is commutative, so the ordering does not matter there. It is different from apply: if we call A->apply(b, x), i.e. A * b -> x, the b is on the right, and the same holds for the advanced apply, alpha * A * b + beta * x. We definitely need to provide different behavior for them.

MarcelKoch commented 3 years ago

As I'm also in favor of keeping the current behavior, this boils down to the question: what should be the default behavior?


Pros and Cons of the default behaviors

Keeping Multivector

Pros:

Cons:

Changing to Dense

Pros:

Cons


Feel free to add or change pros and cons.

pratikvn commented 3 years ago

I agree that there is a semantic difference between Dense and Multivector. While we currently use these interchangeably, we definitely should not, and there are implications, for example in a distributed setting. I want to add some points which we should think about before adding the functionalities.

  1. Storage layouts. How would you store a Dense matrix, and how would you store a Multivector? Both row-major, or one of them column-major? I think we should take this opportunity to allow for different storage layouts in both.
  2. Creating sub-matrices. From a semantic perspective, does it make sense to create a sub-matrix of a multivector?
  3. In some cases, I think we need the capabilities of both Dense and Multivector. For example, the Hessenberg matrix is a combination of multiple Dense matrices (one for each right-hand side), and we need to perform matrix operations with it (e.g. triangular solves).
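On the storage-layout question, the choice mostly comes down to the index map; a sketch of the two options (if I recall correctly, Dense currently uses row-major storage with a stride, which also covers submatrix views):

```cpp
#include <cassert>
#include <cstddef>

// Row-major: consecutive elements of a row are adjacent in memory.
// A stride >= m allows views into a larger parent matrix.
std::size_t idx_row_major(std::size_t i, std::size_t j, std::size_t stride) {
    return i * stride + j;
}

// Column-major: consecutive elements of a column are adjacent, which
// favors operating on a single column of a multivector.
std::size_t idx_col_major(std::size_t i, std::size_t j, std::size_t stride) {
    return j * stride + i;
}
```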
Slaedr commented 2 years ago

Points considered for the meeting on multivector/dense:

Outcomes: