emmalanguage / emma

A quotation-based Scala DSL for scalable data analysis.
http://emma-language.org
Apache License 2.0

User facing linear algebra abstraction #187

Open akunft opened 8 years ago

akunft commented 8 years ago

This issue should be used to discuss the user-facing abstraction for the matrix and vector types.

The initial prototype and ongoing effort is tracked in PR #191.

The focus is on the traits Matrix and Vector.

Below, I highlight the parts of the abstraction that are most open for discussion:

Type bounds for the generic value

Currently, we use spire.Numeric as the type bound for the values in Matrix/Vector. This allows us to use all basic operations such as +, -, * and / (division is not supported by the Scala Numeric), and there are implicit conversions for all the numeric primitives in Scala.

Alternatively, we could define our own type, similar to spire.Field, as the bound for the values in order to support e.g. Strings, Booleans, and so on. This would generalize the abstraction, but we would also have to implement the implicit conversions that spire gives us for free.

I suggest keeping the Numeric bound for now and seeing whether there is a need for a wider bound.
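To make the trade-off concrete, here is a minimal sketch of a value type bound that includes division. It is not the code from the PR: `TypeBoundSketch` and `Vec` are hypothetical names, and the standard library's `scala.math.Fractional` stands in for the spire bound so the example runs without extra dependencies.

```scala
object TypeBoundSketch {
  // Hypothetical stand-in for the Vector trait discussed above; not Emma's API.
  // scala.math.Fractional is used instead of the spire bound (an assumption
  // made only to keep the sketch dependency-free).
  final case class Vec[A](values: Seq[A])(implicit num: Fractional[A]) {

    // Element-wise addition; only needs what scala.math.Numeric offers.
    def +(that: Vec[A]): Vec[A] = {
      require(values.length == that.values.length, "length mismatch")
      Vec(values.zip(that.values).map { case (x, y) => num.plus(x, y) })
    }

    // Division by a scalar; scala.math.Numeric has no div, Fractional does.
    def /(scalar: A): Vec[A] = Vec(values.map(x => num.div(x, scalar)))
  }

  val summed: Vec[Double] = Vec(Seq(1.0, 2.0)) + Vec(Seq(3.0, 4.0))
  val halved: Vec[Double] = Vec(Seq(2.0, 4.0)) / 2.0
}
```

A wider custom bound would have to supply its own instances for every supported value type, which is the extra work mentioned above.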

Aggregations

Currently we allow aggregations over vectors only, which lets users define their own aggregation functions. In combination with the cols() and rows() methods, the user can define aggregations over the columns and rows of a matrix.

Note that the return type depends on what the body of the traversal yields: a scalar per column gives a vector, a vector per column gives a matrix.

    // yields one scalar per column, so means should be a row-vector
    val means = for (col <- M.cols()) yield {
      col.aggregate(_ + _) / col.length
    }
    // yields one vector per column, so shifted should be a matrix
    val shifted = for (col <- M.cols()) yield {
      col + 3
    }
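One way the result type could follow from the body of the traversal is a small type class that picks the container from the per-column result. The sketch below is only an illustration under that assumption: `Vec`, `Mat`, `Cols` and `Build` are hypothetical names, not the traits from the PR, and values are fixed to `Double` for brevity.

```scala
object TraversalSketch {
  // Hypothetical minimal types; Emma's Matrix and Vector traits are richer.
  final case class Vec(values: Vector[Double]) {
    def length: Int = values.length
    def aggregate(f: (Double, Double) => Double): Double = values.reduce(f)
    def +(scalar: Double): Vec = Vec(values.map(_ + scalar))
  }

  // Matrix stored as a sequence of columns.
  final case class Mat(columns: Vector[Vec]) {
    def cols(): Cols = new Cols(columns)
  }

  // Type class that chooses the result type R from the per-column result B.
  trait Build[B, R] { def apply(parts: Vector[B]): R }

  object Build {
    // One scalar per column becomes a row-vector.
    implicit val scalarsToVec: Build[Double, Vec] = new Build[Double, Vec] {
      def apply(parts: Vector[Double]): Vec = Vec(parts)
    }
    // One vector per column becomes a matrix.
    implicit val vecsToMat: Build[Vec, Mat] = new Build[Vec, Mat] {
      def apply(parts: Vector[Vec]): Mat = Mat(parts)
    }
  }

  // Column traversal; for-comprehensions desugar to this map.
  final class Cols(columns: Vector[Vec]) {
    def map[B, R](f: Vec => B)(implicit build: Build[B, R]): R =
      build(columns.map(f))
  }

  val m = Mat(Vector(Vec(Vector(1.0, 3.0)), Vec(Vector(2.0, 6.0))))

  // The body yields Double, so the result is a Vec.
  val means: Vec = for (col <- m.cols()) yield col.aggregate(_ + _) / col.length

  // The body yields Vec, so the result is a Mat.
  val shifted: Mat = for (col <- m.cols()) yield col + 3
}
```

The design mirrors how builder-style type classes work in collection libraries: the compiler resolves the `Build` instance from the type of the yielded value, so the user never spells out the result container.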

Open questions:

The current implementations are based on one-dimensional arrays, so it should be easy to hand the execution of operators over to netlib-java (not yet done).
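As a rough illustration of why a flat array makes such a hand-off straightforward, the sketch below stores a dense matrix column-major in a single `Array[Double]`, the layout that reference BLAS routines conventionally work on. `FlatArraySketch` and `Dense` are hypothetical names, the column-major choice is an assumption, and the naive loop only marks the place where a native routine could be called instead; no netlib-java call is shown.

```scala
object FlatArraySketch {
  // Hypothetical dense matrix backed by one flat, column-major array.
  final class Dense(val rows: Int, val cols: Int, val data: Array[Double]) {
    require(data.length == rows * cols, "data must hold rows * cols values")

    // Element (i, j) lives at offset j * rows + i in column-major order.
    def apply(i: Int, j: Int): Double = data(j * rows + i)

    // Naive matrix product; a native backend could replace this loop nest
    // because both operands are already contiguous arrays.
    def *(that: Dense): Dense = {
      require(cols == that.rows, "inner dimensions must match")
      val out = new Array[Double](rows * that.cols)
      var j = 0
      while (j < that.cols) {
        var k = 0
        while (k < cols) {
          val b = that(k, j)
          var i = 0
          while (i < rows) {
            out(j * rows + i) += this(i, k) * b
            i += 1
          }
          k += 1
        }
        j += 1
      }
      new Dense(rows, that.cols, out)
    }
  }

  // [[1, 2], [3, 4]] and [[5, 6], [7, 8]] in column-major order.
  val a = new Dense(2, 2, Array(1.0, 3.0, 2.0, 4.0))
  val b = new Dense(2, 2, Array(5.0, 7.0, 6.0, 8.0))
  val c = a * b
}
```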

aalexandrov commented 8 years ago
fschueler commented 8 years ago

I am still not sure about the Numeric type bound for matrices, but I think it should be fine for a start.

Do we also allow aggregations over Vectors? I think this is important, e.g. for Vector-norms.

:+1: for aggregation over all elements. Most APIs offer aggregations specified by the dimension (1 = rows, 2 = columns, default = all elements).
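For comparison, a dimension-parameter style could look roughly like the sketch below. `DimensionSketch.sum` is a hypothetical helper, the matrix is given as a plain sequence of rows, and reading "1 = rows" as "one result per row" is an assumption, since APIs differ on that convention.

```scala
object DimensionSketch {
  // Matrix given as a sequence of rows; purely illustrative.
  // dim = Some(1): one sum per row, Some(2): one sum per column,
  // None (the default): a single sum over all elements.
  def sum(m: Seq[Seq[Double]], dim: Option[Int] = None): Seq[Double] = dim match {
    case None    => Seq(m.flatten.sum)
    case Some(1) => m.map(_.sum)
    case Some(2) => m.transpose.map(_.sum)
    case Some(d) => throw new IllegalArgumentException(s"unsupported dimension: $d")
  }

  val m = Seq(Seq(1.0, 2.0), Seq(3.0, 4.0))
}
```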

I would also leave the traversal order unspecified and allow only commutative aggregation operations.
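A small example of why the operation matters once the traversal order is not fixed: addition gives the same result in any order, while subtraction does not. The names in `OrderSketch` are illustrative only.

```scala
object OrderSketch {
  val xs = Vector(1, 2, 3)

  // Addition: the order of traversal does not change the result.
  val sumForward: Int  = xs.reduce(_ + _)
  val sumBackward: Int = xs.reverse.reduce(_ + _)

  // Subtraction: (1 - 2) - 3 differs from (3 - 2) - 1.
  val diffForward: Int  = xs.reduce(_ - _)
  val diffBackward: Int = xs.reverse.reduce(_ - _)
}
```

If partial results may also be combined in arbitrary groupings, for example in a parallel execution, the operation would additionally need to be associative.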

For the rest I will think about it some more!

akunft commented 8 years ago

Yes, we do allow aggregations over vectors, as shown in the example above. I also agree on the aggregation over elements, but I would keep the methods separate: cols() and rows() for per-vector aggregations, plus an additional elements() method for aggregations over all elements of a matrix.
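A minimal sketch of how the three accessors might sit side by side, assuming a row-major dense layout. Only the method names `rows()`, `cols()` and `elements()` come from the discussion; `AccessorSketch`, `Mat` and the return types are hypothetical.

```scala
object AccessorSketch {
  // Hypothetical row-major matrix over Double.
  final case class Mat(numRows: Int, numCols: Int, data: Vector[Double]) {
    require(numRows > 0 && numCols > 0, "dimensions must be positive")
    require(data.length == numRows * numCols, "data must hold numRows * numCols values")

    // One vector per row.
    def rows(): Seq[Vector[Double]] = data.grouped(numCols).toVector

    // One vector per column.
    def cols(): Seq[Vector[Double]] =
      (0 until numCols).map { j =>
        (0 until numRows).map(i => data(i * numCols + j)).toVector
      }

    // All elements, for whole-matrix aggregations.
    def elements(): Seq[Double] = data
  }

  // [[1, 2, 3], [4, 5, 6]]
  val m = Mat(2, 3, Vector(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

  val rowSums: Seq[Double] = m.rows().map(_.sum)
  val colSums: Seq[Double] = m.cols().map(_.sum)
  val total: Double        = m.elements().sum
}
```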

akunft commented 8 years ago

Changes to the API are now tracked in PR #191.

akunft commented 8 years ago

@stratosphere/emma-committers If nobody has objections, I would allow +, -, *, / (single character only) as method names in the Scala style formatter for the methods in Matrix/Vector.

aalexandrov commented 8 years ago

:+1: