emmalanguage / emma

A quotation-based Scala DSL for scalable data analysis.
http://emma-language.org
Apache License 2.0

Linear algebra library #138

Open aalexandrov opened 8 years ago

aalexandrov commented 8 years ago

The examples folder has several algorithms that mix the DataBag abstraction with linear algebra. Currently, we use Breeze, but we might switch to something else if we agree that it is better.

Let's try to make a summary of the different pros and cons of the various options here:

Breeze

joroKr21 commented 8 years ago

The netlib-java home page says that Breeze is built on top of it. Basically, Breeze is a higher-level Scala wrapper around it. So I would vote against using netlib-java directly.

Then we have on the one hand Breeze, which is used in MLlib for Spark, and on the other hand Spire, which is somewhat similar to Twitter's Algebird, which they say can be used on top of Scalding or Storm.

What we need to ask ourselves is whether we want to completely translate the linear algebra API to DataBag comprehensions (slower, but all types can be supported) or just chunk the matrices/vectors into blocks that are forwarded locally to native libraries (faster, but only numerics can be supported). Ideally, we would be able to handle (products of) numerics natively and fall back to the JVM for more complex data (with some warnings, of course).
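To make the trade-off concrete, here is a rough sketch of both options for a matrix-vector product. It is only an illustration under stated assumptions: plain Scala collections stand in for DataBag, and `Entry`, `Block`, `matVecComprehension`, and `matVecBlocked` are hypothetical names, not part of Emma or Breeze. The inner loop of the blocked variant marks the place where a native kernel could be called instead.

```scala
// Sketch only: plain Scala collections stand in for Emma's DataBag.
// All names below are hypothetical and chosen for illustration.
object MatVecSketch {

  // One matrix cell in coordinate (row, col, value) form.
  final case class Entry[A](row: Int, col: Int, value: A)

  // Option 1: comprehension-style product.
  // Works for any element type with a Numeric instance (Double, BigInt, ...).
  def matVecComprehension[A](m: Seq[Entry[A]], v: Map[Int, A])(
      implicit num: Numeric[A]): Map[Int, A] = {
    val products = for {
      e <- m
      x <- v.get(e.col).toSeq // skip columns that have no vector entry
    } yield (e.row, num.times(e.value, x))
    // Sum the partial products per row.
    products.groupBy(_._1).map { case (row, ps) => row -> ps.map(_._2).sum }
  }

  // Option 2: dense, row-major blocks of doubles.
  final case class Block(firstRow: Int, rows: Int, cols: Int, data: Array[Double])

  def matVecBlocked(blocks: Seq[Block], v: Array[Double]): Array[Double] = {
    val totalRows = blocks.map(b => b.firstRow + b.rows).foldLeft(0)(_ max _)
    val out = new Array[Double](totalRows)
    for (b <- blocks; i <- 0 until b.rows) {
      // Tight primitive loop; a native library call could replace this part.
      var acc = 0.0
      var j = 0
      while (j < b.cols) {
        acc += b.data(i * b.cols + j) * v(j)
        j += 1
      }
      out(b.firstRow + i) = acc
    }
    out
  }
}
```

The first variant accepts any `Numeric` element type at the cost of boxing and shuffling by row; the second is restricted to `Double` but keeps the data in primitive arrays.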

fschueler commented 8 years ago

Yes, netlib-java is very low-level and is used by Breeze.

I think numerics cover a big part of all use cases, and I would vote for speed in their case. Ideally, even for local execution (through Breeze).

Nonetheless, I like the approach taken by Scalding/Algebird very much. Allowing linear algebra operations on, for example, vectors of Bloom filters sounds really cool.

joroKr21 commented 8 years ago

Oh God, I just realized that Breeze doesn't have the outer vector product :facepalm:

aalexandrov commented 8 years ago

But you use this in ALS, don't you?

joroKr21 commented 8 years ago

No, I use Breeze only to invert the matrix. There's a ticket for the outer product on GitHub.
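For reference, the outer product itself is small enough to write by hand. A minimal sketch in plain Scala, assuming dense `Array[Double]` vectors and no Breeze types (`OuterProductSketch` and `outer` are hypothetical names, not an existing API):

```scala
// Sketch only: a hand-written outer product on plain arrays.
// `outer` is a hypothetical helper, not part of Breeze or Emma.
object OuterProductSketch {

  // Returns the matrix u * v^T as nested rows: result(i)(j) == u(i) * v(j).
  def outer(u: Array[Double], v: Array[Double]): Array[Array[Double]] =
    u.map(ui => v.map(vj => ui * vj))
}
```

For `u` of length m and `v` of length n, the result has m rows and n columns.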

aalexandrov commented 8 years ago

I'm leaving @akunft in charge of this.

aalexandrov commented 8 years ago

I think we can close this, as the discussion has moved to #187. @stratosphere/emma-committers Does anybody object?

fschueler commented 8 years ago

:+1:

akunft commented 8 years ago

New meta-issue in #188.