antonioACR1 / Vector-Autoregression-1v1-in-Scala-

Is the model distributed? #1

Closed yassinecharjami closed 6 years ago

antonioACR1 commented 6 years ago

No, it's not distributed per se. I use this algorithm inside another model that is already distributed, so if I used RDDs inside this implementation I would run into problems with nested RDDs as soon as I called it from that second model. For my use case this works well: I need to compute thousands of vector autoregressions, each on small data, so a single vector autoregression is cheap and the cost comes from running many of them.
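To illustrate the pattern described above, here is a minimal sketch in plain Scala with no Spark dependency. The object `ManySmallFits` and the function `fitAr1` are hypothetical and not part of this repository: `fitAr1` is a deliberately tiny stand-in (a single-coefficient AR(1) least-squares fit) for one local vector autoregression. The point is only that each fit is a plain local function, so an outer collection can map over it. In a Spark job the outer collection would be an RDD, and the local function can be called inside its `map` without creating RDDs inside RDDs.

```scala
// Hypothetical sketch: many cheap local fits driven by one outer map.
object ManySmallFits {

  // Stand-in for one small local model fit (an assumption, not the repo's code):
  // least-squares AR(1) coefficient without intercept,
  // phi = sum(y_t * y_{t-1}) / sum(y_{t-1}^2).
  def fitAr1(series: Vector[Double]): Double = {
    val pairs = series.zip(series.tail) // (y_{t-1}, y_t)
    val num = pairs.map { case (prev, cur) => prev * cur }.sum
    val den = pairs.map { case (prev, _) => prev * prev }.sum
    num / den
  }

  def main(args: Array[String]): Unit = {
    // A thousand small synthetic series, each following y_t = 0.5 * y_{t-1}.
    val manySeries: Seq[Vector[Double]] =
      (1 to 1000).map(k => Vector.iterate(1.0 + k, 20)(_ * 0.5))

    // The outer map is the part that would be distributed;
    // the function applied to each element stays purely local.
    val coefficients: Seq[Double] = manySeries.map(fitAr1)
    println(coefficients.take(3))
  }
}
```

Because `fitAr1` only touches local collections, it stays usable from inside any distributed transformation, which is the property the implementation here relies on.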

Note also that the spark-timeseries implementation of univariate autoregression, https://github.com/sryza/spark-timeseries/blob/master/src/main/scala/com/cloudera/sparkts/models/Autoregression.scala, also makes use of local vectors and math3.stat.regression.

There are a few options to make this distributed, however. For example, you could broadcast the dense matrix on which I run the multiple linear regressions inside the methods lagSelection and fit, and you could also partition the indexes of the loops and replace the loops with a few maps.
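As a rough sketch of the "replace the loops with maps" idea, the plain-Scala example below scores candidate lags with a map over the index range instead of a loop. Everything in it is an assumption for illustration: `LagSearchSketch`, `rssForLag` and `scoreLags` are hypothetical names, and the score is a simplified single-coefficient regression, not the regressions actually run inside lagSelection or fit. With Spark, the index range would be distributed with `sc.parallelize` and the shared data would be wrapped with `sc.broadcast`.

```scala
// Hypothetical sketch: a loop over lag indexes rewritten as a map.
object LagSearchSketch {

  // Simplified score (an assumption): residual sum of squares of regressing
  // y_t on y_{t-lag} with a single coefficient and no intercept.
  def rssForLag(series: Vector[Double], lag: Int): Double = {
    val pairs = series.drop(lag).zip(series) // (y_t, y_{t-lag})
    val num = pairs.map { case (cur, past) => cur * past }.sum
    val den = pairs.map { case (_, past) => past * past }.sum
    val beta = if (den == 0.0) 0.0 else num / den
    pairs.map { case (cur, past) =>
      val residual = cur - beta * past
      residual * residual
    }.sum
  }

  // Each lag is scored independently against the same read-only series,
  // so the index range is the natural thing to partition.
  def scoreLags(series: Vector[Double], maxLag: Int): Seq[(Int, Double)] =
    (1 to maxLag).map(lag => lag -> rssForLag(series, lag))

  def main(args: Array[String]): Unit = {
    // Synthetic series that repeats every three steps.
    val pattern = Vector(1.0, 2.0, -3.0)
    val series = Vector.tabulate(30)(t => pattern(t % 3))

    val scores = scoreLags(series, maxLag = 4)
    val (bestLag, bestRss) = scores.minBy(_._2)
    println(s"best lag = $bestLag, rss = $bestRss")
  }
}
```

The shared series plays the role of the broadcast dense matrix: every lag reads it, none of them modifies it, so the per-lag work needs no coordination.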

I can try to make those changes to make this distributed; just point me to some data on which you see performance issues with this implementation.