Replace MASS v2 with MASS v3

barrybecker4 commented 5 years ago

@ensozos, from your comment in #11, we should speed up the execution of MPdistance by replacing the MASS v2 algorithm with MASS v3. Do you have a reference for it?

ensozos commented 5 years ago

Here you can find more about MASS_V3: FastestSimilaritySearch

barrybecker4 commented 5 years ago

Thanks. I just read the version 3 Matlab implementation. In version 3, the series is broken up and processed in chunks of length k. Is the idea that each of these chunks can be processed in a different thread? I am trying to create a Spark implementation, and the challenge I face is that the advantage of Spark multi-node processing will be lost if I cannot find a way to partition the data. Do you know of any Spark implementations that are out there? I'm not even sure how MASS v3 can be run on a GPU. Does v3 run faster than v2 in Matlab? How much faster?
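To make the chunking concrete, here is a rough Python/NumPy sketch of the idea as I understand it. This is not the authors' Matlab code: the function names `mass_v2` and `mass_v3` and the parameter `k` are just labels I picked, and the sketch recomputes the query FFT and the rolling statistics per chunk for simplicity. It assumes no window of the series is constant (otherwise the standard deviation is zero). Consecutive chunks overlap by m - 1 samples so that no window is lost at a chunk boundary.

```python
import numpy as np

def mass_v2(x, y):
    """Z-normalized Euclidean distance from query y to every window of x (FFT-based)."""
    n, m = len(x), len(y)
    mu_y, sig_y = y.mean(), y.std()
    # Rolling mean and standard deviation of x over windows of length m.
    c1 = np.cumsum(np.insert(x, 0, 0.0))
    c2 = np.cumsum(np.insert(x * x, 0, 0.0))
    mu_x = (c1[m:] - c1[:-m]) / m
    var_x = (c2[m:] - c2[:-m]) / m - mu_x ** 2
    sig_x = np.sqrt(np.maximum(var_x, 0.0))  # assumes no constant window (sig_x > 0)
    # Sliding dot product via FFT: convolve x with the reversed query.
    z = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(y[::-1], n), n)
    dot = z[m - 1:n]  # only these entries are free of circular wrap-around
    dist2 = 2.0 * (m - (dot - m * mu_x * mu_y) / (sig_x * sig_y))
    return np.sqrt(np.maximum(dist2, 0.0))

def mass_v3(x, y, k):
    """Same distance profile, computed chunk by chunk (chunk length k >= len(y))."""
    n, m = len(x), len(y)
    if k < m:
        raise ValueError("chunk length k must be at least the query length")
    out = np.empty(n - m + 1)
    step = k - m + 1  # each full chunk yields k - m + 1 distances
    for start in range(0, n - m + 1, step):
        chunk = x[start:start + k]  # overlaps the next chunk by m - 1 samples
        d = mass_v2(chunk, y)
        out[start:start + len(d)] = d
    return out
```

In this sketch each chunk reads only its own slice of the series, so the loop body has no dependency on earlier iterations. That is what makes me wonder whether the chunks could be handed to separate threads or Spark partitions.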

ensozos commented 5 years ago

Is the idea that each of these chunks can be processed in a different thread?

I am not sure about that. I know that the STOMP algorithm is parallelizable, so if you want to speed up the process and keep the advantages of Spark multi-node processing, you may have to start from there.

Does v3 run faster than v2 in Matlab? How much faster?

Here is a graph (from the original paper and presentation) that answers your question.

[Figure: mass]

Do you know of any Spark implementations that are out there?

I think that there is no Spark implementation of the Matrix Profile or MASS algorithms. According to this presentation, a distributed Matrix Profile for horizontal scalability is an open problem. I am going to test how to implement Matrix Profile in a spark-scala environment, so I will let you know if I find anything interesting.

ensozos / Matrix-Profile

Replace MASS v2 with MASS v3 #12