Open barrybecker4 opened 5 years ago
Here you can find more about MASS_V3 FastestSimilaritySearch
Thanks. I just read the version 3 Matlab implementation. In version 3, the series is broken up and processed in chunks of length k. Is the idea that each of these chunks can be processed in a different thread? I am trying to create a Spark implementation, and the challenge I face is that the advantage of Spark multi-node processing will be lost if I cannot find a way to partition the data. Do you know of any Spark implementations that are out there? I'm not even sure how MASS v3 can be run on a GPU. Does v3 run faster than v2 in Matlab? How much faster?
Is the idea that each of these chunks can be processed in a different thread?
I am not sure about that. I know that the STOMP algorithm is parallelizable, so if you want to speed up the process and keep the advantages of Spark multi-node processing, you may have to start from there.
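For what it's worth, here is a minimal sketch of the chunking idea in Python/NumPy. It is not a port of the Matlab MASS_V3 code: the function names (`rolling_mean_std`, `mass_v2`, `mass_chunked`) and the thread pool are my own assumptions, added only to illustrate that chunks of length k overlapping by m-1 samples (m being the query length) can be processed independently and then concatenated.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def rolling_mean_std(x, m):
    """Mean and (population) std of every length-m window of x."""
    c = np.cumsum(np.insert(x, 0, 0.0))
    c2 = np.cumsum(np.insert(x * x, 0, 0.0))
    mean = (c[m:] - c[:-m]) / m
    var = (c2[m:] - c2[:-m]) / m - mean ** 2
    return mean, np.sqrt(np.maximum(var, 0.0))


def mass_v2(query, series):
    """z-normalized distance profile of query against series (FFT-based)."""
    m, n = len(query), len(series)
    # Sliding dot products via FFT: multiply spectra of the reversed query
    # and the series; entries m-1 .. n-1 are free of circular wrap-around.
    qt = np.fft.irfft(np.fft.rfft(query[::-1], n) * np.fft.rfft(series, n), n)
    qt = qt[m - 1:n]
    mu_q, sigma_q = query.mean(), query.std()
    mu_t, sigma_t = rolling_mean_std(series, m)
    dist_sq = 2.0 * (m - (qt - m * mu_t * mu_q) / (sigma_t * sigma_q))
    return np.sqrt(np.maximum(dist_sq, 0.0))


def mass_chunked(query, series, k):
    """Same result as mass_v2, computed chunk by chunk (hypothetical helper).

    Consecutive chunks of length k overlap by m-1 samples, so every
    subsequence of length m lies entirely inside exactly one chunk.
    """
    m, n = len(query), len(series)
    if k < m:
        raise ValueError("chunk length k must be at least the query length")
    starts = range(0, n - m + 1, k - m + 1)
    # Each chunk depends only on its own slice, so the calls are independent.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda s: mass_v2(query, series[s:s + k]), starts))
    return np.concatenate(parts)
```

Because each chunk only needs its own slice of the series plus the query, the same overlap-by-m-1 partitioning might be a starting point for splitting the data across Spark partitions, though I have not tried that.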
Does v3 run faster than v2 in Matlab? How much faster?
Here is a graph (from the original paper and presentation) that answers your question.
Do you know of any Spark implementations that are out there?
I think that there is no Spark implementation of the Matrix Profile or MASS algorithms. According to this presentation, a distributed Matrix Profile for horizontal scalability is an open problem. I am going to test how to implement Matrix Profile in a Spark-Scala environment, so I will let you know if I find anything interesting.
@ensozos, based on your comment in #11, we should speed up the execution of MPdistance by replacing MASS v2 with the MASS v3 algorithm. Do you have a reference for it?