Parallel Computation of Similarity Matrices

NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems

http://surpriselib.com

BSD 3-Clause "New" or "Revised" License


Parallel Computation of Similarity Matrices #169

Open gautamramk opened 6 years ago

gautamramk commented 6 years ago

Hi, I was wondering if it would be feasible to make the computation of similarity matrices run in parallel. This would help speed up the process, utilizing multiple cores for computation.

Reference link for parallel programming with Cython: http://cython.readthedocs.io/en/latest/src/userguide/parallelism.html

NicolasHug commented 6 years ago

I guess it would be fairly easy to do with joblib, yes, especially since all the similarity metrics are computed with some sort of map / reduce process.
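To illustrate the map / reduce structure mentioned above, here is a minimal sketch of a cosine similarity computed in chunks. It is not Surprise's actual code: `partial_sums`, `cosine_similarities`, and the list-of-dicts input format are hypothetical names and shapes chosen for the example. It also uses a standard-library thread pool instead of joblib so that it runs without extra dependencies; for a real speed-up on CPU-bound code, worker processes (for instance via joblib's `Parallel` and `delayed`) would take the place of the pool.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import sqrt


def partial_sums(user_rows):
    """Map step: accumulate per-pair sums over one chunk of users."""
    prods, sq_i, sq_j = Counter(), Counter(), Counter()
    for ratings in user_rows:  # ratings: {item_id: rating} for one user
        # Every pair of items rated by the same user contributes once.
        for (i, ri), (j, rj) in combinations(sorted(ratings.items()), 2):
            prods[i, j] += ri * rj
            sq_i[i, j] += ri * ri
            sq_j[i, j] += rj * rj
    return prods, sq_i, sq_j


def cosine_similarities(user_rows, n_workers=2):
    """Reduce step: merge the partial sums, then normalise each pair."""
    # Split users round-robin into one chunk per worker.
    chunks = [user_rows[k::n_workers] for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_sums, chunks))
    prods, sq_i, sq_j = Counter(), Counter(), Counter()
    for p, a, b in parts:
        prods.update(p)  # Counter.update adds values key by key
        sq_i.update(a)
        sq_j.update(b)
    # Assumes ratings are non-zero, so the denominators are never zero.
    return {pair: prods[pair] / sqrt(sq_i[pair] * sq_j[pair]) for pair in prods}
```

Because each chunk only produces additive sums, the chunks can be processed in any order and on any number of workers, and the merged result stays the same apart from floating-point rounding.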

I assume, though, that you're asking because the Spearman computation takes a lot of time (#168)? I implemented a non-optimized version of Spearman's rank correlation a while ago, and I remember it taking forever to compute. There are probably ways to optimize it (besides parallel computing), but I'm not familiar with the details.

gautamramk commented 6 years ago

I didn't ask this for the Spearman computation. Was just asking in general. Would be a very nice feature to have.

NicolasHug commented 6 years ago

Sure, I agree!

gautamramk commented 6 years ago

Can I take this up in my upcoming vacation?

NicolasHug commented 6 years ago

Absolutely

DibyamAgrawal commented 6 years ago

Hi gautamramk, I was wondering if you are working on this issue. If not I can take this up.

gautamramk commented 6 years ago

I shall do it; I am getting back to this issue.