fredhallgren / inkpca

Incremental kernel PCA
Apache License 2.0
28 stars 8 forks source link

Distributed version of algorithm #6

Open mhoward2718 opened 4 years ago

mhoward2718 commented 4 years ago

Hello. Thanks for your interesting algorithm and repository.

A nice advantage of many incremental methods is that they are also easily parallelized. Do you think your algorithm could be extended to a version that:

From a implementation perspective a package like Ray or Dask would probably be easy to use

From a theory perspective we only have to figure out how to combine the results of the separate processes, and how frequently, right? Or is it not that simple?

mhoward2718 commented 4 years ago

This paper describes a distributed PCA algorithm containing two main steps. Maybe a similar technique could be adopted for distributed, incremental KPCA. In the first step each node locally computes the SVD of its data partition, which I think could easily be replaced by your incremental method.

http://www.cs.cmu.edu/~ninamf/papers/distributedPCAandCoresets.pdf

fredhallgren commented 4 years ago

That's an interesting idea! Sorry for the delay in getting back.

I did not think so much about parallelization yet. Would be great to use parallel computation and speed things up. Yup if there is a parallel algorithm for PCA or the eigendecomposition it should be straight-forward to do a parallel incremental algorithm! The matrix multiplications could possibly also be parallelized.