Open mhoward2718 opened 4 years ago
This paper describes a distributed PCA algorithm containing two main steps. Maybe a similar technique could be adopted for distributed, incremental KPCA. In the first step each node locally computes the SVD of its data partition, which I think could easily be replaced by your incremental method.
http://www.cs.cmu.edu/~ninamf/papers/distributedPCAandCoresets.pdf
That's an interesting idea! Sorry for the delay in getting back.
I did not think so much about parallelization yet. Would be great to use parallel computation and speed things up. Yup if there is a parallel algorithm for PCA or the eigendecomposition it should be straight-forward to do a parallel incremental algorithm! The matrix multiplications could possibly also be parallelized.
Hello. Thanks for your interesting algorithm and repository.
A nice advantage of many incremental methods is that they are also easily parallelized. Do you think your algorithm could be extended to a version that:
From a implementation perspective a package like Ray or Dask would probably be easy to use
From a theory perspective we only have to figure out how to combine the results of the separate processes, and how frequently, right? Or is it not that simple?