gdkrmr / dimRed

A Framework for Dimensionality Reduction in R
https://www.guido-kraemer.com/software/dimred/
GNU General Public License v3.0

Huge Sparse Matrix #42

Closed: ran88dom99 closed this issue 5 years ago

ran88dom99 commented 5 years ago

I have a huge data set with 98% of the data missing. I store it as a sparse matrix, and in that form the data set fits easily into memory; as a full data frame it would take hundreds of GB. Could you please let embed() accept a sparse matrix?

gdkrmr commented 5 years ago

Most of the methods cannot use sparse matrices and will convert this to a dense matrix anyway. Which method are you interested in?
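To show why converting to dense matters, here is a rough back-of-the-envelope estimate, written in Python for illustration (dimRed itself is an R package). It assumes 8-byte double values and 4-byte integer indices, roughly the compressed sparse column layout that R's Matrix package uses for its dgCMatrix class. The matrix shape is purely hypothetical; only the 2% density comes from the issue. Note that a sparse format stores absent cells as implicit zeros, not as missing values.

```python
# Rough storage estimate: dense matrix versus compressed sparse column (CSC).
# Assumptions (for illustration only): 8-byte double values, 4-byte integer indices.

def dense_bytes(n_rows: int, n_cols: int, value_bytes: int = 8) -> int:
    """A dense matrix stores every cell, including the empty ones."""
    return n_rows * n_cols * value_bytes


def csc_bytes(n_cols: int, nnz: int, value_bytes: int = 8, index_bytes: int = 4) -> int:
    """CSC stores nnz values, nnz row indices and n_cols + 1 column pointers."""
    return nnz * (value_bytes + index_bytes) + (n_cols + 1) * index_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical shape; 2% of the cells are stored (98% absent, as in the issue).
    n_rows, n_cols = 500_000, 50_000
    nnz = n_rows * n_cols // 50
    print(f"dense: {dense_bytes(n_rows, n_cols) / gib:.1f} GiB")
    print(f"CSC:   {csc_bytes(n_cols, nnz) / gib:.1f} GiB")
```

With these assumed numbers the dense copy needs about 186 GiB against about 5.6 GiB for the sparse one, so a method that densifies internally can run out of memory even though the sparse input fits comfortably.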

ran88dom99 commented 5 years ago

ICA for sure, but I like to let the computer try everything. How about "online batch" algorithms? For example, "onlinePCA" (https://cran.r-project.org/web/packages/onlinePCA/onlinePCA.pdf) exists.
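To make "online" fitting concrete, here is a minimal Python sketch of Oja's rule, a classic single-pass update that estimates the first principal component while touching each sample once. It is not taken from the onlinePCA package and is not its API; it assumes the samples are already centred, and the learning rate, seed and synthetic data are hypothetical choices for illustration.

```python
import math
import random


def oja_first_component(stream, dim, lr=0.005, seed=0):
    """Estimate the leading principal direction from a stream of centred
    samples using Oja's rule, touching each sample exactly once."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]  # random unit starting vector
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))  # projection of x onto w
        # Oja's update: w <- w + lr * y * (x - y * w)
        w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]  # report a unit vector


def synthetic_stream(n, seed=1):
    """Hypothetical test data: large spread along (1, 1), small along (1, -1)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    for _ in range(n):
        a = rng.gauss(0.0, 2.0)  # coordinate along the main axis
        b = rng.gauss(0.0, 0.5)  # coordinate along the minor axis
        yield [s * (a + b), s * (a - b)]


if __name__ == "__main__":
    w = oja_first_component(synthetic_stream(20_000), dim=2)
    print(w)  # should be close to +/-(0.707, 0.707)
```

Because the stream is consumed one sample at a time, the full data never has to be materialised; the price is a learning rate that has to be tuned and an estimate that only approximates the batch PCA result.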

gdkrmr commented 5 years ago

Online fitting is possible with the autoencoders in dimRed, and it would probably be possible for a handful of other methods. I currently have no overview of which methods would support this, and not enough time to find out; if you want to make PRs, you are welcome to.

Online fitting is not a trivial task, and it is easy for floating-point inaccuracies to change your results significantly. Even with the current autoencoders in dimRed you have to take care of batching the data correctly yourself, so this cannot be automated.
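As a small illustration of how floating-point effects can bite single-pass computations, here is a Python sketch comparing two one-pass variance estimators. It concerns streaming statistics, not dimRed's autoencoders, and the data values are hypothetical: four numbers with sample variance 30, shifted by 1e9.

```python
def naive_streaming_variance(stream):
    """One pass with the textbook formula: accumulate sum and sum of squares."""
    n, total, total_sq = 0, 0.0, 0.0
    for x in stream:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(stream):
    """One pass with Welford's update of the running mean and M2."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already-updated mean
    return m2 / (n - 1)


if __name__ == "__main__":
    # Same spread as [4, 7, 13, 16] (sample variance 30), shifted by 1e9.
    data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
    print("naive:  ", naive_streaming_variance(data))
    print("Welford:", welford_variance(data))
```

The naive accumulator subtracts two numbers near 4e18, where adjacent doubles are 512 apart, so its result cannot be 30; Welford's update works with deviations from the running mean and returns 30.0 for this input. Both read the data in one pass, yet only the order of operations differs.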

ran88dom99 commented 5 years ago

Thank you. Unfortunately, another project has priority over my time for the next several months.