ran88dom99 closed this issue 5 years ago.
Most of the methods cannot use sparse matrices and will convert this to a dense matrix anyway. Which method are you interested in?
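For a sense of scale, here is a back-of-the-envelope sketch of what that dense conversion costs. It is plain Python arithmetic and does not use dimRed or the R Matrix package. The dimensions are hypothetical, the 2% density mirrors the "98% missing" figure mentioned in this issue, and the sparse estimate assumes a compressed-column layout with 8-byte values and 4-byte indices.

```python
# Rough memory estimate for a dense matrix versus a compressed sparse
# column (CSC) layout. All sizes below are assumptions for illustration.
BYTES_PER_DOUBLE = 8  # one 64-bit float per stored value
BYTES_PER_INDEX = 4   # one 32-bit integer per row index or column pointer


def dense_bytes(n_rows: int, n_cols: int) -> int:
    """Bytes needed when every cell is stored as a double."""
    return n_rows * n_cols * BYTES_PER_DOUBLE


def csc_bytes(n_rows: int, n_cols: int, density: float) -> int:
    """Bytes needed when only the non-empty cells are stored in CSC form."""
    nnz = round(n_rows * n_cols * density)  # number of stored entries
    values_and_row_indices = nnz * (BYTES_PER_DOUBLE + BYTES_PER_INDEX)
    column_pointers = (n_cols + 1) * BYTES_PER_INDEX
    return values_and_row_indices + column_pointers


if __name__ == "__main__":
    # Hypothetical shape: 5 million rows, 10 thousand columns, 2% filled.
    n_rows, n_cols, density = 5_000_000, 10_000, 0.02
    print(f"dense:  {dense_bytes(n_rows, n_cols) / 1e9:.1f} GB")
    print(f"sparse: {csc_bytes(n_rows, n_cols, density) / 1e9:.1f} GB")
```

Under these assumptions the dense form needs 400 GB while the sparse form needs about 12 GB, which is why an internal conversion to a dense matrix defeats the purpose of passing sparse input.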
ICA for sure, but I like to let the computer try everything. How about "online batch" algorithms? The onlinePCA package exists: https://cran.r-project.org/web/packages/onlinePCA/onlinePCA.pdf
Online fitting is possible with the autoencoders in dimRed, and it would probably be possible for a handful of other methods. I currently have no overview of which methods would support this and not enough time to find out. If you want to make PRs, you are welcome.
Online fitting is not a trivial task, and it is easy for floating-point inaccuracies to change your results significantly. Even with the current autoencoders in dimRed, you have to take care of batching the data correctly yourself, so this cannot be automated.
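To illustrate how floating-point error can creep into batch-wise computation, here is a minimal sketch of a streaming variance computed two ways. It is not taken from dimRed or its autoencoders. The class names are hypothetical, and the data are a standard textbook-style example with a large offset and a small spread.

```python
# Two ways to compute a sample variance from data that arrives in batches.
# Both are illustrative; neither is part of dimRed.


class NaiveStreamingVariance:
    """Accumulates sum and sum of squares; prone to cancellation error."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, batch):
        for x in batch:
            self.n += 1
            self.total += x
            self.total_sq += x * x

    def variance(self):
        # Subtracts two huge, nearly equal numbers, which loses precision.
        return (self.total_sq - self.total * self.total / self.n) / (self.n - 1)


class WelfordStreamingVariance:
    """Welford's update: tracks the running mean and squared deviations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, batch):
        for x in batch:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1)


# Values with a large offset; the exact sample variance is 30.
batches = [[1e9 + 4, 1e9 + 7], [1e9 + 13, 1e9 + 16]]

naive = NaiveStreamingVariance()
welford = WelfordStreamingVariance()
for batch in batches:
    naive.update(batch)
    welford.update(batch)

if __name__ == "__main__":
    print("naive:  ", naive.variance())
    print("welford:", welford.variance())
```

Both accumulators see the same batches, yet only the Welford update recovers the true variance of 30. The naive version is far off because the intermediate sums exceed what a 64-bit float can represent exactly. Online fitting of a model involves many more accumulated updates than this, so the choice of update rule and batching scheme matters.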
Thank you. Unfortunately, another project has priority over my time for the next several months.
I have a huge data set with 98% of the data missing. I store it as a sparse matrix, and the dataset fits easily into memory. As a full data frame it would use hundreds of GB. Could you please let embed accept a sparse matrix?