bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

[r] add lsi, var feature selection #156

Open immanuelazn opened 2 weeks ago

immanuelazn commented 2 weeks ago

Description

As discussed previously, we would like built-in functions for creating latent representations of input matrices. The most common approach is latent semantic indexing (LSI). This PR adds lsi() and highly_variable_features() for that purpose.
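For readers unfamiliar with variable feature selection, here is a minimal sketch of the general idea in plain Python. It is not the PR's highly_variable_features() implementation: the function name `top_variable_features`, the 1e4 scale factor, and the choice to rank features by the variance of log-normalized counts are all assumptions made for illustration.

```python
import math
from statistics import variance


def top_variable_features(counts, n_features):
    """Return indices of the n_features rows with the highest variance.

    counts is a features x cells matrix given as a list of rows.
    Hypothetical helper: ranking by variance of log1p(counts per 10k)
    is an assumed criterion, not necessarily what the PR uses.
    """
    n_cells = len(counts[0])
    # Total counts per cell, used to normalize for sequencing depth.
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    scores = []
    for i, row in enumerate(counts):
        norm = [math.log1p(1e4 * x / t) if t else 0.0 for x, t in zip(row, totals)]
        scores.append((variance(norm), i))
    # Highest variance first; ties are broken by the lower row index.
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return sorted(i for _, i in ranked[:n_features])


toy_counts = [
    [1, 1, 1, 1],    # nearly constant after normalization
    [0, 10, 0, 10],  # strongly variable
    [3, 3, 3, 3],    # nearly constant after normalization
    [9, 0, 8, 1],    # strongly variable
]
selected = top_variable_features(toy_counts, 2)
```

The selected rows would then be the input to the LSI step, which keeps feature selection decoupled from the decomposition.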

Tests

Details

We are not aiming for a one-to-one port of ArchR's approach, which is an iterative LSI with clustering, variable feature selection, etc. Instead, feature selection is implemented as a separate step, and the LSI procedure is log(tf-idf) -> z-score normalization -> SVD.
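The log(tf-idf) -> z-score -> SVD pipeline can be sketched in plain Python as follows. This is an illustrative sketch, not the PR's lsi() code. Several details are assumptions: the matrix is features x cells, idf is the number of cells divided by the feature's total count, the scale factor is 1e4, z-scoring is applied per feature across cells, and the SVD is approximated with power iteration and deflation to keep the example dependency-free. All function names are hypothetical.

```python
import math
import random


def log_tfidf(counts, scale=1e4):
    """log(1 + tf * idf * scale) for a features x cells count matrix.

    Assumes every feature and every cell has at least one count.
    """
    n_cells = len(counts[0])
    cell_totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    out = []
    for row in counts:
        idf = n_cells / sum(row)  # assumed idf definition
        out.append([math.log1p(scale * idf * x / t) for x, t in zip(row, cell_totals)])
    return out


def zscore_rows(mat):
    """Center and scale each feature (row) across cells; constant rows become 0."""
    out = []
    for row in mat:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def truncated_svd(mat, k, n_iter=300, seed=0):
    """Approximate top-k singular values and right singular vectors.

    Uses power iteration on mat.T @ mat with deflation; assumes k does
    not exceed the rank of mat.
    """
    rng = random.Random(seed)
    n_cells = len(mat[0])
    sigmas, vecs = [], []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n_cells)]
        for _ in range(n_iter):
            u = [dot(row, v) for row in mat]  # mat @ v
            w = [sum(mat[i][j] * u[i] for i in range(len(mat))) for j in range(n_cells)]
            for p in vecs:  # remove components already found
                d = dot(w, p)
                w = [a - d * b for a, b in zip(w, p)]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        sigmas.append(math.sqrt(sum(dot(row, v) ** 2 for row in mat)))
        vecs.append(v)
    return sigmas, vecs


def lsi_embedding(counts, k):
    """Return a cells x k embedding: row j holds sigma_c * v_c[j]."""
    z = zscore_rows(log_tfidf(counts))
    sigmas, vecs = truncated_svd(z, k)
    return [[s * v[j] for s, v in zip(sigmas, vecs)] for j in range(len(counts[0]))]


counts = [
    [5, 0, 3, 0, 1],
    [0, 4, 0, 6, 1],
    [2, 1, 0, 3, 4],
    [1, 2, 5, 0, 2],
]
embedding = lsi_embedding(counts, k=2)  # 5 cells x 2 components
```

In practice a sparse, truncated SVD routine would replace the power iteration. The signs of the components are arbitrary, so embeddings should only be compared up to a sign flip per component.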

Projection into LSI space will be built in a follow-up PR for the sklearn interface.

bnprks commented 2 weeks ago

Looks like a good start! A couple high-level thoughts/comments while this is still in the draft stage:

immanuelazn commented 2 weeks ago

For now, I'm limiting the memory-saving work to writing intermediates to a temporary file. Let's discuss further in person and then decide on the best way to implement it!