KrishnaswamyLab / scprep

A collection of scripts and tools for loading, processing, and handling single cell data.
MIT License
73 stars 19 forks

Density subsampling #106

Open scottgigante opened 4 years ago

scottgigante commented 4 years ago

Describe the solution you'd like: scprep.select.subsample(density=True, knn=3). Use nmslib if available?

Describe alternatives you've considered: Use sklearn.neighbors.NearestNeighbors

scottgigante commented 4 years ago
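import numpy as np
import scprep
from sklearn.neighbors import NearestNeighbors
# Distance to the 3rd nearest neighbor is large where the data are sparse
distances, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors()
distances = distances.max(axis=1)
p = distances / distances.sum()  # sampling weights proportional to that distance
X_subsample = scprep.select.select_rows(X, idx=np.random.choice(X.shape[0], n, p=p, replace=False))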
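
For anyone who wants to try the idea without X and n already defined, here is a rough self-contained sketch. It is not the scprep implementation: `knn_distance` and `density_subsample` are hypothetical names, and the neighbor search is swapped for a brute-force NumPy distance matrix instead of sklearn or nmslib, so it only suits small inputs. The tiny floor on the distances is also an assumption, added so that duplicated points (k-th neighbor distance 0) keep a nonzero sampling probability.

```python
import numpy as np


def knn_distance(X, knn=3):
    """Distance from each row of X to its knn-th nearest other row (brute force)."""
    X = np.asarray(X, dtype=float)
    if not 0 < knn < X.shape[0]:
        raise ValueError("knn must be between 1 and n_rows - 1")
    # Full pairwise Euclidean distance matrix: O(N^2) memory, small data only.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # After partitioning, column knn - 1 holds the knn-th smallest distance per row.
    return np.partition(dist, knn - 1, axis=1)[:, knn - 1]


def density_subsample(X, n, knn=3, seed=None):
    """Return n distinct row indices, favoring rows in sparse regions."""
    X = np.asarray(X, dtype=float)
    if not 0 < n <= X.shape[0]:
        raise ValueError("n must be between 1 and the number of rows")
    rng = np.random.default_rng(seed)
    # Assumption: floor the distances so exact duplicates can still be drawn.
    d = np.maximum(knn_distance(X, knn), 1e-12)
    p = d / d.sum()
    return rng.choice(X.shape[0], size=n, replace=False, p=p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.normal(0.0, 0.1, size=(200, 2))   # tight cluster
    sparse = rng.normal(5.0, 2.0, size=(20, 2))   # diffuse cluster
    X = np.vstack([dense, sparse])
    idx = density_subsample(X, n=30, knn=3, seed=0)
    print("rows drawn from the sparse cluster:", int((idx >= 200).sum()))
```

The returned indices could then be passed as `idx` to `scprep.select.select_rows`, as in the snippet above.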