SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

UMAP instead of PCA #3116

Open florian6973 opened 1 week ago

florian6973 commented 1 week ago

Hi!

I was playing a bit with sortingcomponents, and I was wondering whether someone is already working on this, or whether it would be worth adding a clustering class (https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/sortingcomponents/clustering/method_list.py) that uses UMAP as the dimensionality reduction method. This thesis suggests that UMAP is promising, provided we manage to compute it efficiently: https://dalspace.library.dal.ca/handle/10222/83717

Best,

Florent

zm711 commented 1 week ago

@samuelgarcia @yger any opinions on this?

yger commented 1 week ago

UMAP could easily be added to the sortingcomponents framework as an additional step to project the waveforms. The problem with UMAP is that it is rather slow for large numbers of spikes/features, and the results may also depend heavily on the parameters. But I'll read the reference, and if it shows that UMAP is useful, then it should clearly be integrated!

florian6973 commented 1 week ago

Yes, you are right that it can be rather slow, but I was recently looking at some GPU reimplementations that are much faster for large numbers of samples (https://github.com/berenslab/contrastive-ne or https://github.com/rapidsai/cuml ...)

Regarding the sensitivity to parameters, I was not aware that UMAP depends so strongly on its parameters... Could you elaborate, please? Thanks!