SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Compute principal components slow on Windows #3398

Open zm711 opened 1 week ago

zm711 commented 1 week ago

3249

Based on @chrishalcrow's testing, computing PCA on Windows is an extremely slow step in our test suite. I know the current implementation goes straight to ProcessPoolExecutor, so maybe we need to revisit this. I can test locally on Windows. @alejoe91?

alejoe91 commented 1 week ago

Thanks for writing this up @zm711.

I think the problem could also be an interaction between processes and threads. Sklearn will by default try to max out the number of threads, but we add our own layer of process parallelization on top, so the two can oversubscribe the CPU. In the ChunkRecordingExecutor we have an additional max_threads_per_process arg, but the machinery is a bit more complicated. I think we should give it a try and see if it fixes the issue.
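To make the idea concrete, here is a minimal sketch of capping threads inside each worker process. It is not the SpikeInterface implementation: the function names (threads_per_worker, fit_pca_limited, run_parallel) and the random stand-in data are hypothetical, and it assumes numpy, scikit-learn and threadpoolctl are installed. The only thing it is meant to show is the pattern of wrapping the PCA fit in threadpoolctl's threadpool_limits so that processes times threads does not exceed the core count.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def threads_per_worker(n_jobs, n_cores=None):
    """Split the available cores evenly across worker processes (at least 1 each)."""
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    return max(1, n_cores // max(1, n_jobs))


def fit_pca_limited(args):
    """Fit a PCA on one chunk while capping BLAS/OpenMP threads in this process."""
    # Imported lazily so the helper above stays usable without the scientific stack.
    import numpy as np
    from sklearn.decomposition import PCA
    from threadpoolctl import threadpool_limits

    seed, max_threads = args
    # Hypothetical stand-in data: 2000 "waveforms" with 60 samples each.
    data = np.random.default_rng(seed).normal(size=(2000, 60))
    # Only the code inside this context manager is subject to the thread cap.
    with threadpool_limits(limits=max_threads):
        pca = PCA(n_components=5).fit(data)
    return float(pca.explained_variance_ratio_.sum())


def run_parallel(n_jobs=4, n_chunks=8):
    """Fit one PCA per chunk across n_jobs processes with a per-process thread cap."""
    max_threads = threads_per_worker(n_jobs)
    tasks = [(seed, max_threads) for seed in range(n_chunks)]
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        return list(executor.map(fit_pca_limited, tasks))
```

On Windows, run_parallel() has to be called from under an `if __name__ == "__main__":` guard because worker processes are spawned rather than forked. Timing it with and without the threadpool_limits block would be one way to check whether oversubscription is really what makes the PCA step slow.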

zm711 commented 1 week ago

Let me link the issue where Chris saw this happening on his Windows machine too, with newer versions of sklearn but not older ones: https://github.com/SpikeInterface/spikeinterface/issues/2817

zm711 commented 1 week ago

But it would be cool if that speeds things up on Windows, since that is a big workflow and testing bottleneck. I haven't dug deeply into the PCA code to see how complicated it would be :)