SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
527 stars 187 forks source link

Mountainsort5 Wrapper Todo list #2607

Closed zm711 closed 7 months ago

zm711 commented 8 months ago

So after talking to @magland, he said we could tweak the wrapper here if necessary so I just wanted to list our current issues. I'm happy to try to start tackling this, but wanted to keep it open for discussion. The original motivation can be found here #2489.

Issues

TemporaryDirectory doesn't really work on Windows and the core Python team has been trying to work on this for multiple iterations. In 3.10 they introduced ignore_cleanup_errors and in 3.12 they introduced delete both trying to deal with the fact that WinError32 prevents cleanup. The current wrapper relies on creating a temporary dir in order to write out the filtered and whitened binary file which works for linux, but fails on Windows.

Currently and this is a smaller issue, the wrapper writes out the whole filtered/whitened binary file for each group during a run_sorter_by_property rather than just writing out the sub-channels. This isn't the biggest deal, but we could speed up the wrapper for people doing multi-shank work by fixing this

Solutions?

Once our min version is 3.10 we could just add ignore_cleanup_errors and it'll work on Windows. Or we remove the temp dir and the user has to cleanup up the files themselves. I still need to test if we can cleanup the file later or if because this is all the same global function whether it is impossible for us to cleanup the file while in python.

for run_sorter_by_property, I don't know. I use it all the time, but I have never actually dug deeply into the mechanism so I would love to hear if anyone has ideas.

alejoe91 commented 8 months ago

What about getting rid of the mountainsort5 write function and just write the preprocessed recording in the output folder? Simiilar to KS? That was my initial suggestion: https://github.com/SpikeInterface/spikeinterface/pull/2225#discussion_r1401894514

Then it the folder could be optionally (default True) cleaned up at the end of the sorting.

zm711 commented 7 months ago

What about getting rid of the mountainsort5 write function and just write the preprocessed recording in the output folder? Simiilar to KS? That was my initial suggestion: https://github.com/SpikeInterface/spikeinterface/pull/2225#discussion_r1401894514

Yeah I need to test that. I just don't know if the file gets released on Windows since we are in python the whole time. For KS I wonder if it works because we go python->Matlab->python so the file gets released. If the file doesn't get released then we would need another way to do it.

Have you got any of the pure python sorters to clean-up at the end on Windows? SC2 and TDC2 still fail on Windows (though I haven't check in couple months) because there are a few caching files that don't get released so the cleanup causes the whole thing to fail (I think it is complicated by the by_property for me since the cache files have the same name.)

alejoe91 commented 7 months ago

(I think it is complicated by the by_property for me since the cache files have the same name.)

It shouldn't fail because each group gets its own output_folder. I'll give it a try with this

zm711 commented 7 months ago

This only fails with Windows. (So you'd have to do a dual boot).

FAILED src/spikeinterface/sorters/external/tests/test_tridesclous.py::TridesclousCommonTestSuite::test_with_run - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\...
FAILED src/spikeinterface/sorters/internal/tests/test_spykingcircus2.py::SpykingCircus2SorterCommonTestSuite::test_with_run - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\...
FAILED src/spikeinterface/sorters/internal/tests/test_tridesclous2.py::Tridesclous2SorterCommonTestSuite::test_with_run - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\...FAILED src/spikeinterface/sorters/tests/test_find_folders.py::test_find_recording_folders - AssertionError: assert 1 == 3

Here are the test suite failures for SC2 and TDC2 where they don't release their spikes cache between tests. I was seeing something similar when running on real data (but I think) @yger has added a feature to not use a cache for SC2 now.

I think the problem just comes down to whether or not the file is released during the run_sorter_by_property or not. If not then Windows will fail during the attempted cleanup.

zm711 commented 7 months ago

Closed by #2690.