SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

tridesclous run generates very large files #3528

Closed Djoels closed 1 day ago

Djoels commented 1 week ago

When running many spike sorter runs, I frequently hit out-of-disk-space errors, despite having 5 TB of space available.

A major reason for this seems to be the catalogue files that TDC generates. For a 10 minute / 30 kHz / 37 GB recording, the some_waveforms.raw, somefeatures.raw, ... some*.raw files take up 117 GB, roughly triple the original recording size. If I make a sorting analyzer and save it, it only takes up 50 MB of space (maybe that is an apples-and-oranges comparison?)

I don't immediately see a relevant parameter that would let me have these files removed automatically. Could this be an idea for an improvement? I'd gladly help with this if the authors have no objection.
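
For illustration, here is a minimal sketch of what such a cleanup could look like, using only the Python standard library. The function names, the `some*.raw` glob pattern and the `dry_run` flag are assumptions made for this example and are not part of tridesclous or SpikeInterface. It is also an assumption that these files are no longer needed once the sorting result has been saved elsewhere, so this should be checked before deleting anything.

```python
from pathlib import Path


def find_raw_intermediates(sorter_folder, pattern="some*.raw"):
    """Return (path, size_in_bytes) for every file under sorter_folder matching pattern.

    The default pattern is a guess based on the file names mentioned above.
    """
    root = Path(sorter_folder)
    return sorted((p, p.stat().st_size) for p in root.rglob(pattern) if p.is_file())


def cleanup_raw_intermediates(sorter_folder, pattern="some*.raw", dry_run=True):
    """Report matching files and delete them unless dry_run is True.

    Returns the total size in bytes of the files that were found.
    """
    found = find_raw_intermediates(sorter_folder, pattern)
    total = sum(size for _, size in found)
    for path, size in found:
        print(f"{size / 1e9:8.3f} GB  {path}")
        if not dry_run:
            path.unlink()  # hypothetical cleanup step, only after the sorting is saved
    return total
```

Running it first with the default `dry_run=True` only lists the files and their sizes, which makes it easy to confirm what would be removed before calling it again with `dry_run=False`.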

zm711 commented 1 week ago

Just to clarify, are you referring to TDC 1 or TDC 2?

Djoels commented 1 week ago

I'm referring to TDC1. I held off on using TDC2 because the documentation mentions it is not ready yet.

zm711 commented 1 week ago

I'll tag @samuelgarcia. I don't think much has been updated for TDC1 recently since all of Sam's efforts are on TDC2. "Not ready yet" really means that features and arguments are still changing between versions. So you could test it right now as of 0.101.x, but it will (likely) be different in 0.102.x, so if you love TDC2's performance in 0.101 you'd have to pin to 0.101. You can definitely try either TDC2 or SC2, but just know they are being routinely tweaked to improve performance, so there are no guarantees of stability yet.
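
As a rough sketch of how a pipeline could guard against drifting off a pinned release series: the helper name `in_release_series` is made up for this example, and it only compares the leading version components as plain strings.

```python
def in_release_series(installed_version, series="0.101"):
    """Return True if installed_version (e.g. '0.101.2') belongs to the given series.

    Compares the leading dot-separated components, so '0.1010.0' does not
    match the '0.101' series.
    """
    installed_parts = installed_version.split(".")
    series_parts = series.split(".")
    return installed_parts[: len(series_parts)] == series_parts
```

In a script this could be called as `in_release_series(importlib.metadata.version("spikeinterface"))` before running the sorter, raising an error if it returns `False`.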

samuelgarcia commented 4 days ago

Hi. I have to admit that tridesclous1 is not maintained or updated anymore. I spend all my energy on spikeinterface and also tridesclous2. The way sparsity was handled in tridesclous1 was not good, which explains these big numbers. For high channel counts I would advise not using it.

Djoels commented 1 day ago

Hi, thank you for the update. I will try to get TDC2 running. It appears I'm encountering an error, but that'll be a topic for another issue if it's relevant :)