m-beau opened this issue 6 months ago
I would suggest benchmarking the Docker speed on Kilosort2. It sounds like locally with Matlab you are at about 1x recording time, so see what Docker takes; that should give you a proxy of the Docker slowdown. As for the SI wrapper in general, it will also have a slight slowdown compared to native KS4 (this is expected, since we are calling KS code from spikeinterface code). This lengthening of time increases if you are doing things like `run_sorter_by_property`,
because we split your recording so you analyze each "shank" separately (so for a four-shank recording of 20 minutes, you are really sorting 80 minutes of data). This should be a good thing since it provides more isolation, but it means that running this will be quite a bit slower than running KS4 natively (with a likely accuracy boost). Does that make sense? Basically, the question we have to ask is: what specifically are you sorting, and how are you sorting it with spikeinterface?
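The benchmarking suggestion above can be sketched with a small timing helper (pure Python; the sorter call and the 20-minute duration in the usage comment are placeholders for your own setup):

```python
import time

def sorting_speed_ratio(sort_fn, recording_duration_s):
    """Time a sorter call and return wall-clock time divided by the
    recording duration; ~1.0 means the sorter runs at 1x recording time."""
    t0 = time.perf_counter()
    sort_fn()  # e.g. a lambda wrapping run_sorter(...) natively or in Docker
    elapsed = time.perf_counter() - t0
    return elapsed / recording_duration_s

# Usage sketch: run the same sorter natively and in Docker, then compare.
# native_ratio = sorting_speed_ratio(lambda: run_sorter("kilosort2", rec), 20 * 60)
# docker_ratio = sorting_speed_ratio(
#     lambda: run_sorter("kilosort2", rec, docker_image=True), 20 * 60)
# docker_slowdown = docker_ratio / native_ratio
```

The ratio of the two runs isolates the container overhead from the sorter's own runtime.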
Both `run_sorter` and `run_sorter_by_property` accept an optional `verbose=True` option to provide more info, but your mileage will vary. For example, if you're running KS2, 2.5, or 3 in Docker, the Python process has to hand the data off to the Matlab process, and we can't control the progress of the Matlab side, so no progress bar there. The same is true if you use a Python spike sorter (like MS4 or MS5): it would have to have its own progress bar implemented for you to see one during sorting. With `verbose=True`, a progress bar will display for spikeinterface steps when possible.
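For reference, a minimal sketch of what that call looks like (hedged: the exact keyword names, e.g. `folder`, can differ between spikeinterface versions):

```python
def sort_with_progress(recording, sorter_name="kilosort4", folder="ks_output"):
    """Run a sorter with verbose output so spikeinterface prints progress
    where it can (sorter-internal progress is up to the sorter itself)."""
    # Imported lazily so this sketch only needs spikeinterface when called.
    from spikeinterface.sorters import run_sorter
    return run_sorter(sorter_name, recording, folder=folder, verbose=True)
```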
Did I miss anything @alejoe91?
I will say (and we discussed this previously) that maybe we should have an option to write a binary and load that into KS, instead of using their RecordingAsArray or whatever, because that doesn't have multiprocessing, which might slow the process down a bit. So for someone sorting over the network, it might make sense to just write the binary file wherever they want and then sort from that for KS4, so that we can leverage our `n_jobs`, no?
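The "write a binary first" idea is roughly: dump the traces to a flat int16 file once (a step spikeinterface can parallelize), then point the sorter at that file instead of the networked recording. A stripped-down, stdlib-only sketch of the dump-and-reload round trip:

```python
from array import array
from pathlib import Path

def write_flat_binary(samples, path):
    """Write int16 samples to a raw binary file (the flat layout KS reads)."""
    Path(path).write_bytes(array("h", samples).tobytes())

def read_flat_binary(path):
    """Read the samples back, to sanity-check that the round trip is lossless."""
    buf = array("h")
    buf.frombytes(Path(path).read_bytes())
    return list(buf)

# With spikeinterface the equivalent is roughly (hedged, API may vary):
# rec_saved = recording.save(folder="binary_copy", n_jobs=8, chunk_duration="1s")
# ...then run the sorter on rec_saved instead of the networked recording.
```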
Thanks Zach, I will simply do some careful benchmarking to bring more useful information to the table!
It is a shame about the progress bars. It would be cool to hack the stdout of the sorters (or whatever the MATLAB equivalent would be) to somehow use their own 'verbose output' and assess progress. Or, when it is already implemented in the sorter, make it write to a .log file and read the progress from that file.
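The .log idea could look something like this: poll the sorter's log file and pull out the last progress-looking line (a sketch; the regex would need tuning to each sorter's actual output format):

```python
import re
from pathlib import Path

# Assumed format: the sorter prints lines containing a percentage like "55 %".
_PROGRESS = re.compile(r"(\d{1,3})\s*%")

def last_progress(log_path):
    """Return the most recent percentage found in the log, or None."""
    text = Path(log_path).read_text(errors="replace")
    matches = _PROGRESS.findall(text)
    return int(matches[-1]) if matches else None
```

Polling this in a loop from the Python side would give a crude progress bar even when the sorter runs in a separate Matlab or container process.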
That sounds like a fun PR if you want to give it a go! I'm sure we would have users that would be interested in that :)
Quick question: Kilosort 4 in a spikeinterface Docker container took about 4 hours to run on a 20-minute-long Neuropixels dataset (Kilosort 2 on MATLAB on the same machine takes about 30 minutes), so I believe that my installation is sub-optimal.
Is running sorters in containers known to slow down the process, or am I doing something wrong? Do I need to check somehow whether all my GPU capabilities are being used (not sure if it could fall back to running on CPU, but that would take much longer than 4 hours, so I don't believe that's happening)? In general, any tips to accelerate spikeinterface's sorters?
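One quick sanity check for the GPU question (a sketch assuming Kilosort 4's PyTorch backend; it returns False if torch isn't importable):

```python
def cuda_available():
    """True if PyTorch can see a CUDA device; a torch-based sorter like
    Kilosort 4 is much slower when it has to fall back to CPU."""
    try:
        import torch  # assumption: torch may not be installed in this env
    except ImportError:
        return False
    return torch.cuda.is_available()

print("CUDA available:", cuda_available())
```

Running this inside the same container the sorter uses would tell you whether the 4-hour run was a CPU fallback.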
And also, any way to print out a progress bar of some kind? That would be very useful.
Thanks!