MouseLand / Kilosort

Fast spike sorting with drift correction for up to a thousand channels
https://kilosort.readthedocs.io/en/latest/
GNU General Public License v3.0

BUG: ValueError : Found array with 0 sample(s) (shape=(0, 41)) while a minimum of 1 is required by TruncatedSVD. #606

Closed: j-moore94 closed this issue 3 months ago

j-moore94 commented 6 months ago

Describe the issue:

Hi, (sorry if this is just me not setting things up properly; I'm not very experienced with Kilosort.)

Every time I try to run Kilosort (from the GUI or from a notebook), I get this error. From some debugging, it looks like the whitening/drift-correction step results in the data becoming all zeros. I also get no drift at all, which seems unlikely.

I first thought it was an issue with the way I was setting up the probe. I am using the same .mat file we used for Kilosort 2, but it seems to load correctly, so I'm not sure.

Any help would be very much appreciated! :)

In case it's helpful, here is the probe variable I get when I load my channel map file:

    probe = {
        'xc': array([ 580., 580., 620., 420., 620., 420., 620., 180., 180., 180., 380., 220., 380., 220., 380., 220., 620., 820., 420., 580., 180., 180., 220., 380., 220., 380., 420., 380., 420., 620., 580., 580., 820., 820., 780., 980., 780., 1180., 1020., 1220., 980., 1180., 1020., 1220., 780., 780., 980., 820., 1180., 1020., 1180., 1020., 1180., 1020., 1220., 1220., 980., 1220., 980., 780., 980., 780., 580., 820.], dtype=float32),
        'yc': array([ -20., -180., -40., -160., -160., -40., -120., -20., -140., -60., -60., -200., -100., -120., -180., -80., -200., -80., -80., -140., -180., -100., -160., -20., -40., -140., -200., -220., -120., -80., -220., -100., -160., -200., -20., -60., -140., -140., -80., -200., -220., -60., -160., -80., -60., -220., -140., -40., -100., -40., -180., -200., -20., -120., -40., -120., -20., -160., -180., -100., -100., -180., -60., -120.], dtype=float32),
        'kcoords': array([3., 3., 3., 2., 3., 2., 3., 1., 1., 1., 2., 1., 2., 1., 2., 1., 3., 4., 2., 3., 1., 1., 1., 2., 1., 2., 2., 2., 2., 3., 3., 3., 4., 4., 4., 5., 4., 6., 5., 6., 5., 6., 5., 6., 4., 4., 5., 4., 6., 5., 6., 5., 6., 5., 6., 6., 5., 6., 5., 4., 5., 4., 3., 4., 8., 7., 7., 7., 9.], dtype=float32),
        'chanMap': array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]),
        'n_chan': 69
    }

Reproduce the bug:

Error message:

drift computed in  746.40s; total  748.88s

Traceback (most recent call last):
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\gui\run_box.py", line 241, in add_plot_data
    plot_drift_scatter(plot_window2, st0, settings)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\gui\sanity_plots.py", line 57, in plot_drift_scatter
    p1.setTitle('Spike amplitude across time and depth', color='black')
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\pyqtgraph\graphicsItems\PlotItem\PlotItem.py", line 1155, in setTitle
    self.titleLabel.setText(title, **args)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\pyqtgraph\graphicsItems\LabelItem.py", line 58, in setText
    color = fn.mkColor(color)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\pyqtgraph\functions.py", line 297, in mkColor
    args = [r,g,b,a]
UnboundLocalError: local variable 'r' referenced before assignment

Extracting spikes using templates
Re-computing universal templates from data.

Traceback (most recent call last):
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 82, in run
    st, tF, Wall0, clu0 = detect_spikes(ops, self.device, bfile, tic0=tic0,
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 332, in detect_spikes
    st0, tF, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 193, in run
    ops['wPCA'], ops['wTEMP'] = extract_wPCA_wTEMP(
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 70, in extract_wPCA_wTEMP
    model = TruncatedSVD(n_components=ops['settings']['n_pcs']).fit(clips)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\sklearn\decomposition\_truncated_svd.py", line 209, in fit
    self.fit_transform(X)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\sklearn\utils\_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\sklearn\decomposition\_truncated_svd.py", line 229, in fit_transform
    X = self._validate_data(X, accept_sparse=["csr", "csc"], ensure_min_features=2)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\sklearn\base.py", line 605, in _validate_data
    out = check_array(X, input_name="X", **check_params)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\sklearn\utils\validation.py", line 967, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 41)) while a minimum of 1 is required by TruncatedSVD.

Version information:

Windows 10, Python 3.9, CUDA 11.8

Context for the issue:

No response

Experiment information:

64-channel silicon probe, plus 5 non-ephys channels included in the binary file (total 69 channels). Recorded in Intan at 20000 samples/s.

jacobpennington commented 6 months ago

Hello,

The first part of that traceback, UnboundLocalError: local variable 'r' referenced before assignment, is related to an outdated dependency; I updated the setup file recently to fix it. You can run pip install --upgrade pyqtgraph in your kilosort environment instead of reinstalling, which will also make an additional drift correction plot show up.

I'll have to do some more poking around to figure out where that other error is coming from. Are you able to post a screenshot of what the GUI looks like after you've loaded your data and probe?

Separately, based on the probe configuration you've posted, drift correction is not likely to work well. It looks like there are only a handful of contacts on each shank, which won't give enough spread or sampling density along the depth of the probe for a reliable drift estimate. So you should probably set nblocks = 0 to disable that step instead.
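For reference, setting that from a script rather than the GUI would look roughly like this (a minimal sketch; the path, probe filename, and channel count are placeholders, with n_chan_bin meaning the total number of channels in the binary file):

    from kilosort import run_kilosort

    # Placeholder values for illustration; n_chan_bin is the total channel
    # count in the binary file (e.g. 69 here: 64 ephys + 5 non-ephys channels).
    settings = {
        'data_dir': 'path/to/recording_folder',
        'n_chan_bin': 69,
        'nblocks': 0,  # disable drift correction
    }
    results = run_kilosort(settings=settings, probe_name='chanMap.mat')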

j-moore94 commented 6 months ago

Hi Jacob, thanks for getting back to me. I've just run it through the GUI with nblocks = 0 and I now get an error saying that no spikes have been detected:

0 spikes extracted in  741.52s; total  744.00s

Traceback (most recent call last):
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 82, in run
    st, tF, Wall0, clu0 = detect_spikes(ops, self.device, bfile, tic0=tic0,
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 337, in detect_spikes
    raise ValueError('No spikes detected, cannot continue sorting.')
ValueError: No spikes detected, cannot continue sorting.

(I've run this recording through Kilosort 2.5 previously and detected lots of spikes!)

Here's a screenshot of the GUI with the datafile loaded:

image

jacobpennington commented 6 months ago

Sorry, what does the data look like if you click on 'whitened' at the bottom right of the plot area? I'm just trying to make sure the data loaded correctly, which is hard to tell from the raw view. If everything does look like it loaded correctly, you could try reducing Th (universal) and Th (learned), maybe to something like 6 and 5, respectively. I believe the default values for those have changed between Kilosort versions.

j-moore94 commented 6 months ago

No problem, here's the whitened data. :)

image

I got the same error again using 6 and 5 for the thresholds.

I assume this isn't relevant because I am using Windows not Linux, but I also get this warning earlier on:

Re-computing universal templates from data.

C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\threadpoolctl.py:1186: RuntimeWarning: 
Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
    https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md

  warnings.warn(msg, RuntimeWarning)

jacobpennington commented 6 months ago

Yeah that's a known warning, we haven't seen any problems from it. I tried poking around a bit more, but so far I'm not seeing anything that would cause this based on the probe information. Are you able to share some sample data so that I can debug this?

rdarie commented 6 months ago

I may have stumbled upon the same issue (and a solution). I'm working with a 30-channel probe and saw the same error about no spikes being detected. On line 150 of spikedetect.py, Aa[iC2] had entries that were all NaNs past the 30th index, which caused Amax to be all NaNs as well. I think this happens because nearest_templates defaults to 100, which exceeds the number of channels. Setting it to less than that fixes the problem. Maybe there could be a check so that this happens automatically?
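Something like the following is what I had in mind (just a sketch of the idea, not the actual Kilosort code):

    # Cap nearest_templates at the number of channels before spike detection.
    def clamp_nearest_templates(settings, n_channels):
        """Prevent nearest_templates from exceeding the channel count."""
        n = settings.get('nearest_templates', 100)  # 100 is the default
        settings['nearest_templates'] = min(n, n_channels)
        return settings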

jacobpennington commented 6 months ago

That's helpful, thanks! I'll take a look at that. The software has worked fine on other datasets with 32 or 64 channels, so it's not just a channel-count issue, but that narrows it down at least. I also can't reproduce the error by increasing nearest_templates beyond the channel count on Neuropixels data, so unfortunately it's not that simple.

j-moore94 commented 6 months ago

Hi, thanks for the input, Radu. I extracted a short section from my .dat file to send, and I tried running it with nearest_templates set to 100 and to 50. With 50, I do detect some spikes, but then run into this error:

Traceback (most recent call last):
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 82, in run
    st, tF, Wall0, clu0 = detect_spikes(ops, self.device, bfile, tic0=tic0,
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 343, in detect_spikes
    Wall3 = template_matching.postprocess_templates(Wall, ops, clu, st0, device=device)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\template_matching.py", line 83, in postprocess_templates
    Wall2, _ = align_U(Wall, ops, device=device)
  File "C:\Users\jmoore\.conda\envs\kilosort\lib\site-packages\kilosort\template_matching.py", line 70, in align_U
    X = conv1d(X.unsqueeze(1), ops['wTEMP'].unsqueeze(1), padding=ops['nt']//2)
RuntimeError: Calculated padded input size per channel: (40). Kernel size: (41). Kernel size can't be greater than actual input size

With 100 templates, I get the same error as before. Here's a link to the 10 minute section of data if you want to try: https://drive.google.com/drive/folders/1YyX9wzetNu4zcg6JTYvmY4-l21NuD1WD?usp=sharing

rdarie commented 6 months ago

That's interesting. It's worth noting that I never encountered the TruncatedSVD() error in the first place, so there might be several differences between our datasets that are causing the issue. I've uploaded a segment of the data I'm working on at the link below:

https://drive.google.com/drive/folders/1JMHhmmHRH6OTHXtbrxgOXzBTpfIUS-WM?usp=sharing

By the way, @jacobpennington, should I expect sorting results to be comparable between Kilosort3 and Kilosort4? I'm finding that they agree in the case of really large units, but KS4 finds fewer units overall on the same dataset. I can start a separate issue with some examples if that's helpful.

jacobpennington commented 6 months ago

@rdarie In general no, they may not be comparable, especially if there's a lot of drift. You can check out the bioRxiv preprint for an overview (Table 1 and Table 2 in particular).

marius10p commented 6 months ago

@jacobpennington @rdarie To be clear, Kilosort4 should usually work better. It has the same pipeline backbone as Kilosort3 and the same drift correction, so the same thresholds should produce roughly the same number of spikes and units. What is different is the clustering algorithm, which should result in more well-isolated units / better refractory periods.

ZaneMuir commented 6 months ago

Hi, we are using a 32-channel probe and had the same issue of 0 samples for TruncatedSVD. After lowering nearest_templates to 50 (or even smaller values for some recordings), spikes are detected.

Is there a rule of thumb for picking values for nearest_templates, or for the other settings in general?

rdarie commented 6 months ago

Thanks for the clarification! With that in mind, it looks like, at least on my dataset, everything else works just fine after lowering nearest_templates.

jacobpennington commented 6 months ago

@j-moore94 It looks like the second issue you ran into after changing nearest_templates is related to other issues users have reported with multi-shank probes. Basically, it's hard to determine a good value for dminx automatically when there are large (more than 50 micron) horizontal gaps between contacts, since we aren't explicitly handling multiple shanks yet.

I got reasonable-looking results for your data by increasing dminx (under "Extra settings") to 400, you could play around with it some more from there. That controls the lateral spacing of the templates we use for spike detection, which tile the space of the probe.

If you're not satisfied with the results from that, you can try artificially rearranging your probe file to vertically stack contacts from different shanks, though it's not clear to me what the best way to do that is with your staggered probe arrangement, so I would try tweaking dminx first.
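If you do try that, the rearrangement would be something roughly like this (purely illustrative; the function and offset are made up, and it assumes xc, yc, and kcoords each have one entry per sorted channel):

    import numpy as np

    def stack_shanks(probe, y_gap=500.0):
        """Collapse shanks onto one column, offsetting each shank vertically.
        y_gap should exceed the vertical extent of a single shank."""
        xc, yc, kc = probe['xc'].copy(), probe['yc'].copy(), probe['kcoords']
        for i, shank in enumerate(np.unique(kc)):
            idx = kc == shank
            xc[idx] -= xc[idx].min()                        # keep within-shank stagger
            yc[idx] = yc[idx] - yc[idx].max() - i * y_gap   # stack shanks end to end
        return {**probe, 'xc': xc, 'yc': yc}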

j-moore94 commented 6 months ago

Great, thanks, this seems to have worked for the full recording too! Any advice on how to choose the values of dminx and nearest_templates, or should they not make much of a difference?

(EDIT: You can ignore the following, as I reinstalled again and got it working!):

Also, when I tried to check the results in phy, I get the issue mentioned in #601. I then tried reinstalling using the command pip install git+https://github.com/mouseland/kilosort.git as suggested there, but now I'm getting this error when I try to run the GUI:

QObject::moveToThread: Current thread (0x19dbd83b380) is not the object's thread (0x19dbd83ca00).
Cannot move to target thread (0x19dbd83b380)

qt.qpa.plugin: Could not load the Qt platform plugin "windows" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: minimal, offscreen, webgl, windows.

j-moore94 commented 6 months ago

Just as a follow-up to the question about choosing the values: I am now getting some good clusters, but channels from neighbouring shanks are included in certain clusters - e.g. in the example below, channels 30 and 1 are not on the same shank as the spike. I guess I just need to tune nearest_templates and dminx to fix this? Or I was wondering whether arbitrarily increasing the spacing between the shanks could be a solution, to force Kilosort to only compare channels on the same shank?

image

jacobpennington commented 6 months ago

Hi @j-moore94

The first problem you mentioned, the qt.qpa.plugin error, is related to #597 and #613. Unfortunately it's not specific to Kilosort4; it's a general problem with pyqt. We're looking into alternative backends for the GUI to get around this, but in the meantime you can try the suggestions mentioned in those issues.

As for channels from different shanks being compared, currently that can happen because we are not using shank information for sorting. That's on the to-do list as well, but @marius10p might have some suggestions for how to avoid that in the meantime.

j-moore94 commented 6 months ago

Hi @jacobpennington,

Thanks, I actually don't have the pyqt problem anymore after reinstalling through the other method, so it's all good.

Okay, it seems to be working pretty well even with comparing shanks, so I'll have a play around with some of the settings to see what works best.

Thanks again for your help! :)

NirvikNU commented 6 months ago

With Plexon S-Probe 32 data, lowering nearest_templates to 10 and setting dmin to 150 (the inter-electrode spacing) still produces the "no spikes detected" error. Any suggestions?

j-moore94 commented 5 months ago

Hi @NirvikNU, it was dminx, not dmin, that needed to be changed for me - if your probe is multi-shank too then you may also have to do that. :)

jacobpennington commented 5 months ago

It sounds like the original issue has been more or less resolved (and a separate one has been opened for the latter comments), so I'm closing this out for now. Please let us know if you run into more problems related to this.

FlyingFordAnglia commented 3 months ago

Hi all! I am trying to spike sort data from a Neurolight probe recording (4 shanks, 250 microns apart, 32 channels total) using KS4. I am encountering the same issue as the OP no matter what value I set dminx or nearest_templates to. The universal template in the probe preview plot looks alright, so I am not sure what the issue is, and I know that the data has spikes:

ValueError: Found array with 0 sample(s) (shape=(0, 81)) while a minimum of 1 is required by TruncatedSVD.

I get the same error even if I try to vertically stack the shanks into one. Here is the drive link containing the binary file to be sorted, the probe file, and the exported KS4 settings I used: https://drive.google.com/drive/folders/1qpUR7X_IpbucLa1OfNo669x4-PxU6Fy_?usp=sharing

Attached is a snapshot of the error (kilosort4_error) and the probe preview. Is there anything I am missing here?

jacobpennington commented 3 months ago

@FlyingFordAnglia Can you please tell me what version of Kilosort4 you're using and include a screenshot of the "Extra Settings" window? Also, what is the horizontal spacing of contacts on the same shank? It looks like they're staggered a bit.

FlyingFordAnglia commented 3 months ago

@jacobpennington I'm using version 4.0.7. I've attached a screenshot of the extra settings I used in the last attempt (extra_settings). Contacts on the same shank are 40 microns apart in the x direction. I've attached a probeinterface layout of the probe as well (probe_interface_layout).

FlyingFordAnglia commented 3 months ago

Also, not sure if it is relevant, but the recording was made using a test audio file with spikes superimposed on a 5 Hz sine wave, fed into a headstage testing unit. So all the channels have an identical trace (picture attached: trace_before_preprocessing).

jacobpennington commented 3 months ago

@FlyingFordAnglia You should set dminx = 40 to match the spacing within shanks. The default values for all the other parameters in "Extra settings" should work fine. The recommendations in this issue were given prior to a lot of changes to the code; you should not use them as a reference for the current version.

However, the synthetic data you're using is likely to cause all sorts of problems on its own. If all channels have identical waveforms, many of the pipeline steps will not behave as expected and errors are not surprising. So, trying to debug based on this recording is not going to be very helpful.

mberns-ru commented 3 months ago

Hi! I'm also getting the same bug:

image

With the following code, which uses SpikeInterface (si):

    sorting_ks4 = si.run_sorter(sorter_name='kilosort4', recording=recording_saved, remove_existing_folder=True,
                                output_folder=base_folder / name,
                                verbose=True, dmin = 25, dminx=22.5)

The contacts on my 64-channel, 4-shank probe are 25 microns apart vertically and 22.5 microns apart horizontally. The shanks are 250 microns apart.

Do you have any suggestions on how to fix this? I'm unsure how to move forward, since the dmin and dminx changes didn't seem to work (unless I've implemented them incorrectly). I'm using Kilosort 4.0.8. Thank you!

jacobpennington commented 3 months ago

@mberns-ru Please try setting nblocks = 0 to skip drift correction; it's not likely to work well with only 16 channels on each shank and could introduce artifacts.

mberns-ru commented 3 months ago

@jacobpennington Thank you for the quick response! I tried that but it's still producing the same error.

jacobpennington commented 3 months ago

Can you attach the kilosort4.log file from the results directory? If it's not present, you will need to run Kilosort4 without using SpikeInterface; I'm not sure how SI interacts with the logging process.

mberns-ru commented 3 months ago

@jacobpennington It's not present. I will try to run it without SI and let you know how it goes.

mberns-ru commented 3 months ago

@jacobpennington I've attached the log! kilosort4.log

jacobpennington commented 3 months ago

Thanks; unfortunately the log isn't very helpful, hah. Are you able to share the data and probe file so that I can try running it myself?

mberns-ru commented 3 months ago

@jacobpennington Sure! Here is a folder with the .prb and .raw file. Let me know if you need anything else!

jacobpennington commented 3 months ago

@mberns-ru This is what I see when I load the data in the Kilosort4 GUI (screenshot below); the entire recording is like this, i.e. it just looks like white noise with no neural activity. So it's not surprising that an error occurred, and no settings changes will help with that.

image

Assuming that gets fixed, I would leave dmin as the default (i.e. don't set dmin = 25), or set it to 12.5, since that's the vertical spacing between nearest channels; otherwise there will be no templates placed in between the contacts. You could also use dminx = 45 instead of 22.5, even though that's technically not accurate; otherwise there will be many more templates placed along the horizontal axis than needed (which is just a side effect of how the shanks are placed).

Or, visually: your current settings result in this placement (templates are white dots, contacts are green squares): image

Changing to my recommendation (dmin = None, dminx = 45) would look like this: image
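For reference, your earlier run_sorter call adapted to that recommendation would look roughly like this (reusing si, recording_saved, base_folder, and name from your snippet; adjust as needed for your setup):

    sorting_ks4 = si.run_sorter(sorter_name='kilosort4', recording=recording_saved,
                                remove_existing_folder=True,
                                output_folder=base_folder / name,
                                verbose=True, dminx=45)  # dmin left at its default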

mberns-ru commented 3 months ago

@jacobpennington Hi again! I found out that the raw data was in float32, so I created a new binary file using KS4's io module:

    from kilosort import io

    io.spikeinterface_to_binary(
        recording, results_folder, data_name=name_scheme+'-32BINARY.bin', dtype=dtype,
        chunksize=60000, export_probe=True, probe_name='probe.prb'
    )

This generated data that I thought looked better (though this is my first time working with neural data, so I don't have much context):

image

However, it shows up completely blank when I open the "whitened" tab:

image

When I click "run", it generates the same error as initially discussed. I'm unsure why this is happening, but I'd imagine it's the same issue as when I was running KS4 through SpikeInterface rather than the GUI, because SpikeInterface was also passing in a float32 file.

Do you have any suggestions? I've uploaded the new file, the int16-binary version (unscaled), and the .prb file here.

jacobpennington commented 3 months ago

I think this is a bug with how we're handling float32 data, I will look into it.

jacobpennington commented 3 months ago

@mberns-ru I added a fix for this in the latest version (v4.0.11). In short, your data is scaled so differently from what KS4 expects for int16 data that it was causing some steps to fail. I added new "shift" and "scale" parameters to address this (you can find them in the "Extra settings" window in the GUI). I found that setting scale = 1000 worked well for the data you shared. You can click "Load" after changing the value to make sure the data shows up as expected; the goal is to get the raw data roughly into the -100 to +100 range.
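If you end up running from a script rather than the GUI, the equivalent would be roughly as follows (a sketch with placeholder paths and channel count; the settings key should match the GUI label, but double-check against the parameter docs):

    from kilosort import run_kilosort

    # Placeholder values; 'scale' multiplies the raw data so it lands roughly
    # in the -100 to +100 range that KS4 expects for int16-like recordings.
    settings = {
        'data_dir': 'path/to/recording_folder',
        'n_chan_bin': 64,
        'scale': 1000,
    }
    results = run_kilosort(settings=settings, probe_name='probe.prb')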

I'm going to close this for now since this is getting pretty far off from the original issue topic, and this should fix your particular problem. If you encounter any more issues after trying that change, please open a new issue.