MouseLand / Kilosort

Fast spike sorting with drift correction
https://kilosort.readthedocs.io/en/latest/
GNU General Public License v3.0
477 stars 248 forks source link

Assertion Error when using Phy2 #284

Closed Idavr closed 3 years ago

Idavr commented 3 years ago

Hi,

Recently after upgrading to Kilosort2.5 I am having trouble opening the spike sorted outputs to phy (standard and development version). I ran "--debug" on the command and I get this output + the error:

10:04:20.400 [D] init:68 Start capturing exceptions. 10:04:20.494 [W] model:591 Unreferenced clusters found in templates (generally not a problem) 10:04:20.547 [D] model:607 Loading spike clusters. 10:04:20.607 [W] model:613 Unreferenced clusters found in spike_clusters (generally not a problem) 10:04:20.663 [D] model:557 No channel shank file found. 10:04:20.664 [D] model:680 Loading templates. 10:04:20.666 [D] model:709 Templates are dense. 10:04:20.666 [W] model:655 Skipping spike waveforms that do not exist, they will be extracted on the fly from the raw data as needed. 10:04:20.667 [D] model:715 Loading the whitening matrix. 10:04:20.669 [D] model:722 Loading the inverse of the whitening matrix. 10:04:20.676 [W] model:55 4908/1507984 values are nan in C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\similar_templates.npy, replacing by zero. 10:04:20.680 [D] model:751 Loading features. 10:04:20.682 [D] model:766 Features are sparse. 10:04:20.683 [D] model:788 Loading template features. 10:04:20.685 [D] model:798 Template features are sparse. 10:04:20.686 [D] model:492 Load cluster_Amplitude.tsv. 10:04:20.690 [D] model:492 Load cluster_ContamPct.tsv. 10:04:20.694 [D] model:492 Load cluster_group.tsv. 10:04:20.699 [D] model:492 Load cluster_KSLabel.tsv. 10:04:20.996 [D] context:100 Initialize joblib cache dir at C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\.phy. 10:04:20.998 [D] context:101 Reducing the size of the cache if needed. 10:04:21.001 [D] base:102 Add filter high_pass. 10:04:21.002 [D] config:31 Load config file C:\Users\idavalik\.phy\phy_config.py. 10:04:21.003 [D] plugin:145 Loading 0 plugins. 10:04:21.006 [D] context:209 The file C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\.phy\new_cluster_id.pkl doesn't exist. 10:04:21.089 [D] context:185 Save data to C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\.phy\spikes_per_cluster.pkl. 10:04:21.227 [D] gui:463 Creating GUI. 10:04:21.229 [D] state:46 Load C:\Users\idavalik.phy\TemplateGUI\state.json for GUIState. 10:04:21.230 [D] state:46 Load C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0.phy\state.json for GUIState. 10:04:21.595 [E] init:62 An error has occurred (AssertionError): Traceback (most recent call last): File "C:\Users\idavalik\Anaconda3\envs\phy2dev\Scripts\phy-script.py", line 33, in sys.exit(load_entry_point('phy', 'console_scripts', 'phy')()) File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 829, in call return self.main(args, kwargs) File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 782, in main rv = self.invoke(ctx) File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 610, in invoke return callback(args, kwargs) File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\decorators.py", line 21, in new_func return f(get_current_context(), args, kwargs) File "c:\users\idavalik\phy\phy\apps__init__.py", line 159, in cli_template_gui template_gui(params_path, kwargs) File "c:\users\idavalik\phy\phy\apps\template\gui.py", line 218, in template_gui gui = controller.create_gui() File "c:\users\idavalik\phy\phy\apps\base.py", line 1697, in create_gui self.supervisor.attach(gui) File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 946, in attach self._create_views( File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 761, in _create_views gui, data=self.cluster_info, columns=self.columns, sort=sort) File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 921, in cluster_info return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids] File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 921, in return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids] File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 746, in get_cluster_info out[key] = func(cluster_id) File "c:\users\idavalik\phy\phy\apps\base.py", line 1194, in get_best_channel_label return self._get_channel_labels([self.get_best_channel(cluster_id)])[0] File "c:\users\idavalik\phy\phy\utils\context.py", line 154, in memcached out = f(args, kwargs) File "c:\users\idavalik\phy\phy\apps\base.py", line 1188, in get_best_channel channel_ids = self.get_best_channels(cluster_id) File "c:\users\idavalik\phy\phy\utils\context.py", line 154, in memcached out = f(*args, **kwargs) File "c:\users\idavalik\phy\phy\apps\template\gui.py", line 151, in get_best_channels template = self.model.get_template(template_id) File "c:\users\idavalik\phylib\phylib\io\model.py", line 928, in get_template return self._get_template_dense( File "c:\users\idavalik\phylib\phylib\io\model.py", line 867, in _get_template_dense channelids, amplitude, best_channel = self._find_best_channels( File "c:\users\idavalik\phylib\phylib\io\model.py", line 844, in _find_best_channels assert best_channel in channel_ids AssertionError

I have no problems opening up earlier files that were processed with KS2, so from the help I have been getting trying to solve this it seems like there is a mismatch between something in phy and something in the way KS2.5 saves the output, but I am having trouble figuring out exactly where the error is coming from. If you have any insight into this I would greatly appreciate it!

LaurenzMuessig commented 3 years ago

channel_map.npy is a row vector in KS2.5 whereas it's a column vector in KS2 - that might be the issue (I had a similar issue when trying to use part of the Allen ecephys repo after upgrading to KS2.5).

Cheers

Laurenz

Idavr commented 3 years ago

Ah, I see! May I ask how you fixed it? I am still quite new to program-language and e-phys...

LaurenzMuessig commented 3 years ago

In Matlab you can readNPY('YourChannelMap.npy'); then make the variable you loaded a column vector by e.g. transposing someVariable'; then overwrite the channel_map by writeNPY(someVariable,'YourChannelMap.npy'); (readNPY and writeNPY are functions from the npy_matlab repo which KS uses, so you should have them). Not sure this is really your problem btw. I can load my data into phy without doing that, but your error message reminded me of the issue I had with the Allen repo. I hope that helps

Laurenz

Idavr commented 3 years ago

Hmm... Thanks for the suggestion, but looking over the channel_map.npy files of two different files I found that they were similar, but one files is possible to open with phy, while the second is not... I got someone else to look at the KS2.5 outputs and they had not problems opening the files so it isn't (luckily) anything wrong with the recordings themselves, but this new error that has popped up is very, very confusing as - for some unknown reason - I am unable to open any of my more recent files with phy. Whatever more suggestions to which files or avenues to look into to troubleshoot this is greatly appreciated!

LaurenzMuessig commented 3 years ago

Sorry to hear you still can't load your data into phy. But if you are saying someone else can load your data with no issue on a different machine (did I understand that correctly?), I would just delete phy on your machine and do a clean install.

Idavr commented 3 years ago

You did understand correctly, and this is where it get's peculiar. Not only have I done a clean install by deleting phy (normal and developers version) as well as the virtual environments on the computer, but I have a second computer that has the exact same problem with the exact same files...

LaurenzMuessig commented 3 years ago

Okay. but you can open the data on someone elses machine who is running a recent version of phy?

Idavr commented 3 years ago

Correct. I updated to the most recent versions as well and I still can not open them.

LaurenzMuessig commented 3 years ago

Have you deleted the .phy folder from your data directory before trying to reopen in phy after the clean install?

Idavr commented 3 years ago

Yes.

LaurenzMuessig commented 3 years ago

Okay, The last thing I can suggest is to go and 'hack' the python script where the error occurs. You can e.g. temporarily add some print commands around the line where the error occurs, to print the variable contents of the things that cause the issue (_bestchannel and _channelids judging from the error message you posted above) to your terminal. That might give you a better idea what's going wrong.

Idavr commented 3 years ago

Thanks for the assistance, I'll look into it. Troubleshooting since yesterday morning indicates that there is definitely something wonky with my personal phy set-up, but exactly what and how to fix it is vexing. Appreciate you taking the time to answer!

LaurenzMuessig commented 3 years ago

No worries. I hope you find the issue quickly. Good luck

Idavr commented 3 years ago

Update: It does seem like there is some kind of issue with the KS2.5 output, as even if I specify the correct channels the params.py file gets saved with 383 channels (instead of 385) and the dat-path is set to be the temp_wh.dat file instead of the raw .bin file. Running the same files through KS2 I had no trouble accessing them with phy and params.py was saved correctly.

LaurenzMuessig commented 3 years ago

I think it's supposed to be like that in KS 2.5. The output binary is the drift corrected raw data, so you should use that for the curation in phy (although tbh I don't see a big difference to the uncorrected raw data but I guess that depends how big the drift is in the dataset). That data will only contain the channels that are contained in the channel map, so it's missing at least the reference and the sync channel from a 1.0 probe.

Idavr commented 3 years ago

Ah, ok. The error does seem to be related to the channels then, and there is nothing wrong with the files themselves (like I said, I can open them without problems when run through with KS2), but could this have something to do with some warnings that were thrown up when doing MexGPUall.m regarding spikedetector3.cu? It finished with Mex Compiled Successfully but noticing that all of the warnings referenced channel-related variables (like NchanMax and NchanUp) it made me curious.

marius10p commented 3 years ago

@ldavr, the mex compilation warnings have nothing to do with it. I'm trying to reproduce the problem on my end, will keep you posted.

Idavr commented 3 years ago

Thank you very much @marius10p ! Really appreciate it. This showed up in the terminal right before the Assertion Error:

4908/1507984 values are nan in C:\Users\idavalik\Data\Neuropixels\Social_track_g0\Social_track_g0_imec0\similar_templates.npy, replacing by zero. 12:24:36.108 [E] init:62

In case it might help.

Idavr commented 3 years ago

I tried messing around with this more as well and the issue is indeed inconsistent. For this particular data set I have posted about here I found one bad channel that I tried disconnecting, but that still gave the same Assertion Error. However, when I connected channel 192 and also included the bad channel in the analysis (thus 0 channels being disconnected) I was able to open this specific file with phy.

torbenott commented 3 years ago

We see the same issue, here are more details.

Our error occurs on a slightly different line number (but it seems to be the same stack trace) reproduced here anyway

(phy2) D:\Neurodata\TQ01\20201222_165239\20201222_165239.kilosort>phy template-gui params.py ←[33m18:50:20.914 [W] model:545 Unreferenced clusters found in templates (generally not a problem)←[0m ←[33m18:50:20.945 [W] model:567 Unreferenced clusters found in spike_clusters (generally not a problem)←[0m ←[33m18:50:20.955 [W] model:54 212668/565504 values are nan in D:\Neurodata\TQ01\20201222_165239\20201222_165239.kilosort\similar_templates.npy, replacing by zero.←[0m ←[31m18:50:21.234 [E] __init__:62 An error has occurred (AssertionError): Traceback (most recent call last): File "c:\users\adam\anaconda3\envs\phy2\lib\runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "c:\users\adam\anaconda3\envs\phy2\lib\runpy.py", line 85, in _run_code exec(code, run_globals) File "C:\Users\Adam\anaconda3\envs\phy2\Scripts\phy.exe\__main__.py", line 7, in <module> sys.exit(phycli()) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 829, in __call__ return self.main(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 782, in main rv = self.invoke(ctx) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 1066, in invoke return ctx.invoke(self.callback, **ctx.params) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 610, in invoke return callback(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\decorators.py", line 21, in new_func return f(get_current_context(), *args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\__init__.py", line 159, in cli_template_gui template_gui(params_path, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\template\gui.py", line 199, in template_gui gui = controller.create_gui() File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\base.py", line 1638, in create_gui self.supervisor.attach(gui) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 942, in attach gui=gui, sort=gui.state.get('ClusterView', {}).get('current_sort', None)) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 760, in _create_views gui, data=self.cluster_info, columns=self.columns, sort=sort) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 916, in cluster_info return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids] File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 916, in <listcomp> return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids] File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 745, in get_cluster_info out[key] = func(cluster_id) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\base.py", line 1150, in get_best_channel_label return self._get_channel_labels([self.get_best_channel(cluster_id)])[0] File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\utils\context.py", line 154, in memcached out = f(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\base.py", line 1144, in get_best_channel channel_ids = self.get_best_channels(cluster_id) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\utils\context.py", line 154, in memcached out = f(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\template\gui.py", line 149, in get_best_channels template = self.model.get_template(template_id) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phylib\io\model.py", line 855, in get_template return self._get_template_dense(template_id, channel_ids=channel_ids) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phylib\io\model.py", line 798, in _get_template_dense channel_ids_, amplitude, best_channel = self._find_best_channels(template) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phylib\io\model.py", line 787, in _find_best_channels assert best_channel in channel_ids AssertionError ←[0m

Following the advice in the Github issue on the Phy page, we have tried to debug this error a number of ways.

  1. We changed the parameters nblocks = 0 and minFr = 1/600, this change fixed our issue for recordings from one rat, however it does not fix our issue for recordings for a second rat.
  2. We tried downgrading to KS2.0. This always works (for both rats) however the clustering results are worse for the rat we are able to run in KS2.5, and we would really like to check KS2.5 in the other rat. In particular we see what looks like instability and noisy events in good clusters in KS2.0, even though to our eyes the recording looks fine, and even is stable across days. A short recording from the rat in question is here https://drive.google.com/file/d/1_RazxXIi_Zx0szf7ndJ1drZT-iL2zum1/view?usp=sharing, and all the associated KS files can be found here: https://drive.google.com/drive/folders/1JsMntEZVkOgcIqvT7dMnDyASxnYsyFOA?usp=sharing
  3. We have tried to do some debugging of the issue ourselves. It seems that there are some templates where all the channels are completely nan over all time, causing the assertion error in phy in _find_best_channels. In particular, this means that max_amplitude will be nan, which causes obvious problems. We've tried to hack around this a little, but keep running into other problems - we think this is a kilosort issue, as it's the saved output of KS2.5 (in particular templates.npy, although we haven't checked exhaustively that more files don't contain nans) that contains a bunch of nans. To reiterate, there are some templates - even templates that are referenced in spike_templates - that exclusively contain nans! This seems to create an issue for Phy, not sure if it's a change in this version of KS.
torbenott commented 3 years ago

We looked into the KS2.5 code and found that in trackAndSort (called by runTemplates which is called by learnAndSolve8b) the number of spikes per template nsp is 0 for many templates. This will create NaNs in rez.dWU in line 226 (dividing 0 by 0!), which later will be inherited to W and therefore to templates.

achristensen56 commented 3 years ago

just wanted to link this issue: https://github.com/cortex-lab/phy/issues/1060 -- I believe it's the same problem as we (@torbenott & I) see with our data.

rossant commented 3 years ago

https://github.com/cortex-lab/phy/issues/1051#issuecomment-756023377

brykko commented 3 years ago

I tried to locate the origin of this problem and independently came to the same conclusion as @torbenott. I found a simple workaround which solved the Phy problem for me –

Here's line 226 from "trackAndSort.m":

rez.dWU = dWU1 ./ reshape(nsp, [1,1,Nfilt]);

I replaced it with the version below, which only performs the division for templates with at least one spike:

valid = nsp > 0;
rez.dWU(:, :, valid) = dWU1(:, :, valid) ./ reshape(nsp(valid), [1,1,sum(valid)]);
Hankslab-ABG commented 3 years ago

@brykko

This seemed to work for me as well, though I'm a bit puzzled as it yielded rather different results when I compared KS2.5 with this workaround to KS2.5 as is with the same dataset. Have you had a similar experience?

Hankslab-ABG commented 3 years ago

Actually never mind on that front. I had my test dataset mislabeled as KS2.5 when really it was KS2. Your fix seems to produce identical results, at least in that dataset.

marius10p commented 3 years ago

Thanks for working this out everyone. In principle there shouldn't be any spikes with nsp = 0, but I guess it sometimes happens. I pushed a fixed adding a small constant to the normalization in line 236 of trackAndSort. Reopen this issue if there are still problems.

Hankslab-ABG commented 3 years ago

Sorry to comment on a closed issue, but the fix @marius10p implemented isn't working for me using KS 2.5. Oddly, that fix results in a CUDA error, but changing it as follows seems to work fine:

`% rez.dWU = dWU1 ./ (1e-10 + single(reshape(nsp, [1,1,Nfilt])));

rez.dWU = dWU1 ./ (1e-10 + reshape(nsp, [1,1,Nfilt]));`

So apparently making this not single precision alleviates the CUDA error. Any idea why this might be the case? The error is a CUDA illegal address in reference to the following line from splitAllClusters.m (line 189)

[WtW, iList] = getMeWtW(single(rez.W), single(rez.U), Nnearest); % we re-compute similarity scores between templates