SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

questions and phy export error #104

Closed paul-aparicio closed 3 years ago

paul-aparicio commented 3 years ago

Hello, I have been learning how to use SpikeInterface from the tutorials, and I have a few questions. I want to import data from Blackrock and save it as NWB.

(1) When I open the saved NWB file, I see that some of the metadata is left blank or set to defaults. Is there a way, either while I am writing the file or after it has been written, to set the values for the metadata in the various objects that were imported? For example, I would like the device, electrode group, and subject metadata to be written to the file; how can I specify this?

(2) The tutorial has a line that references a metadata field in recordingExtractor.write_recording. I can't find any documentation on this; could you please elaborate on how it is used? For example, can I use it to add all the metadata that would otherwise be entered by calling pynwb.NWBFile? What exactly is the format the metadata argument expects? I tried the examples, and they don't seem to work for me.

Finally, I have an error that occurs when I try to export a sort to phy for manual curation. I run this command (which basically comes from the tutorial, but where I have added my own data instead of the sample):

st.postprocessing.export_to_phy(recording_cache, agreement_sorting, output_folder='phy_AGR', grouping_property='group', verbose=True, recompute_info=True)

It produces this output:

Converting to Phy format
Disabling 'max_channels_per_template'. Channels are extracted using 'grouping_property'
Number of chunks: 9 - Number of jobs: 1

Extracting waveforms in chunks: 100%|##########| 9/9 [00:26<00:00, 2.95s/it]

Fitting PCA of 3 dimensions on 219639 waveforms
Projecting waveforms on PC
Saving files
Saved phy format to: /media/paul/storage/Python/phy_AGR
Run:

phy template-gui /media/paul/storage/Python/phy_AGR/params.py

But when I try to run phy, I get the error pasted below. I verified that my installation of phy is working by opening a dataset exported directly from kilosort3, which opens without error as expected. I am running Ubuntu 18.04.

An error has occurred (AssertionError):
Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/phy2/bin/phy", line 8, in <module>
    sys.exit(phycli())
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/click/core.py", line 1025, in __call__
    return self.main(*args, **kwargs)
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/click/core.py", line 955, in main
    rv = self.invoke(ctx)
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/click/core.py", line 1517, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/click/core.py", line 1279, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/click/core.py", line 710, in invoke
    return callback(*args, **kwargs)
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/click/decorators.py", line 18, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/phy/apps/__init__.py", line 159, in cli_template_gui
    template_gui(params_path, **kwargs)
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/phy/apps/template/gui.py", line 198, in template_gui
    controller = TemplateController(model=load_model(params_path), dir_path=dir_path, **kwargs)
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/phylib/io/model.py", line 1178, in load_model
    return TemplateModel(**get_template_params(params_path))
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/phylib/io/model.py", line 297, in __init__
    self._load_data()
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/phylib/io/model.py", line 356, in _load_data
    self.sparse_templates = self._load_templates()
  File "/home/paul/anaconda3/envs/phy2/lib/python3.7/site-packages/phylib/io/model.py", line 652, in _load_templates
    assert cols.shape == (n_templates, n_channels_loc)
AssertionError

Thank you for any help!

alejoe91 commented 3 years ago

Hi @paul-aparicio

You can include metadata by using the metadata argument. If you run se.NwbRecordingExtractor.write_recording? it explains how to format the metadata dictionary.
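For orientation, here is a minimal sketch of the nesting; only the 'Ecephys'/'Device' keys (and the requirement that 'Device' be a list of dictionaries) are confirmed by the assertion that appears later in this thread, and the example values are placeholders:

metadata = {}
metadata['Ecephys'] = {}
# 'Device' must be a list of dictionaries, one per device
metadata['Ecephys']['Device'] = [{'name': 'Utah Array', 'description': '96-channel array'}]
se.NwbRecordingExtractor.write_recording(recording, 'output.nwb', metadata=metadata)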

Regarding the phy error, have you removed any channels before processing? Which versions of spikeextractors and spiketoolkit are you using?

paul-aparicio commented 3 years ago

Hi and thanks for your help. I did see the se.NwbRecordingExtractor.write_recording? information, but I am still unclear on how to format the metadata. For example, when using pynwb.add_electrode, there is metadata associated with the dynamic table region; where is that placed? If I also want to add session, subject, keywords, etc. metadata, would I do the following?

metadata['Ecephys'] = {}
metadata['nwbfile'] = {}

I tried the following commands

metadata['Ecephys'] = {}
metadata['Ecephys']['Device'] = [{
    'name': 'Utah Array 2',
    'description': 'Tony RH 10x10 array',
    'manufacturer': 'BlackRock Microsystems'
}]
metadata['Ecephys']['ElectrodeGroup'] = [{
    'name': '96 Channel Utah array',
    'description': '10x10 flat array',
    'location': 'M1',
    'device': 'Utah Array 2',
    'position': ['AP+9.8', 'ML+4.5']
}]

MAPX = np.array([[0, 96, 95, 94, 93, 92, 91, 90, 89, 0],
                 [88, 87, 86, 85, 84, 83, 82, 81, 80, 79],
                 [78, 77, 76, 75, 74, 73, 72, 71, 70, 69],
                 [68, 67, 66, 65, 64, 63, 62, 61, 60, 59],
                 [58, 57, 56, 55, 54, 53, 52, 51, 50, 49],
                 [48, 47, 46, 45, 44, 43, 42, 41, 40, 39],
                 [38, 37, 36, 35, 34, 33, 32, 31, 30, 29],
                 [28, 27, 26, 25, 24, 23, 22, 21, 20, 19],
                 [18, 17, 16, 15, 14, 13, 12, 11, 10, 9],
                 [0, 8, 7, 6, 5, 4, 3, 2, 1, 0]])
MAP = MAPX.transpose()
electrode_spacing_um = 400

data = []
for idx in range(1, 96):
    data.append({
        'id': idx,
        'x': float(electrode_spacing_um * np.where(MAP == idx)[1]),
        'y': float(electrode_spacing_um * np.where(MAP == idx)[0]),
        'z': float(0),
        'imp': float(-idx),
        'location': 'M1',
        'filtering': 'none',
        'group': '96 Channel Utah array'
    })
metadata['Ecephys']['Electrodes'] = [{
    'name': 'Utah array',
    'description': '96 channel tony M1 array',
    'data': data
}]
metadata['Ecephys']['ElectricalSeries'] = {'name': 'utah array', 'description': 'continuous voltage'}

print(type(metadata['Ecephys']['Device']))
# <class 'list'>
print(type(metadata['Ecephys']['Device'][0]))
# <class 'dict'>
se.NwbRecordingExtractor.write_recording(recording_cache, 'si_tutorial2.nwb', metadata=metadata)
se.NwbSortingExtractor.write_sorting(sorting_IC, 'si_tutorial2.nwb', metadata=metadata)


AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
----> 1 se.NwbRecordingExtractor.write_recording(recording_cache, 'si_tutorial2.nwb', metadata=metadata)
      2 se.NwbSortingExtractor.write_sorting(sorting_IC, 'si_tutorial2.nwb', metadata=metadata)

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spikeextractors-0.9.6-py3.9.egg/spikeextractors/extractors/nwbextractors/nwbextractors.py in write_recording(recording, save_path, overwrite, nwbfile, buffer_mb, use_times, metadata, write_as_lfp, write_scaled)
   1032             nwbfile = NWBFile(**nwbfile_kwargs)
   1033 
-> 1034         se.NwbRecordingExtractor.add_all_to_nwbfile(
   1035             recording=recording,
   1036             nwbfile=nwbfile,

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spikeextractors-0.9.6-py3.9.egg/spikeextractors/extractors/nwbextractors/nwbextractors.py in add_all_to_nwbfile(recording, nwbfile, buffer_mb, use_times, metadata, write_as_lfp, write_scaled)
    910         )
    911 
--> 912         se.NwbRecordingExtractor.add_electrode_groups(
    913             recording=recording,
    914             nwbfile=nwbfile,

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spikeextractors-0.9.6-py3.9.egg/spikeextractors/extractors/nwbextractors/nwbextractors.py in add_electrode_groups(recording, nwbfile, metadata)
    440             )
    441         )
--> 442         se.NwbRecordingExtractor.add_devices(recording, nwbfile, metadata=new_device)
    443         warnings.warn(f"Device '{device_name}' not detected in "
    444                       "attempted link to electrode group! Automatically generating.")

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spikeextractors-0.9.6-py3.9.egg/spikeextractors/extractors/nwbextractors/nwbextractors.py in add_devices(recording, nwbfile, metadata)
    376             )
    377         )
--> 378         assert all([isinstance(x, dict) for x in metadata['Ecephys']['Device']]), \
    379             "Expected metadata['Ecephys']['Device'] to be a list of dictionaries!"
    380 

AssertionError: Expected metadata['Ecephys']['Device'] to be a list of dictionaries!

For the error with phy, I did remove some channels using the following command (similar to the tutorial):

recording_rm_noise = st.preprocessing.remove_bad_channels(recording_f, bad_channel_ids=[34, 50, 52, 67, 81])

I am using spikeextractors 0.9.6 and spikeinterface 0.12.0 (I installed spikeinterface by downloading the GitHub repository and running 'python setup.py install'). Thank you again for your help.
paul-aparicio commented 3 years ago

Hello, following up on the export to phy issue: I found that during exporting, a temp folder is generated that holds the waveform files. This folder becomes quite large and fills my hard drive (~10x the size of the data, so a 32 GB data file grows to ~410 GB). Is there a way to change the location of this tmp folder? Or is there a setting that might help?

Thanks

alejoe91 commented 3 years ago

Hi,

Yes, you can set the tmp folder of the SortingExtractor object with:

sorting.set_tmp_folder('path-to-folder')

Note that the new folder needs to exist already!
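For example (the path is just a placeholder; the folder is created first since it must already exist):

import os

tmp_folder = '/media/paul/storage/si_tmp'   # placeholder: any folder on a large drive
os.makedirs(tmp_folder, exist_ok=True)      # set_tmp_folder expects an existing folder
sorting.set_tmp_folder(tmp_folder)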

paul-aparicio commented 3 years ago

Thanks, I can now direct the output to this tmp folder, but it appears that the folder size is going to end up being ~5 TB. Can I delete things in the tmp folder before the export completes? Do you have any suggestions on how to manage this so that it's not so large? I am using the ironclust sorter with a data file that is ~33 GB, and the sorter found 493 units.

alejoe91 commented 3 years ago

Hi @paul-aparicio

There might be something off then. There could be a couple of reasons. We recently updated spiketoolkit to better handle the files needed to export to Phy, so please make sure you are using the latest version. Another possible issue is that ironclust outputs some noisy units with an excessive firing rate. If you have, for example, some units with a 1000 Hz average rate (likely noise), the program will try to extract all their waveforms, and this can result in extremely large files.

Can you check that? You can do:

for u in sorting.get_unit_ids():
    print(len(sorting.get_unit_spike_train(u)))
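To turn those counts into approximate firing rates, here is a small sketch, assuming recording is the RecordingExtractor the sorting came from:

duration_s = recording.get_num_frames() / recording.get_sampling_frequency()
for u in sorting.get_unit_ids():
    # spikes per second for each unit
    print(u, len(sorting.get_unit_spike_train(u)) / duration_s)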

If that's the case, you can filter out those units with:

sorting_clean = st.curation.threshold_firing_rates(sorting, threshold=200, threshold_sign='greater', duration_in_frames=recording.get_num_frames())

This will automatically remove all units above 200 Hz.

Let me know if this is the case

paul-aparicio commented 3 years ago

Hello, I used the above code and found no units with firing rates > 200 Hz. I also tried a different sorter (tridesclous), which returned 178 units, none of which had firing rates > 200 Hz. When I export to phy with the 178 units, the tmp folder grows to ~2.8 TB and the export has been running for about 2 days.

samuelgarcia commented 3 years ago

@paul-aparicio: note that I have implemented a new export_to_phy() for the next release (0.90.0).

The new implementation should be faster.

In this future version, note that:

Nevertheless, maybe you could try this new version and give feedback on speed. It should be faster here because PCs are not handled yet. For this you need to install from source (the master branch).

Have a look at this: https://spikeinterface.readthedocs.io/en/latest/modules/toolkit/plot_2_postprocessing.html#export-sorted-data-to-phy-for-manual-curation

alejoe91 commented 3 years ago

@samuelgarcia yes, but there is still something wrong here... what's the spiketoolkit version you are using?

samuelgarcia commented 3 years ago

The new one!! What is wrong ?

alejoe91 commented 3 years ago

No, I mean for the old one. How can it generate 5 TB?

alejoe91 commented 3 years ago

@paul-aparicio can you check the content of the tmp folder? You should have one file for each unit with waveforms, and a file called waveforms_pca_fit.raw. How large are those?
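One way to list those sizes (a sketch; the path is a placeholder for the tmp folder set earlier):

from pathlib import Path

tmp = Path('/media/paul/storage/si_tmp')    # placeholder: the folder passed to set_tmp_folder
for f in sorted(tmp.glob('*.raw')):
    # print each .raw file and its size in GB
    print(f.name, round(f.stat().st_size / 1e9, 2), 'GB')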

alejoe91 commented 3 years ago

Also, could you count all spikes in your data?

n_spikes = 0
for u in sorting.get_unit_ids():
    n_spikes += len(sorting.get_unit_spike_train(u))
print(n_spikes)

What probe are you using?

alejoe91 commented 3 years ago

If you have an enormous number of spikes, one option is to export to phy without PC scores and amplitudes. You would still be able to visualize templates/waveforms and all spike-train metrics (auto- and cross-correlograms, ISI distributions, and so on).

You can do this by setting compute_pc_features=False and compute_amplitudes=False.
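For example, a sketch reusing the variable and argument names from earlier in this thread:

st.postprocessing.export_to_phy(recording_cache, agreement_sorting,
                                output_folder='phy_AGR',
                                grouping_property='group',
                                compute_pc_features=False,   # skip PC scores
                                compute_amplitudes=False,    # skip amplitudes
                                verbose=True)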

paul-aparicio commented 3 years ago

Hi, thanks for the replies. I am using spiketoolkit version 0.7.5. The tmp folder has the following files: waveforms.raw (file sizes ranging between 2-50 GB), pcascores.raw (50-800 MB each), and amplitude.raw (300 KB - 1.2 MB), one of each per unit (178 units with the TDC-sorted data). The waveforms_pca_fit.raw file was 57.2 GB (TDC sort).

There were 39352814 spikes in the TDC sort. The probe is a 96-channel Utah array. Thanks for the tip about setting pc features and amplitudes to false, I will give it a try. Also, happy to try out the new code and give you some feedback.

alejoe91 commented 3 years ago

Hi @paul-aparicio

Since this is a Utah array, are you spike sorting by group? The electrodes are far apart from each other, so you wouldn't expect waveforms to appear on multiple sites. If so, you can export to Phy with grouping_property='group'. This effectively exports waveforms on a single channel per unit, instead of 96 (which in this case doesn't really make sense!). It should reduce the size by roughly 100x and make it manageable. Can you try this as well?

paul-aparicio commented 3 years ago

Thanks, I do sort by group and I have been exporting with grouping_property='group', though in the tmp folder I still see those files per unit?

alejoe91 commented 3 years ago

Ok, then the waveforms should already be mono-channel... this means that for some units you have over 50 million spikes?? (50 GB / 120 samples / 8 bytes)

Another thing that could reduce the size is reducing the cut-out length. By default, SI cuts 2 ms before and 3 ms after the peak (if I remember correctly). You can reduce this by passing ms_before=0.5 and ms_after=1, which should be enough for visualization and PC projections.
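As a rough back-of-the-envelope check (assuming 4-byte samples, as the file sizes reported later in this thread suggest, and one channel per waveform):

n_spikes = 1_000_000                          # example spike count for one unit
n_samples = int((0.5 + 1.0) * 1e-3 * 30_000)  # 45 samples with ms_before=0.5, ms_after=1 at 30 kHz
print(n_spikes * n_samples * 4 / 1e9, 'GB')   # ~0.18 GB, versus ~0.6 GB with 150 samples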

paul-aparicio commented 3 years ago

The data is sampled at 30 kHz. So pre- and post-sample times of 2 and 3 ms would be about 15000 samples per waveform, maybe that is the problem. I will try reducing these values and let you know how it goes. Thanks!

alejoe91 commented 3 years ago

It would be 150 samples: 0.005 s * 30 000 Hz = 150 ;)

paul-aparicio commented 3 years ago

Oops! Yes, that's correct. So maybe the waveform.raw files are off somehow? If I open one, should I expect to see a 150 x #spikes matrix?

alejoe91 commented 3 years ago

Yes, it should be num_spikes x 1 channel x 150 samples. However, these are binary files, so to open them back you need to know the shape beforehand. To test it you can call get_unit_waveforms on a subset of units (to extract all waveforms you can use max_spikes_per_unit=None) and check the shape of the output variables, which are memmap objects.
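A small sketch of that check (the unit id is just an example):

wf_list = st.postprocessing.get_unit_waveforms(recording, sorting, unit_ids=[0],
                                               max_spikes_per_unit=None, memmap=True)
print(wf_list[0].shape)   # expected (num_spikes, num_channels, num_samples)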

paul-aparicio commented 3 years ago

Hi, I'm not sure, but it appears that the waveform.raw files do not have this organization? When I run tst = st.postprocessing.get_unit_waveforms(recording, sorting_TDC, unit_ids=[167], max_spikes_per_unit=None), it has a shape of (1, 79851, 91, 180). There were 79851 spikes in this cluster and the waveform.raw file is 5.2 GB, so each element is 4 bytes? And the number of channels (91) is being factored in?

Not sure if this is related, but when export_to_phy completed, I tried to open the result in Phy and it gave an assertion error: phylib/io/model.py, line 652, in _load_templates: assert cols.shape == (n_templates, n_channels_loc) AssertionError

I found that template_ind.npy contained a 178x1 vector of values?

alejoe91 commented 3 years ago

Hi @paul-aparicio

You have to force computation by group by adding grouping_property='group' and recompute_info=True. Also, when you export to Phy, make sure you add the same arguments.

paul-aparicio commented 3 years ago

Hi, I have found that when I call st.postprocessing.get_unit_waveforms in order to force computation by group, it initially creates waveform.raw files with all of the channels (so the tmp folder size increases to TBs of data); then, after it's done, the waveform files in the tmp folder are the correct smaller size (MBs). The same thing happens (it recalculates the waveform files at the larger size) when I run export_to_phy. It also saves the waveforms at the larger size to the sorting cache after export_to_phy; when I check the size from get_unit_waveforms it includes all of the channels? (It also crashes before it completes the export_to_phy.) I must have some setting wrong? Below is the code I am using, thank you again for your help:

st.postprocessing.get_unit_waveforms(recording_cache, sorting_cache, 
                                     grouping_property='group', 
                                     ms_before=0.5, 
                                     ms_after=1, 
                                     compute_property_from_recording=True, 
                                     n_jobs=10, 
                                     max_spikes_per_unit=None, 
                                     memmap=True, 
                                     save_property_or_features=True, 
                                     recompute_info=True, 
                                     verbose=True)
tst = st.postprocessing.get_unit_waveforms(recording_cache, sorting_cache, unit_ids=[0], max_spikes_per_unit=None)
np.shape(tst)

(1, 284221, 1, 45)

st.postprocessing.export_to_phy(recording_cache,
                                sorting_cache,
                                output_folder='/media/paul/storage/Data/TY20210211_freeAndMoths/phy_TDC2',
                                grouping_property='group', 
                                verbose=True, 
                                ms_before=0.5, 
                                ms_after=1,
                                memmap=True, 
                                recompute_info=True,
                                n_jobs=1)
tst = st.postprocessing.get_unit_waveforms(recording_cache, sorting_cache, unit_ids=[0], max_spikes_per_unit=None)
np.shape(tst)

(1, 284221, 91, 45)

alejoe91 commented 3 years ago

Hi @paul-aparicio

It shouldn't save the waveforms raw file with all channels, only the ones by group... maybe there's a bug on our side.

Can you paste the output of:

print(recording_cache.get_channel_groups())

for u in sorting_cache.get_unit_ids():
    print(sorting_cache.get_unit_property(u, "group"))
paul-aparicio commented 3 years ago

Hi, the output of print(recording_cache.get_channel_groups()) is:

[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 51 53 54 55 56 57 58 59 60 61 62 63 64 65 66 68 69 70 71 72 73 74 75 76 77 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96]

(I removed channels 34, 50, 52, 67, and 81), but running sorting_cache.get_unit_property(unit_id=0, property_name='group') produces an error: ValueError: group has not been added to unit 0

This was true for all of the units, and looking at the properties shows: sorting_cache.get_unit_property_names(unit_id=0)

['waveforms_channel_idxs']

It appears that the group property is not on the sorter? When I ran the sorter, I used the following command:

sorting_TDC = ss.run_tridesclous(recording_cache, 
                                 output_folder='/media/paul/storage/Data/TY20210211_freeAndMoths/results_tdc', 
                                 grouping_property='group', 
                                 parallel=True, 
                                 n_jobs=5, 
                                 verbose=True)
print(f'Tridesclous found {len(sorting_TDC.get_unit_ids())} units')
alejoe91 commented 3 years ago

I think that is the problem! However, when running the sorting with a grouping property you should get the 'group' property. Maybe it got lost when you cached the sorting (and we should fix that). Can you check if 'group' is available in sorting_TDC? In that case, try to run the export to Phy using that object.

paul-aparicio commented 3 years ago

Hi @alejoe91

the sorting_TDC comes back with no properties:

In [12]: sorting.get_unit_property_names(unit_id=1)
Out[12]: []

alejoe91 commented 3 years ago

That looks like sorting, not sorting_TDC

paul-aparicio commented 3 years ago

Ah, I had loaded the sorting from a pkl file and named it sorting instead of sorting_TDC, but just in case, I went ahead and reran the analysis from scratch to make sure. I found that the group property is present before caching but not after:

print(sorting_TDC.get_unit_property_names(unit_id=1))
print(sorting_cache.get_unit_property_names(unit_id=1))

['group']
[]

Dumping sorting_TDC to pkl also removed the group property:

sorting_TDC.dump_to_pickle('/media/paul/storage/Data/TY20210211_freeAndMoths/tdc_sorting.pkl')
sorting = se.load_extractor_from_pickle('/media/paul/storage/Data/TY20210211_freeAndMoths/tdc_sorting.pkl')
sorting.get_unit_property_names(unit_id=1)

[]

I am running the waveform extraction and then export_to_phy with sorting_TDC now, so I will let you know if it works. For the waveform.raw extraction, is there a way to have it only create the smaller 'group' waveform files instead of the all-channel waveform.raw files (which it then converts to the smaller size)? Or is reducing ms_before and ms_after the best solution to the tmp folder size? Thanks for the help!

alejoe91 commented 3 years ago

In the meantime I'll fix the missing properties when caching/dumping ;)

paul-aparicio commented 3 years ago

Hi, the processes completed. I first ran the waveform extraction to force computation by group:

st.postprocessing.get_unit_waveforms(recording_cache, sorting_TDC,  
                                     grouping_property='group', 
                                     ms_before=0.5, 
                                     ms_after=1, 
                                     compute_property_from_recording=True, 
                                     n_jobs=10, 
                                     max_spikes_per_unit=None, 
                                     memmap=True, 
                                     save_property_or_features=True, 
                                     recompute_info=True, 
                                     verbose=True)

Testing the output showed that everything seemed to work correctly:

tst = st.postprocessing.get_unit_waveforms(recording_cache, sorting_TDC, unit_ids=[0], max_spikes_per_unit=None)
print(np.shape(tst))
print(recording_cache.get_channel_groups())
print(sorting_TDC.get_unit_property_names(unit_id=0))
print(sorting_TDC.get_unit_property(unit_id=0, property_name='group'))

(1, 281860, 1, 45)
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 51 53 54 55 56 57 58 59 60 61 62 63 64 65 66 68 69 70 71 72 73 74 75 76 77 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96]
['group', 'waveforms_channel_idxs']
1

When I ran export_to_phy it crashed. Checking the tmp directory after the crash, I found that the amplitudes, pcascores, larger waveform files, and the waveforms_pca_fit files were present. Checking the variables again after the crash showed the larger waveform files, but the group property was still on the sorter:

tst = st.postprocessing.get_unit_waveforms(recording_cache, sorting_TDC, unit_ids=[0], max_spikes_per_unit=None)
print(np.shape(tst))
print(recording_cache.get_channel_groups())
print(sorting_TDC.get_unit_property_names(unit_id=0))
print(sorting_TDC.get_unit_property(unit_id=0, property_name='group'))

(1, 281860, 91, 45)
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 51 53 54 55 56 57 58 59 60 61 62 63 64 65 66 68 69 70 71 72 73 74 75 76 77 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96]
['group', 'waveforms_channel_idxs']
1

The crash is below:

Converting to Phy format
Disabling 'max_channels_per_template'. Channels are extracted using 'grouping_property'
Number of chunks: 616 - Number of jobs: 5
Fitting PCA of 3 dimensions on 859333 waveforms
Projecting waveforms on PC

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-e637d2dad070> in <module>
----> 1 st.postprocessing.export_to_phy(recording_cache,
      2                                 sorting_TDC,
      3                                 output_folder='/media/paul/storage/Data/SignalCheck/TY20210308_0705_signalCheck_morning/phy_TDC',
      4                                 grouping_property='group',
      5                                 recompute_info=True,

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spiketoolkit-0.7.5-py3.9.egg/spiketoolkit/postprocessing/postprocessing_tools.py in export_to_phy(recording, sorting, output_folder, compute_pc_features, compute_amplitudes, max_channels_per_template, copy_binary, **kwargs)
   1371     spike_times, spike_clusters, amplitudes, channel_map, pc_features, pc_feature_ind, \
   1372     spike_templates, templates, templates_ind, similar_templates, channel_map_si, channel_groups, \
-> 1373     positions = _get_phy_data(recording, sorting, compute_pc_features, compute_amplitudes,
   1374                               max_channels_per_template, **kwargs)
   1375 

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spiketoolkit-0.7.5-py3.9.egg/spiketoolkit/postprocessing/postprocessing_tools.py in _get_phy_data(recording, sorting, compute_pc_features, compute_amplitudes, max_channels_per_template, **kwargs)
   1886     spike_times, spike_times_amps, spike_times_pca, spike_clusters, spike_clusters_amps, spike_clusters_pca, \
   1887     amplitudes, pc_features, pc_feature_ind, templates, channel_index_list \
-> 1888         = _get_quality_metric_data(recording, sorting, n_comp=n_comp, ms_before=ms_before, ms_after=ms_after,
   1889                                    dtype=dtype, amp_method=amp_method, amp_peak=amp_peak,
   1890                                    amp_frames_before=amp_frames_before,

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spiketoolkit-0.7.5-py3.9.egg/spiketoolkit/postprocessing/postprocessing_tools.py in _get_quality_metric_data(recording, sorting, n_comp, ms_before, ms_after, dtype, amp_method, amp_peak, amp_frames_before, amp_frames_after, max_spikes_per_unit, max_spikes_for_amplitudes, max_spikes_for_pca, recompute_info, max_channels_per_waveforms, save_property_or_features, n_jobs, joblib_backend, verbose, seed, memmap, compute_pc_features, compute_amplitudes)
   1705             sorting.clear_units_spike_features(feature_name='amplitudes')
   1706 
-> 1707         amplitudes_list, amp_idxs = get_unit_amplitudes(recording, sorting, method=amp_method,
   1708                                                         save_property_or_features=save_property_or_features,
   1709                                                         peak=amp_peak, max_spikes_per_unit=max_spikes_for_amplitudes,

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/spiketoolkit-0.7.5-py3.9.egg/spiketoolkit/postprocessing/postprocessing_tools.py in get_unit_amplitudes(recording, sorting, unit_ids, channel_ids, return_idxs, **kwargs)
    736             wf_cut = wf[:, max_channels[i], center_frame - frames_before:center_frame + frames_after]
    737             if peak == 'both':
--> 738                 amps = np.max(np.abs(wf_cut), axis=-1)
    739                 if len(amps.shape) > 1:
    740                     amps = np.max(wf)

<__array_function__ internals> in amax(*args, **kwargs)

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims, initial, where)
   2731     5
   2732     """
-> 2733     return _wrapreduction(a, np.maximum, 'max', axis, None, out,
   2734                           keepdims=keepdims, initial=initial, where=where)
   2735 

~/anaconda3/envs/spikes3/lib/python3.9/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85                 return reduction(axis=axis, out=out, **passkwargs)
     86 
---> 87     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     88 
     89 

ValueError: zero-size array to reduction operation maximum which has no identity

Maybe my settings for export_to_phy are incorrect? Here is the command I am using:

st.postprocessing.export_to_phy(recording_cache,
                                sorting_TDC,
                                output_folder='/media/paul/storage/Data/SignalCheck/TY20210308_0705_signalCheck_morning/phy_TDC',
                                grouping_property='group',
                                recompute_info=True, 
                                verbose=True, 
                                ms_before=0.5, 
                                ms_after=1, 
                                memmap=True, 
                                compute_property_from_recording=True,
                                save_property_or_features=True,
                                n_jobs=5)
alejoe91 commented 3 years ago

@paul-aparicio I think this could be because TDC might output a few clusters/units with no spikes, e.g. the ones used for template matching that didn't find any match. You can remove them with:

sorting_TDC_clean = st.curation.threshold_num_spikes(sorting_TDC, threshold=2, threshold_sign="less")

As for the missing properties when you cache the sorting, how do you save it? The CacheSortingExtractor only saves the file to npz and copies the properties to the new object (https://github.com/SpikeInterface/spikeextractors/blob/master/spikeextractors/cacheextractors.py#L136-L137).

To save the properties to file as well, you can use sorting_cache.dump_to_pickle("my-file.pkl", include_properties=True). This explicitly saves the property dictionary, so when you load it back with se.load_extractor_from_pickle you can retrieve the properties. Hope this makes it clearer :)
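A sketch of that round trip (the file name is a placeholder):

sorting_cache.dump_to_pickle("my-file.pkl", include_properties=True)
sorting_loaded = se.load_extractor_from_pickle("my-file.pkl")
print(sorting_loaded.get_unit_property_names(unit_id=1))   # check whether 'group' comes back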

paul-aparicio commented 3 years ago

Hi @alejoe91,

Thanks again for your help, I know you are probably very busy, so I appreciate your responses. I tried using the thresholding you suggested, but it did not remove any units and (of course) the crash was the same. I also tried a different data set, but this produced exactly the same crash (from the previous post). I tried to work around this by calculating the waveforms, amplitudes, and PCA scores separately, before using export_to_phy. When the export crashed I looked at the source code (I am a novice with Python) and found that in postprocessing_tools.py line 1942, templates_red[u_i, :] = templates[u_i, :, unit_chans].T, seemed to be throwing an error, because unit_chans increments with the channels but templates has the shape (Nspikes, Ntimepoints, Nchannelspergroup), and in my case there is always just one channel per group. So once unit_chans gets a value other than 0, it crashes. Does that sound correct? I used the following command for export_to_phy:

st.postprocessing.export_to_phy(recording_cache,
                                sorting_TDC,
                                output_folder='/media/paul/storage/Data/SignalCheck/TY20210308_0705_signalCheck_morning/phy_TDC',
                                compute_pc_features=False,
                                compute_amplitudes=False,
                                max_channels_per_template=1,
                                grouping_property='group',
                                max_spikes_per_unit=None,
                                compute_property_from_recording=False,
                                n_jobs=1,
                                ms_before=0.5,
                                ms_after=1,
                                frames_before=3,
                                frames_after=3,
                                recompute_info=False,
                                save_property_or_features=False,
                                verbose=True,
                                memmap=True)

For the caching issue, to make sure I understand: you're saying that the CacheSortingExtractor won't ever save the properties from the sorting, it only transfers them to the new cache object that is created? So to get the properties back I need to specify that properties be saved in dump_to_pickle?

So I ran the following test:

# save output of sorting
sorting_cache = se.CacheSortingExtractor(sorting_TDC, '/media/paul/storage/Data/SignalCheck/TY20210308_0705_signalCheck_morning/tdc_sort_results_cache.dat')
sorting_TDC.dump_to_pickle('/media/paul/storage/Data/SignalCheck/TY20210308_0705_signalCheck_morning/tdc_sorting_cache.pkl', include_properties=True)

# load from pkl
sorting_TDC_cache = se.load_extractor_from_pickle('/media/paul/storage/Data/SignalCheck/TY20210308_0705_signalCheck_morning/tdc_sorting_cache.pkl')
# checks for properties on sorting
print(sorting_TDC.get_unit_property_names(unit_id=1))
print(sorting_cache.get_unit_property_names(unit_id=1))
print(sorting_TDC_cache.get_unit_property_names(unit_id=1))

['group']
[]
[]

It seems that it did not transfer the properties to the new sorting object, and adding include_properties=True did not save the properties in the pkl dump?