LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research, built by the Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License

Metric Curation Populate inhomogeneous shape for probe data #921

Closed: rpswenson closed this issue 6 months ago

rpswenson commented 7 months ago

Got the whole spikesorting loop to run properly for tetrodes (thanks, y'all), but when running the same code for a probe I run into this error at sgs.MetricCuration.populate(m_key):

Cell In[15], line 43
     41 sgs.MetricCurationSelection.insert_selection(key)
     42 m_key = (sgs.MetricCurationSelection() & key).fetch1()
---> 43 sgs.MetricCuration.populate(m_key)
     44 #sgs.MetricCuration.populate(key)
     46 key = {
     47     "metric_curation_id": (
     48         sgs.MetricCurationSelection & {"sorting_id": key["sorting_id"]} & {"metric_param_name": key["metric_param_name"]}
     49     ).fetch1("metric_curation_id")
     50 }

File ~/datajoint-python/datajoint/autopopulate.py:247, in AutoPopulate.populate(self, suppress_errors, return_exception_objects, reserve_jobs, order, limit, max_calls, display_progress, processes, make_kwargs, *restrictions)
    241 if processes == 1:
    242     for key in (
    243         tqdm(keys, desc=self.__class__.__name__)
    244         if display_progress
    245         else keys
    246     ):
--> 247         status = self._populate1(key, jobs, **populate_kwargs)
    248         if status is True:
    249             success_list.append(1)

File ~/datajoint-python/datajoint/autopopulate.py:314, in AutoPopulate._populate1(self, key, jobs, suppress_errors, return_exception_objects, make_kwargs)
    312 self.__class__._allow_insert = True
    313 try:
--> 314     make(dict(key), **(make_kwargs or {}))
    315 except (KeyboardInterrupt, SystemExit, Exception) as error:
    316     try:

File ~/spyglass/src/spyglass/spikesorting/v1/metric_curation.py:268, in MetricCuration.make(self, key)
    262 merge_groups = self._compute_merge_groups(metrics, merge_params)
    264 logger.info("Saving to NWB...")
    265 (
    266     key["analysis_file_name"],
    267     key["object_id"],
--> 268 ) = _write_metric_curation_to_nwb(
    269     nwb_file_name, waveforms, metrics, labels, merge_groups
    270 )
    272 # INSERT
    273 AnalysisNwbfile().add(
    274     nwb_file_name,
    275     key["analysis_file_name"],
    276 )

File ~/spyglass/src/spyglass/spikesorting/v1/metric_curation.py:586, in _write_metric_curation_to_nwb(nwb_file_name, waveforms, metrics, labels, merge_groups)
    579             nwbf.add_unit_column(
    580                 name=metric,
    581                 description=metric,
    582                 data=metric_values,
    583             )
    585     units_object_id = nwbf.units.object_id
--> 586     io.write(nwbf)
    587 return analysis_nwb_file, units_object_id

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py:375, in HDF5IO.write(self, **kwargs)
    370     raise UnsupportedOperation(("Cannot write to file %s in mode '%s'. "
    371                                 "Please use mode 'r+', 'w', 'w-', 'x', or 'a'")
    372                                % (self.source, self.__mode))
    374 cache_spec = popargs('cache_spec', kwargs)
--> 375 super().write(**kwargs)
    376 if cache_spec:
    377     self.__cache_spec()

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/backends/io.py:99, in HDMFIO.write(self, **kwargs)
     97 """Write a container to the IO source."""
     98 f_builder = self.__manager.build(container, source=self.__source, root=True)
---> 99 self.write_builder(f_builder, **kwargs)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py:812, in HDF5IO.write_builder(self, **kwargs)
    809 self.logger.debug("Writing GroupBuilder '%s' to path '%s' with kwargs=%s"
    810                   % (f_builder.name, self.source, kwargs))
    811 for name, gbldr in f_builder.groups.items():
--> 812     self.write_group(self.__file, gbldr, **kwargs)
    813 for name, dbldr in f_builder.datasets.items():
    814     self.write_dataset(self.__file, dbldr, **kwargs)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py:999, in HDF5IO.write_group(self, **kwargs)
    997 if datasets:
    998     for dset_name, sub_builder in datasets.items():
--> 999         self.write_dataset(group, sub_builder, **kwargs)
   1000 # write all links
   1001 links = builder.links

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py:1306, in HDF5IO.write_dataset(self, **kwargs)
   1304 # Write a regular in memory array (e.g., numpy array, list etc.)
   1305 elif hasattr(data, '__len__'):
-> 1306     dset = self.__list_fill__(parent, name, data, options)
   1307 # Write a regular scalar dataset
   1308 else:
   1309     dset = self.__scalar_fill__(parent, name, data, options)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py:1472, in HDF5IO.__list_fill__(cls, parent, name, data, options)
   1470     dset[:] = data
   1471 except Exception as e:
-> 1472     raise e
   1473 return dset

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py:1470, in HDF5IO.__list_fill__(cls, parent, name, data, options)
   1468     dset.resize(new_shape)
   1469 try:
-> 1470     dset[:] = data
   1471 except Exception as e:
   1472     raise e

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/h5py/_hl/dataset.py:920, in Dataset.__setitem__(self, args, val)
    916 else:
    917     # If the input data is already an array, let HDF5 do the conversion.
    918     # If it's a list or similar, don't make numpy guess a dtype for it.
    919     dt = None if isinstance(val, numpy.ndarray) else self.dtype.base
--> 920     val = numpy.asarray(val, order='C', dtype=dt)
    922 # Check for array dtype compatibility and convert
    923 if self.dtype.subdtype is not None:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (86, 28) + inhomogeneous part.

It seems like something particular to probe data isn't accounted for when running this, since I don't run into this issue otherwise.
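
For reference, numpy raises this error whenever nested sequences are ragged beyond the detected dimensions; a minimal sketch, independent of spyglass, that reproduces it:

import numpy as np

# 86 rows of 28 entries each, but the innermost lists differ in length,
# so numpy cannot pack them into one rectangular array.
ragged = [[[0.0] * 4] * 28] * 85 + [[[0.0] * 5] * 28]
np.asarray(ragged, dtype=np.float64)
# ValueError: setting an array element with a sequence. The requested array
# has an inhomogeneous shape after 2 dimensions. The detected shape was
# (86, 28) + inhomogeneous part.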

edeno commented 7 months ago

Could you specify what key you're trying to run?

rpswenson commented 7 months ago

Here's the key:

{'sorting_id': UUID('7b0e33eb-9355-426a-8141-da6cf807134e'), 'curation_id': 0, 'waveform_param_name': 'default_whitened_20000spikes_20jobs', 'metric_param_name': 'peak_offset_num_spikes_20000spikes_v2', 'metric_curation_param_name': 'noise0.03_isi0.0025_offset2', 'metric_curation_id': UUID('b6fb4af4-97d4-4f06-8328-4f5e52bcb316')}

edeno commented 7 months ago

Can you try running it with:

{
    'sorting_id': '7b0e33eb-9355-426a-8141-da6cf807134e',
    'curation_id': 0,
    'waveform_param_name': 'default_whitened_20000spikes_20jobs',
    'metric_param_name': 'peak_offset_num_spikes_20000spikes_v2',
    'metric_curation_param_name': 'noise0.03_isi0.0025_offset2',
    'metric_curation_id': 'b6fb4af4-97d4-4f06-8328-4f5e52bcb316'
}

rpswenson commented 7 months ago

How would I do that? I think the sorting_id and metric_curation_id are generated during the insert selection step. When defining the key in my code I have it like this:

key = {
            "sorting_id": (
                sgs.SpikeSortingSelection & {"recording_id": key["recording_id"]} & {"sorter_param_name": key["sorter_param_name"]}
            ).fetch1("sorting_id"),
            "curation_id": 0,
            "waveform_param_name": "default_whitened_20000spikes_20jobs",
            "metric_param_name": "peak_offset_num_spikes_20000spikes_v2",
            "metric_curation_param_name": "noise0.03_isi0.0025_offset2",
        }
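
One way to do the conversion (a sketch, assuming the fetched key is a plain dict of column values): fetch the key as before, then stringify any UUID values before populating.

import uuid

m_key = (sgs.MetricCurationSelection() & key).fetch1()
# Replace UUID objects with their string form before calling populate
m_key = {k: str(v) if isinstance(v, uuid.UUID) else v for k, v in m_key.items()}
sgs.MetricCuration.populate(m_key)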
rpswenson commented 7 months ago

Ok @edeno I think I figured out what you were asking for. I ran:

key = {
    'sorting_id': '7b0e33eb-9355-426a-8141-da6cf807134e',
    'curation_id': 0,
    'waveform_param_name': 'default_whitened_20000spikes_20jobs',
    'metric_param_name': 'peak_offset_num_spikes_20000spikes_v2',
    'metric_curation_param_name': 'noise0.03_isi0.0025_offset2',
    'metric_curation_id': 'b6fb4af4-97d4-4f06-8328-4f5e52bcb316'
}
sgs.MetricCuration.populate(key)

on its own in a separate cell, to make sure it wasn't something about the loop messing things up, but it still hit the same error.

edeno commented 7 months ago

@khl02007 do you have any idea of what's going on here?

khl02007 commented 7 months ago

This is happening when the results of computing quality metrics are saved to NWB. This includes spike trains, waveforms, electrodes, labels, merge groups, and metrics. @rly what would be the easiest way to figure out which of these is problematic? Can you tell based on the error message? The relevant function is here: https://github.com/LorenFrankLab/spyglass/blob/201c67010ec941d58fa4d325955c3a6b3cac2c12/src/spyglass/spikesorting/v1/metric_curation.py#L497
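
One hypothetical way to narrow it down (a sketch; `columns` stands for whichever of the per-unit arrays listed above you pass in, e.g. the metrics dict):

import numpy as np

def find_ragged(columns):
    """Print which per-unit columns cannot be packed into a rectangular array."""
    for name, values in columns.items():
        try:
            np.asarray(list(values), dtype=np.float64)
        except (ValueError, TypeError) as err:
            print(f"{name}: {err}")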

rpswenson commented 6 months ago

Hey all, just wondering if there's any update on this.

edeno commented 6 months ago

I think the easiest way to diagnose this is to run %debug in the Jupyter notebook in the cell after the error. @MichaelCoulter could you help @rpswenson with that? We just need to know which metric is causing the error.
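
For anyone following along, %debug opens a post-mortem pdb session on the most recent traceback; a sketch of how to use it here:

%debug
# at the ipdb prompt:
# u              # walk up the stack to HDF5IO.__list_fill__ / write_dataset
# p name         # the name of the dataset being written
# p data[:3]     # peek at the offending values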

rpswenson commented 6 months ago

Hey, we were able to figure out the issue. A change in spikeinterface meant that 'sparse' had to be set to False in the waveform parameters, presumably because with sparsity enabled each unit's waveforms span a different number of channels, so they can't be stacked into one rectangular array. We made a new WaveformParameters entry ('default_whitened_20000spikes_20jobs_v3') with this change, and the code runs fine now.
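
For posterity, the fix looked roughly like this (a sketch; the column names follow the keys shown earlier in this thread and may not match the actual WaveformParameters schema exactly):

old_params = (
    sgs.WaveformParameters
    & {"waveform_param_name": "default_whitened_20000spikes_20jobs"}
).fetch1("waveform_params")
new_params = dict(old_params, sparse=False)  # disable sparse waveform extraction
sgs.WaveformParameters.insert1(
    {
        "waveform_param_name": "default_whitened_20000spikes_20jobs_v3",
        "waveform_params": new_params,
    },
    skip_duplicates=True,
)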