catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Bug]: "channel_name" error when exporting from SpikeInterface back to the original NWB file #1135

Open · borrepp opened this issue 2 weeks ago

borrepp commented 2 weeks ago

What happened?

Hello everyone.

I have an NWB file with ephys data (behavior & raw 30 kHz data). I'm using SpikeInterface to read from the NWB file, sort the single neurons, and create the LFPs. That part is done and works fine.

However, when I tried to go back to the NWB file, I saw that neuroconv uses the SpikeInterface recording property "channel_name" to identify which electrodes are being imported. This "channel_name" in the SpikeInterface recording corresponds to/matches "nwb.electrodes.id", but a column with that name does not exist in the original NWB file. Therefore, when "adding recording to nwbfile" calls "add_electrodes_to_nwbfile", it tries to append all the channels as new electrodes, because the function "_get_electrodes_table_global_ids" returns an empty list.

I think it might be better to FIRST check whether the nwbfile's electrode table has the "channel_name" column (~line 580 in spikeinterface.py) before checking which electrodes to append (~line 568 in spikeinterface.py).
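
Roughly, I mean something like this (a hypothetical sketch, not neuroconv's actual code; it assumes the pynwb/hdmf DynamicTable API):

# Hypothetical sketch of the proposed ordering inside add_electrodes_to_nwbfile:
# make sure "channel_name" exists before computing which electrodes to append.
if nwbfile.electrodes is not None and "channel_name" not in nwbfile.electrodes.colnames:
    nwbfile.electrodes.add_column(
        name="channel_name",
        description="unique channel reference, backfilled from the electrode table ids",
        data=[str(id_) for id_ in nwbfile.electrodes.id[:]],
    )
# ...only then compare the recording's channels against the table
# to decide which rows are genuinely new.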

I hope this comment makes sense.

Also, I'm including the error I got when I ran "add_electrodes_to_nwbfile" in the Traceback section below. I think this error is indirectly related to the comment I made above.

Steps to Reproduce

from pynwb import NWBHDF5IO
from spikeinterface.extractors import read_nwb
from spikeinterface.preprocessing import bandpass_filter, resample

import neuroconv.tools.spikeinterface.spikeinterface as siconv

# nwbPrepro_path, bandpass_params, and resample_params are defined elsewhere
nwbPrepro_io = NWBHDF5IO(nwbPrepro_path, mode="r+")
nwbPrepro = nwbPrepro_io.read()

# Find the raw ElectricalSeries names in acquisition
electrode_groups_names = [k for k in nwbPrepro.acquisition.keys() if 'raw-' in k]
print(electrode_groups_names)

print('Extract into SI the first electrode group: ', electrode_groups_names[0])

si_recording_raw = read_nwb(
    file_path=nwbPrepro_path,
    electrical_series_path='acquisition/' + electrode_groups_names[0],
    load_recording=True,
    load_sorting=False,
)

# Band-pass filter and resample to obtain the LFP
si_recording_lfp = bandpass_filter(si_recording_raw, **bandpass_params)
si_recording_lfp1k = resample(si_recording_lfp, **resample_params)

siconv.add_electrodes_to_nwbfile(
    recording=si_recording_lfp1k,
    nwbfile=nwbPrepro)

Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 siconv.add_electrodes_to_nwbfile(
      2     recording=si_recording_lfp1k,
      3     nwbfile=nwbPrepro)

File m:\Monkey_Python\SIenv\Lib\site-packages\neuroconv\tools\spikeinterface\spikeinterface.py:573, in add_electrodes_to_nwbfile(recording, nwbfile, metadata, exclude, null_values_for_properties)
    571     data_dict = {property: data_to_add[property]["data"][channel_index] for property in properties_with_data}
    572     electrode_kwargs.update(**data_dict)
--> 573     nwbfile.add_electrode(**electrode_kwargs, enforce_unique_id=True)
    575 # The channel_name column as we use channel_name, group_name as a unique identifier
    576 # We fill previously inexistent values with the electrode table ids
    577 electrode_table_size = len(nwbfile.electrodes.id[:])

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File m:\Monkey_Python\SIenv\Lib\site-packages\pynwb\file.py:728, in NWBFile.add_electrode(self, **kwargs)
    725     else:
    726         d.pop(col_name)  # remove args from d if not set
--> 728 self.electrodes.add_row(**d)

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\common\table.py:706, in DynamicTable.add_row(self, **kwargs)
    704     if row_id in self.id:
    705         raise ValueError("id %i already in the table" % row_id)
--> 706 self.id.append(row_id)
    708 for colname, colnum in self.__colids.items():
    709     if colname not in data:

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\container.py:986, in Data.append(self, arg)
    984 def append(self, arg):
    985     self._validate_new_data_element(arg)
--> 986     self.__data = append_data(self.__data, arg)

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\data_utils.py:36, in append_data(data, arg)
     34 shape = list(data.shape)
     35 shape[0] += 1
---> 36 data.resize(shape)
     37 data[-1] = arg
     38 return data

File m:\Monkey_Python\SIenv\Lib\site-packages\h5py\_hl\dataset.py:666, in Dataset.resize(self, size, axis)
    664 with phil:
    665     if self.chunks is None:
--> 666         raise TypeError("Only chunked datasets can be resized")
    668     if axis is not None:
    669         if not (axis >=0 and axis < self.id.rank):

TypeError: Only chunked datasets can be resized

Operating System

Windows

Python Executable

Python

Python Version

3.11

Package Versions

No response


h-mayorquin commented 2 weeks ago

Hi, thanks for opening an issue.

Can you give more details on what you are trying to achieve?

borrepp commented 2 weeks ago

Hello Heberto @h-mayorquin

Thanks for your response.

I was testing to see whether it would add more electrodes or recognize that they already exist. I wanted to track the process of adding the recording step by step, to ensure that it was saved properly.

We have custom code to create the original NWB file. I have run the NWB Inspector on it, and it passes as a good NWB file.

When I added the recording directly, I also got the same error (shown below).

Thanks in advance for your help

siconv.add_recording_to_nwbfile(
    recording=si_recording_lfp1k,
    nwbfile=nwbPrepro,
    write_as='lfp',
)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 siconv.add_recording_to_nwbfile(
      2     recording=si_recording_lfp1k,
      3     nwbfile=nwbPrepro,
      4     write_as='lfp',
      5 )

File m:\Monkey_Python\SIenv\Lib\site-packages\neuroconv\tools\spikeinterface\spikeinterface.py:1127, in add_recording_to_nwbfile(recording, nwbfile, metadata, starting_time, write_as, es_key, write_electrical_series, write_scaled, compression, compression_opts, iterator_type, iterator_opts, always_write_timestamps)
   1124 elif metadata is None:
   1125     metadata = _get_nwb_metadata(recording=recording)
-> 1127 add_electrodes_info_to_nwbfile(recording=recording, nwbfile=nwbfile, metadata=metadata)
   1129 if write_electrical_series:
   1130     number_of_segments = recording.get_num_segments()

File m:\Monkey_Python\SIenv\Lib\site-packages\neuroconv\tools\spikeinterface\spikeinterface.py:1008, in add_electrodes_info_to_nwbfile(recording, nwbfile, metadata)
   1006 add_devices_to_nwbfile(nwbfile=nwbfile, metadata=metadata)
   1007 add_electrode_groups_to_nwbfile(recording=recording, nwbfile=nwbfile, metadata=metadata)
-> 1008 add_electrodes_to_nwbfile(recording=recording, nwbfile=nwbfile, metadata=metadata)

File m:\Monkey_Python\SIenv\Lib\site-packages\neuroconv\tools\spikeinterface\spikeinterface.py:573, in add_electrodes_to_nwbfile(recording, nwbfile, metadata, exclude, null_values_for_properties)
    571     data_dict = {property: data_to_add[property]["data"][channel_index] for property in properties_with_data}
    572     electrode_kwargs.update(**data_dict)
--> 573     nwbfile.add_electrode(**electrode_kwargs, enforce_unique_id=True)
    575 # The channel_name column as we use channel_name, group_name as a unique identifier
    576 # We fill previously inexistent values with the electrode table ids
    577 electrode_table_size = len(nwbfile.electrodes.id[:])

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File m:\Monkey_Python\SIenv\Lib\site-packages\pynwb\file.py:728, in NWBFile.add_electrode(self, **kwargs)
    725     else:
    726         d.pop(col_name)  # remove args from d if not set
--> 728 self.electrodes.add_row(**d)

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\common\table.py:706, in DynamicTable.add_row(self, **kwargs)
    704     if row_id in self.id:
    705         raise ValueError("id %i already in the table" % row_id)
--> 706 self.id.append(row_id)
    708 for colname, colnum in self.__colids.items():
    709     if colname not in data:

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\container.py:986, in Data.append(self, arg)
    984 def append(self, arg):
    985     self._validate_new_data_element(arg)
--> 986     self.__data = append_data(self.__data, arg)

File m:\Monkey_Python\SIenv\Lib\site-packages\hdmf\data_utils.py:36, in append_data(data, arg)
     34 shape = list(data.shape)
     35 shape[0] += 1
---> 36 data.resize(shape)
     37 data[-1] = arg
     38 return data

File m:\Monkey_Python\SIenv\Lib\site-packages\h5py\_hl\dataset.py:666, in Dataset.resize(self, size, axis)
    664 with phil:
    665     if self.chunks is None:
--> 666         raise TypeError("Only chunked datasets can be resized")
    668     if axis is not None:
    669         if not (axis >=0 and axis < self.id.rank):

TypeError: Only chunked datasets can be resized

borrepp commented 2 weeks ago

I just want to confirm what I want to achieve: re-importing the "lfp" data into the original NWB file.

I will also need to do the same re-importing for the sorting results (i.e., the sorting_analyzer).

Also, the way I have been able to work around this problem is by adding the "channel_name" column to the original NWB file myself. That allows me to re-import the SpikeInterface recording into the original NWB file without a problem.
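
For reference, this is roughly the workaround I applied (a minimal sketch; it assumes the column values should simply mirror the electrode table ids, and that pynwb's append mode can write the new column):

from pynwb import NWBHDF5IO

# Backfill a "channel_name" column whose values match the electrode table ids,
# so neuroconv finds the existing rows instead of appending new electrodes.
with NWBHDF5IO(nwbPrepro_path, mode="r+") as io:
    nwbfile = io.read()
    if "channel_name" not in nwbfile.electrodes.colnames:
        nwbfile.electrodes.add_column(
            name="channel_name",
            description="channel reference matching the electrode table ids",
            data=[str(id_) for id_ in nwbfile.electrodes.id[:]],
        )
    io.write(nwbfile)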

h-mayorquin commented 2 weeks ago

Thanks for the explanation.

I took a deeper look at this and found out that it is a problem with the non-expandable nature of the electrode table, not a problem with the function per se. We actually have a test for this with in-memory nwbfiles here:

https://github.com/catalystneuro/neuroconv/blob/6960872de0b11f9f04d4f0efecbec4aa9c010c12/tests/test_ecephys/test_tools_spikeinterface.py#L739-L798

The problem is that once you write the file, the electrode table is no longer expandable.

I will discuss this with the team and come back to you with a plan for changing this or a workaround.

h-mayorquin commented 1 week ago

Hi, we discussed this with the team today. As I mentioned, the problem is that the electrodes table is not expandable once it has been written. We are working on a solution here:

https://github.com/catalystneuro/neuroconv/pull/1137

Once this is done, I can show you how to add a few lines to the function where you write the nwbfile so that your electrode table is expandable.

Making objects expandable is something that has been discussed for a long time. See some references here: https://github.com/hdmf-dev/hdmf/issues/1153 https://github.com/hdmf-dev/hdmf/pull/1017 https://github.com/NeurodataWithoutBorders/pynwb/issues/1067 https://github.com/NeurodataWithoutBorders/pynwb/issues/918 https://github.com/hdmf-dev/hdmf/pull/1093
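
In the meantime, the underlying idea is that h5py can only resize chunked datasets, so the table's datasets have to be created chunked/resizable at the first write. A minimal sketch of the pattern (my assumption of what it could look like, not the final API from the PR above):

from hdmf.backends.hdf5.h5_utils import H5DataIO

# Assumed pattern: wrapping column data in H5DataIO with maxshape=(None,)
# makes the underlying HDF5 dataset chunked, so rows can be appended in a
# later "r+"/append session without "Only chunked datasets can be resized".
channel_names = [str(i) for i in range(32)]  # hypothetical example values
nwbfile.add_electrode_column(
    name="channel_name",
    description="unique channel reference",
    data=H5DataIO(data=channel_names, maxshape=(None,)),
)
# Caveat: the table's "id" dataset must be chunked as well for add_row() to
# succeed after reopening; making that easy is part of the linked discussions.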

borrepp commented 1 week ago

Hello Heberto @h-mayorquin

Thank you so much for all the information.

I understand that the error is that the electrode table is not expandable. However, this error shouldn't occur, because I'm using "add_recording_to_nwbfile" with a SpikeInterface recording extracted from the same NWB file. Does what I mean make sense?

I have an NWB file that has a raw signal. I import the raw signal with SpikeInterface and do filtering (or some preprocessing like motion correction, etc.). Then, I want to save the recording back into the NWB file. The neuroconv function should not append new electrodes; it should have recognized the channel names & group names.

My first comment pointed out that, when adding electrodes (a step that is always performed when adding a recording or sorting analyzer), the function should add the "channel_name" column to the NWB file first, and only later check the recording to find which electrodes to append.

The reason is that "channel_name" is not a default/required column in the NWB electrodes schema. Therefore, when the function "_get_electrodes_table_global_ids" searches for that column, it returns an empty list.

In fact, after adding the "channel_name" column to the NWB file's electrode table, there is no error when I add the recording object, because it recognizes that the channels are already in the electrode table.

I hope what I'm trying to explain makes sense.

Thanks. Best, Pepe

h-mayorquin commented 1 week ago

Ok, I think I understand what you are proposing. You are saying that we should add the channel name before determining the global indices that decide whether or not we append electrodes. I think this makes sense in your case, but I am not sure it can be done safely in general:

Consider the case where the recordings do not match. We would start by appending the channel column, but the group from the recording could in fact be different from the one in the table. In that case, you would be backfilling the ids as channel names and then appending all the channels as rows anyway, which is wrong.

There could be a solution where we check the group in the electrode table together with the channel id as a pair, but our function is already a little too complex, and I am not sure we should add yet another corner case here, given that you already have a simple solution (just add channel_name in your original script for writing the nwbfile).
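
For the record, that pair-based check would look roughly like this (a hypothetical sketch, not something implemented in neuroconv; it assumes the recording exposes "channel_name" and "group_name" properties):

# Match (channel_name, group_name) pairs instead of channel names alone,
# falling back to the table ids where "channel_name" is absent.
existing_pairs = {
    (str(id_), group)
    for id_, group in zip(nwbfile.electrodes.id[:], nwbfile.electrodes["group_name"][:])
}
recording_pairs = list(
    zip(recording.get_property("channel_name"), recording.get_property("group_name"))
)
channels_to_append = [pair for pair in recording_pairs if pair not in existing_pairs]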

Can you send me an email to h . mayorquin at gmail (no spaces) so we can have a quick discussion about the motivation? Maybe it makes sense, but I want to understand more about your usage.

borrepp commented 1 week ago

Sure, I will email you now.

Thank you so much.