NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
323 stars 248 forks source link

Electrode selection in Axona raw recording #1454

Open letiziasignorelli opened 7 months ago

letiziasignorelli commented 7 months ago

I'm trying to convert electrophysiological data from Axona to NWB using the code from the Neuroconv https://github.com/catalystneuro/neuroconv.

I'm using neuroconv version 0.4.9 with Python 3.9 on Windows 11 in a conda environment.

I’m selecting the tetrodes to extract from the .set file setting the parameters collectMask* to 1 In the experiment lab I know they’re recording with Axona on channels 17-32, so I'm setting collectMask* to 1 for tetrodes 5-8. But after running the conversion and plotting the extracted signals, I don't get the correct channels (I can see it's only noise in the output). This is the code I'm using for conversion:

from dateutil import tz
from pathlib import Path
from neuroconv.datainterfaces import AxonaRecordingInterface

interface = AxonaRecordingInterface(file_path=".../2509202301.bin", verbose=True, es_key="Ephys")

# Extract what metadata we can from the source files
metadata = interface.get_metadata()
# For data provenance we add the time zone information to the conversion
tzinfo = tz.gettz()
session_start_time = metadata["NWBFile"]["session_start_time"]
metadata["NWBFile"].update(session_start_time=session_start_time.replace(tzinfo=tzinfo))

# Choose a path for saving the nwb file and run the conversion
path_to_save_nwbfile = ".../2509202301.nwb"
nwbfile_path = f"{path_to_save_nwbfile}"
interface.run_conversion(nwbfile_path=nwbfile_path, metadata=metadata)

But after running the conversion and plotting the extracted signals, I don't get the correct channels (I can see it's only noise in the output). This is the plot code:

from pynwb import NWBHDF5IO
import numpy as np
import matplotlib.pyplot as plt

fileobject = NWBHDF5IO('2509202301.nwb', 'r')
nwbobject = fileobject.read()

xx = nwbobject.acquisition['Ephys'].data
time = np.linspace(0,1000, 100000)

fig, ax = plt.subplots(16,1, sharex='all', sharey='all')
for i in range(16):
    ax[i].plot(time, xx[100000:200000,i])
plt.show()

And this is the plot: image

If instead, I try to extract the first 32 channels (so setting collectMask_* to 1 for tetrodes 1-8) I'm getting the correct signals in channels 17-32 (here I'm plotting only the last 16 channels). image

It seems that in any case I'm extracting the first 16 channels if I select 4 tetrodes, or 32 if I'm selecting 8 tetrodes.

See: Originally posted by @CodyCBakerPhD in https://github.com/catalystneuro/neuroconv/issues/787#issuecomment-2032560528

In general, it appears as if upstream control over Axona groups/streams is not exposed as my minimal attempt to get the RecordingExtractor results in only 16 channels (not 32) and only a single group (well, None implying only a single group anyway)

from spikeinterface.extractors import AxonaRecordingExtractor

recording = AxonaRecordingExtractor(file_path=".../2509202301.bin")

recording.channel_ids
> array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15'], dtype='<U64')

recording.get_channel_groups()
> None

I can share the sample data via email if needed. Thanks!

zm711 commented 7 months ago

Yep We will probably need sample data. I always like asking if people can do a "fake" or sample data set that they just share with a dropbox or google link first (really we only need 1-2 seconds) to troubleshoot a file format. That way we can take that dataset to include in our testing so we make sure any changes in the future don't break our fixes :). If you don't have access to the recording equipment then we can do by email with a private full dataset, but a public sample data set is always better.

letiziasignorelli commented 7 months ago

Ok thanks, we'll create a sample dataset in the next days :)

zm711 commented 6 months ago

@letiziasignorelli,

any small sample recording for us to work on? :)

letiziasignorelli commented 6 months ago

Hi, sorry for the late update. Here is a link with a .set + .bin test file from the lab: https://drive.google.com/drive/folders/1h0clLnAmt0-R2TO4yP1rAJwXZW5WgrKC?usp=sharing

Thanks :)

zm711 commented 6 months ago

Great thanks! I'll try to get to this soon!

zm711 commented 6 months ago

@letiziasignorelli,

I can troubleshoot with those, but I just downloaded and what we would really prefer is something < 10MB (the file you sent is 100MB). Any chance you could do one even smaller to add to our test repo for after I make these fixes?

zm711 commented 6 months ago

So now that I've looked at the data I've got a few questions.

I’m selecting the tetrodes to extract from the .set file setting the parameters collectMask* to 1 In the experiment lab I know they’re recording with Axona on channels 17-32, so I'm setting collectMask* to 1 for tetrodes 5-8. But after running the conversion and plotting the extracted signals, I don't get the correct channels (I can see it's only noise in the output).

What does this mean? How do you change collectMask_*? Is this in the axona gui? Neo relies on checking those values to decide which tetrodes to analyze. So if you put the wrong value into collectMask_* it will mess up Neo (and so SI and neuroconv) completely).

For example the sample data you gave me had

tetrode 2, tetrode 3 and tetrode 4 active. (so channels 5-8, 9-12, 13-16) active, but neo re-scales the values to be 1a, 1b, 1c, 1d, 2a, 2b, 2c,2d, 3a, 3b, 3c, 3d and spikeinterface changes to 0,1,2,3,4,5,6,7,8,9,10,11. You could make an argument that we should keep the original name, but this seems to be working fine for me. If you're lab only recorded from 16-32 then the code should automatically detect that, but if you change collectMask_* it might look at the wrong channel.

So basically could you explain exactly what you're doing and how the *.set file works so I can best understand how we can tweak the io to work.

letiziasignorelli commented 6 months ago

What does this mean? How do you change collectMask*? Is this in the axona gui? Neo relies on checking those values to decide which tetrodes to analyze. So if you put the wrong value into collectMask* it will mess up Neo (and so SI and neuroconv) completely).

The files that I shared with you are the ones that I receive directly from the lab. Since I noticed that the channels extracted after conversion weren't the correct ones, I tried to change the collectMask_* values manually directly from the .set file (In practice I tried to set collectMask_* for tetrodes 5, 6, 7, and 8 to 1, and the rest to 0). And that was the problem I saw:

But after running the conversion and plotting the extracted signals, I don't get the correct channels (I can see it's only noise in the output).

If instead, I try to extract the first 32 channels (so setting collectMask_* to 1 for tetrodes 1-8) I'm getting the correct signals in channels 17-32

I'll try to see myself if there's something wrong when data are acquired with the axona gui in the next days and I'll also try to see if we can record a smaller sample data

letiziasignorelli commented 3 months ago

Hi, I was finally able to get back to this issue and I was able to find a solution that seems to work for my data. So the problem for me seemed to be in neo/rawio/axonarawio.py:

  1. function _get_signal_chan_header returned always tetrode 1a,1b,1c, etc starting from 1 even if my first active tetrode was number 5. So I modified the function to have the correct tetrodes selected:

    def _get_signal_chan_header(self):
    
      active_tetrode_set = self.get_active_tetrode()
      num_active_tetrode = len(active_tetrode_set)
    
      elec_per_tetrode = 4
      letters = ["a", "b", "c", "d"]
      dtype = self.file_parameters["bin"]["data_type"]
      units = "uV"
      gain_list = self._get_channel_gain()
      offset = 0  # What is the offset?
    
      first_channel = (active_tetrode_set[0] - 1)*elec_per_tetrode
      sig_channels = []
      for itetr in range(num_active_tetrode):
    
          for ielec in range(elec_per_tetrode):
              cntr = (itetr * elec_per_tetrode) + ielec + first_channel
              ch_name = "{}{}".format(itetr + active_tetrode_set[0], letters[ielec])
              chan_id = str(cntr)
              gain = gain_list[cntr]
              stream_id = "0"
              # the sampling rate information is stored in the set header
              # and not in the bin file
              sr = self.file_parameters["set"]["sampling_rate"]
              sig_channels.append((ch_name, chan_id, sr, dtype, units, gain, offset, stream_id))
    
      return np.array(sig_channels, dtype=_signal_channel_dtype)
  2. Still, when converting to NWB, or when I was trying to use ephyviewr GUI to visualize my recordings the correct channels were not displayed. It still seemed that it was displaying the first 16 channels or the first 32 (if I had 16 or 32 active channels). So my other problem was in function _get_analogsignal_chunk lines 376-378, it was still taking the first N active channels. So I added new function:

    
    
    def get_active_channels(self):
      """
      Returns the ID numbers of the active channels as a list.
      E.g.: [20,21,22,23] for tetrode 6 active.
      """
      active_tetrodes = self.get_active_tetrode()
      active_channels = []
    
      for tetrode in active_tetrodes:
          chan = self._get_channel_from_tetrode(tetrode)
          active_channels.append(chan)
    
      return np.concatenate(active_channels)

and modified line 376-378 in `_get_analogsignal_chunk` like this:
```python
def _get_analogsignal_chunk(self, block_index, seg_index, i_start, i_stop, stream_index, channel_indexes):

        bin_dict = self.file_parameters["bin"]

        # Set default values
        if i_start is None:
            i_start = 0
        if i_stop is None:
            i_stop = bin_dict["num_total_samples"]
        if channel_indexes is None:
            channel_indexes = [i for i in range(bin_dict["num_channels"])]
        elif isinstance(channel_indexes, slice):
            channel_indexes = self.get_active_channels()

And now for my Axona data it seems to work. I don't know if it's the best way to solve it, but I hope it will be helpful :)

I also added to the same drive folder a shorter test test_short for you to troubleshoot, with 32 channels active from tetrodes 5 to 12. See https://drive.google.com/drive/folders/1h0clLnAmt0-R2TO4yP1rAJwXZW5WgrKC?usp=sharing

zm711 commented 3 months ago

I'll check the test_short shortly, but those actually seem like good fixes to me. Would you like to open a PR with your fixes so we can see if they work on our CI? I think giving the actual channel number is better than our current strategy of just renumbering to 1. Once you have the PR open we can edit it for style etc :)