NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Analog signal data not read from TDT SEV #1129

Open: grahamfindlay opened this issue 2 years ago

grahamfindlay commented 2 years ago

Hi, all. I have never used neo directly before, so I apologize if this is just user error. But I have read through the docs and the code, and I think what I'm seeing is unexpected behavior.

I have a TDT tank containing a single TDT block that uses the SEV (rather than TEV) format. I can successfully create a TdtIO object, and it reads the .Tbk and .tsq files, which seem to reflect the right streams, with the right names and the right numbers of channels. But it seems as if these header files are not being parsed properly, because the AnalogSignal objects I get have incorrect sampling rates (1.0 Hz vs. the actual ~610 Hz), incorrect durations (0.0 s vs. several hours), and no actual data (e.g. anasig.shape is (0, 4) for a 4-channel signal). I notice that the TdtIO._sigs_lengths dict is {0: {0: 0, 1: 0}}, which it seems should reflect the size of the data. But at first glance there doesn't appear to be anything wrong with the .Tbk and .tsq parsing, so perhaps the issue lies somewhere in between.
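One quick sanity check for the empty signals is to estimate, from file sizes alone, how many samples each .sev file ought to contain. The sketch below assumes that an SEV file is a 40-byte header followed by raw single-channel samples, and that samples are 4-byte float32; both are assumptions about TDT's format, not something neo or this thread confirms. The helper names expected_sev_samples and expected_duration_s are hypothetical. If the estimate is large while neo reports zero samples, the data are on disk and the problem lies in how the reader finds or sizes them.

```python
import os

# Assumption (not confirmed by neo or by this thread): an SEV file is a
# 40-byte header followed by raw samples for a single channel.
SEV_HEADER_BYTES = 40


def expected_sev_samples(path, sample_width_bytes=4, header_bytes=SEV_HEADER_BYTES):
    """Estimate how many samples one SEV file holds, from its size on disk.

    sample_width_bytes=4 assumes float32 samples.
    """
    payload = os.path.getsize(path) - header_bytes
    if payload < 0:
        raise ValueError(f"{path} is smaller than the assumed header")
    return payload // sample_width_bytes


def expected_duration_s(n_samples, sampling_rate_hz):
    """Recording duration in seconds implied by a sample count and rate."""
    return n_samples / sampling_rate_hz
```

For example, 25,000,000 float32 samples at 610 Hz would correspond to roughly 41,000 s, a bit over 11 hours.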

Here's a demonstration: read_tdt_using_neo.pdf. The data are ~800MB. Let me know if you'd like a copy in order to reproduce the issue.

Thanks, Graham

samuelgarcia commented 2 years ago

Hi Graham, I wrote this reader a very long time ago and don't remember the details. @JuliaSprenger refactored it a bit recently; maybe she will be able to help.

JuliaSprenger commented 2 years ago

Hi @grahamfindlay, thanks for reporting this. My first suspicion would be that the .sev files are not detected by the IO, as their filenames contain an additional _EEGf identifier that is not present in the other files. Could you try renaming some of your .sev files to just FCSV_EEG-220215-173221_Paul-220607-102425_Ch{1..4}.sev?
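The suspected failure mode can be illustrated with a hypothetical filename pattern (not the pattern neo actually uses) that expects <tank>_<block>_Ch<N>.sev. Such a pattern never matches files that carry an extra stream name, but does match the renamed variant proposed here:

```python
import re


def fixed_pattern(tank, block):
    """Hypothetical pattern for '<tank>_<block>_Ch<N>.sev' (no stream name)."""
    return re.compile(re.escape(f"{tank}_{block}_Ch") + r"(\d+)\.sev$")


tank = "FCSV_EEG-220215-173221"
block = "Paul-220607-102425"
pattern = fixed_pattern(tank, block)

# Files as written to disk, with the stream name between block and channel.
on_disk = [f"{tank}_{block}_EEGf_Ch{i}.sev" for i in range(1, 5)]
# The proposed renamed variant, with the stream name removed.
renamed = [f"{tank}_{block}_Ch{i}.sev" for i in range(1, 5)]

print(sum(bool(pattern.search(f)) for f in on_disk))  # → 0
print(sum(bool(pattern.search(f)) for f in renamed))  # → 4
```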

grahamfindlay commented 2 years ago

Thanks, both. Four of the .sev files have _EEGf identifiers, and four have _EEGr identifiers. If I deleted these identifiers from both sets of files, the two sets would have identical filenames (the filenames differ only in these identifiers), so I can't do that. What I did instead was:

  1. Delete the _EEGf identifiers, and leave the _EEGr identifiers in place. This was unsuccessful (Sorry for the ugly pandoc/nbconvert formatting >.<) attempt1.pdf

  2. Delete the _EEGf identifiers, and delete the _EEGr files entirely. This was also unsuccessful. attempt2.pdf

Perhaps it would be possible to use TDT's Python package to handle ingest of their data & headers/metadata, and massage this into the standardized neo format?

JuliaSprenger commented 2 years ago

I will extend the TdtIO to also include the _EEGf and _EEGr postfixes in the internal channel identifiers.

grahamfindlay commented 2 years ago

The "EEGr" and "EEGf" identifiers correspond to stream names and are set by the user. In my case they are abbreviations for "EEG raw" and "EEG filtered", but likely every user has their own stream identifiers based on how they've configured their recording project. So it will be necessary to accept arbitrary identifiers.
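A reader that accepts arbitrary stream names could parse each SEV filename from its right-hand end instead of matching one fixed name. The sketch below is a minimal illustration under two assumptions: files are named <tank>_<block>_<stream>_Ch<N>.sev, and the stream name itself contains no underscore (tank and block names, like FCSV_EEG-220215-173221, can). The function group_sev_by_stream is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: "<tank>_<block>_<stream>_Ch<N>.sev" with an underscore-free
# stream name. Everything before the stream is captured as "prefix", so
# underscores inside the tank or block names do not matter.
SEV_NAME = re.compile(r"^(?P<prefix>.+)_(?P<stream>[^_]+)_Ch(?P<channel>\d+)\.sev$")


def group_sev_by_stream(filenames):
    """Map each stream name to a sorted list of its channel numbers."""
    streams = defaultdict(list)
    for name in filenames:
        m = SEV_NAME.match(name)
        if m is None:
            continue  # not an SEV channel file under the assumed scheme
        streams[m["stream"]].append(int(m["channel"]))
    return {stream: sorted(chs) for stream, chs in streams.items()}
```

One caveat of this design: a file with no stream part at all (<tank>_<block>_Ch<N>.sev) would be misread, with the block name taken as the stream, so a real reader would still need to check the result against the streams listed in the .Tbk/.tsq headers.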

I have tried to reverse engineer TDT file formats before, and they can be quite complicated. There are really several different formats depending on recording software / hardware version, user choices, etc. That's why using their python package may save a lot of time, if possible.

grahamfindlay commented 2 years ago

@JuliaSprenger One of my colleagues is also not able to load his TDT data using v0.10.2 because of another issue: Neo assumes that the path to the .Tbk file will be of the form <tank>/<block>/<tank>_<block>.Tbk, when apparently it is common practice to rename the tank and block directories after recording, because TDT adds timestamps to these names that make them very long. Apparently the TDT python package has no issue with this directory structure, and only cares that there is a .Tbk file at the block path that you provide it with.
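Following the behaviour described for TDT's package, a reader could locate the .Tbk file by looking inside the block directory rather than constructing <tank>_<block>.Tbk from the directory names. This is a minimal sketch; find_tbk is a hypothetical helper, and it requires exactly one .Tbk file, comparing the suffix case-insensitively:

```python
from pathlib import Path


def find_tbk(block_dir):
    """Return the .Tbk file inside block_dir, whatever its stem is.

    The stem is not compared with the block (parent) or tank (grandparent)
    directory names, so renamed directories are accepted. Exactly one .Tbk
    file is required; the suffix comparison is case-insensitive.
    """
    candidates = sorted(
        p for p in Path(block_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".tbk"
    )
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one .Tbk file in {block_dir}, found {len(candidates)}"
        )
    return candidates[0]
```

Raising on zero or several candidates keeps the failure explicit instead of silently picking one file.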

Example: /Volumes/opto_loc/Data/ACR_exp/ACR_9/ACR_exp-220614-083228_ACR_9-220617-083308.Tbk

At the time of recording, the user requested the tank name "ACR_exp" and the block name "ACR_9". TDT then added date and time stamps when saving to disk, and the user subsequently stripped them from the directory names.
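Since the timestamped .Tbk stem still embeds the requested names, they could be recovered from it, for instance to cross-check against the directory names without requiring an exact match. The sketch assumes TDT's suffix is always -<6 digits>-<6 digits> on both the tank and the block name, as in the example path; requested_names is a hypothetical helper.

```python
import re

# Assumption: "<tank>-<6 digits>-<6 digits>_<block>-<6 digits>-<6 digits>".
STAMPED = re.compile(
    r"^(?P<tank>.+?)-(?P<tank_date>\d{6})-(?P<tank_time>\d{6})"
    r"_(?P<block>.+)-(?P<block_date>\d{6})-(?P<block_time>\d{6})$"
)


def requested_names(tbk_stem):
    """Recover the user-requested tank and block names from a .Tbk stem."""
    m = STAMPED.match(tbk_stem)
    if m is None:
        return None  # stem does not follow the assumed timestamped scheme
    return m["tank"], m["block"]


print(requested_names("ACR_exp-220614-083228_ACR_9-220617-083308"))
# → ('ACR_exp', 'ACR_9')
```

Stems that do not follow this scheme return None, so a caller could fall back to accepting the .Tbk file as-is.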

At first I thought it didn't make sense to expect support for renaming the tank and block directories after recording. However, these are the names the user actually requests from the software, the timestamped versions do not appear anywhere else in the file metadata, TDT's own Python package handles this without issue, and the practice seems to be common. So perhaps it would make sense to follow TDT's lead and not check the .Tbk filename against the parent and grandparent directory names.

grahamfindlay commented 2 years ago

Update: Found another person with this same issue involving .Tbk naming.