NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

[NeuralynxIO] Update .ncs file reading #819

Open TRuikes opened 4 years ago

TRuikes commented 4 years ago

Hi! While working with NeuralynxIO on different datasets I came across a few bugs/missing features, which I think will be useful (necessary) additions:

The header issue will be easy to solve, but the delay and buffer errors are a bit tougher: I think you would have to restructure how .ncs files are read. Currently the first file is read and used to determine gaps and data shape, and this is ignorant of any Dsp delays.

Maybe this workflow would solve it:

I can give this a try for a pull request if you want; the only thing I'm not sure how to solve is how to patch the data, since np.memmap is being used.

JuliaSprenger commented 4 years ago

Hi @TRuikes! Thanks for the very extensive investigation and list of references. Regarding the individual points:

samuelgarcia commented 4 years ago

Hi @TRuikes. Thank you for this long analysis. Feel free to make proposal with PRs.

Julia has better knowledge than me about the format specification.

About the inverted input: do you mean it is a bug, or a feature? Or would you like a kwarg to control it, e.g. `compensate_inverted=True/False`?
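Such a kwarg could look like this (a sketch only; `apply_inversion` and both parameter names are hypothetical, not existing neo API):

```python
import numpy as np

def apply_inversion(raw_chunk, input_inverted, compensate_inverted=True):
    # input_inverted would come from the InputInverted flag in the .ncs
    # header; compensate_inverted is the proposed user-facing kwarg
    sig = np.asarray(raw_chunk, dtype='float64')
    if compensate_inverted and input_inverted:
        sig = -sig
    return sig
```

The default would decide whether neo silently undoes the hardware inversion or returns the raw samples as stored.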

About the regexp for datetime: we can have `datetime=None` when it fails, I agree.
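A minimal sketch of that fallback (the function name and the exact header date format are assumptions; Cheetah versions vary in how they write the header):

```python
import re
import datetime

def parse_neuralynx_datetime(header_line):
    # return None instead of raising when the header line does not
    # match the expected "YYYY/MM/DD HH:MM:SS" pattern
    m = re.search(r'(\d{4})/(\d{2})/(\d{2})\s+(\d{2}):(\d{2}):(\d{2})',
                  header_line)
    if m is None:
        return None
    return datetime.datetime(*map(int, m.groups()))
```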

About DspDelayCompensation and DspFilterDelay, I don't know; I'll let Julia answer.

About gaps, I am OK with a kwarg that controls the behavior when there is a gap, like `gap_mode='multi_seg' / 'fill_nan'`. It is a lot of work. Keep in mind that this is a neorawio, so it has to be lazy: no data loading when parsing the header, every load is on the fly. This is why we need to keep the memmap system intact.
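The `'multi_seg'` side of that kwarg can stay lazy, since gap detection only needs the record timestamps, not the samples. A sketch (function name and tolerance are hypothetical; 512 samples per record per the .ncs format):

```python
import numpy as np

def split_by_gaps(timestamps_us, sample_rate, samples_per_record=512, tol=0.5):
    # expected spacing between consecutive record timestamps, in µs
    expected = samples_per_record * 1e6 / sample_rate
    deltas = np.diff(np.asarray(timestamps_us, dtype='i8'))
    # a delta more than tol extra record-durations long starts a new segment
    gap_starts = np.nonzero(deltas > expected * (1 + tol))[0] + 1
    # return record indices per segment; slicing the memmap with these
    # index arrays keeps all sample loading on the fly
    return np.split(np.arange(len(timestamps_us)), gap_starts)
```

Under `'fill_nan'` the same boundaries would instead be used to compute how many filler samples to synthesize at read time.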

samuelgarcia commented 4 years ago

Answering at the same time. I think we can make both behaviors possible, but it leads to more difficult code.

TRuikes commented 4 years ago

Thanks for the responses. I'll upload examples when I can.

The Dsp issue will become clearer with the data. Shortly summarized: if there is no delay compensation but there is filtering, Cheetah filters the raw data and saves the datapoints at the timestamp of recording, instead of compensating for the lag introduced by the filter (i.e. saving the data at timestamp + filter lag).
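In that case a correction would amount to shifting each affected channel earlier by its filter delay. A sketch under those assumptions (function name is hypothetical; zero-padding at the end is one choice among several):

```python
import numpy as np

def compensate_dsp_delay(samples, delay_us, sample_rate):
    # convert DspFilterDelay_µs from the header into a whole-sample shift
    shift = int(round(delay_us * 1e-6 * sample_rate))
    if shift == 0:
        return np.asarray(samples)
    out = np.empty_like(samples)
    # move the trace earlier by `shift` samples so it lines up with
    # channels that had DspDelayCompensation enabled
    out[:-shift] = samples[shift:]
    out[-shift:] = 0  # pad the tail; zeros rather than NaNs
    return out
```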

For me the main problem with solving it with segments is that you would end up with 6 segments for one .ncs file and 4 for another. This breaks the API we built around neo, because we assume one segment (dealing with a variable number of segments of different lengths doesn't seem great), and I think this is not the behaviour you would like either. I wouldn't fill the gaps with NaNs either; I'd prefer zeros or the signal mean. I thought I'd start off by pointing out where we might need to fill in the data.

I haven't thought much further than this yet, but current behaviour can be extended with:

This way you'd only have to run the patch functions once per dataset; after that the data should be compatible with the current IO.

I'm not sure if broken data like this is a recurring thing, and whether it's worth the effort to deal with it properly in neo. I think it would be useful, and perhaps a selling point (right now I could also patch the data with FieldTrip in Matlab and then start using neo).

sin-mike commented 4 years ago

Hi @TRuikes, I've been working with Neuralynx for about 5 years now, in both Matlab and Python. I'm not a maintainer, but I've studied the neo interface thoroughly and I can make some clarifications.

A: Inverted input is a reasonable feature when working with positive/negative fields, like reversals. So having it set by default is being agnostic to the experimenter's setup.

  • NCS files within one session can have different data lengths, causing this to break. When `DspFilterDelay_µs > 0` and `DspDelayCompensation == Enabled`, Cheetah seems to pad the data (I'm not 100% confident about how). For a tetrode, one could have 3 channels saved with `DspFilterDelay == 0` while one is being bandpassed (`DspFilterDelay > 0`), yielding patches before and after the signal if Dsp delay compensation is on.
  • If `DspDelayCompensation == Disabled` and `DspFilterDelay > 0`, this is not dealt with, and one ends up with improperly aligned channels.

A: Non-uniform data alignment is a feature specific to Cheetah; few other ephys systems allow it. In our lab we advise against recording with configs that allow it, or with non-uniform sampling rates (a feature we've heard of, but don't use either). Cases of non-uniform alignment need manual processing, as it depends on how exactly you want to fill the gaps. The default FieldTrip approach of filling with NaNs can be extremely RAM-hungry: say you only have 15 secs recorded out of each minute, with the other 45 secs paused, for 100 trials or so in one record. Filling would increase the RAM needed for your record 4 times. Imagine that with 100 channels at 32 kHz; my PC would just give up. The gap-to-segment strategy is pretty straightforward and consistent with other recording systems that allow pausing.
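The back-of-the-envelope numbers behind that 4x figure (channel count, rate, and float64 dtype are illustrative assumptions):

```python
# NaN-filling forces float arrays over the full wall-clock duration,
# not just the recorded stretches
n_channels, rate, itemsize = 100, 32_000, 8  # 100 ch, 32 kHz, float64

recorded_s = 15.0   # actually recorded per minute
total_s = 60.0      # wall-clock span once gaps are filled

bytes_real = n_channels * rate * recorded_s * itemsize
bytes_filled = n_channels * rate * total_s * itemsize
# ~0.38 GB of real data balloons to ~1.5 GB per minute of wall-clock time
```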

A: I've partially made this happen. When I ran into this problem with lost samples, I found reconstructing the signal from raw timestamps quite ambiguous. The thing is that you're always reading from handy memmaps and then reshaping the blocks into a continuous trace. Doing this with interpolation or masked traces would be time- and memory-consuming, and since buffering errors are quite rare, it is easier to process them as individual segments and then concatenate them on the user side, as I did eventually.

sin-mike commented 4 years ago
  • input_inverted: I would see it as a feature, it's nice functionality for sure. But what happened now is that the 'researcher' told me he recorded his data with 'inverted' on; I then used this IO to load the data and generate a .dat file for klusta (for spike sorting), and in the parameters I asked klusta to look for negative peaks. For me a line in the docstring would be enough, so I would be aware of this. If this is the default behaviour of all IOs in neo, you could also mention it somewhere in the documentation.

A: Klusta works pretty well with this parameter ON by default; that's why it is there. You shouldn't have to worry about whether an experimenter configured inverted input or not. In addition, SpyKING CIRCUS uses this parameter the same way when working with .ncs files, and it works well: https://github.com/spyking-circus/spyking-circus/blob/8efc51b357ec04cb924bcdb5601eeda8a6bfcb31/circus/files/neuralynx.py#L250

TRuikes commented 4 years ago

Hi @sin-mike, good that you joined the discussion. I'm fairly inexperienced, I'd say, so it's good to have more input from experimentalists. I'll comment below on your remarks; after this I'll continue with the PR: #820. I also uploaded examples of my breaking data. Reading back, I realized I didn't mention that the issues above cause the IO to break for me. Let me recap the 3 remaining issues:

Then all of these issues can be solved simultaneously by implementing a flow along these lines:

TRuikes commented 4 years ago

@sin-mike if you do this:

you have only 15 secs recorded of a minute with other 45 secs for a pause for 100 trials or so in one record

it is recorded in the events file, right? Then you can define your segments by recording starts and stops. @JuliaSprenger, is there a specific reason why you check for gaps in the timestamps to define segments? Why not use the recording start and recording stop events to define segments?
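Something like this, pairing start/stop markers from the events file (a sketch; the function name is hypothetical, and the exact label strings Cheetah writes may differ between versions):

```python
def segments_from_events(event_times, event_labels,
                         start_label='Starting Recording',
                         stop_label='Stopping Recording'):
    # pair each recording-start event with the next recording-stop event;
    # each (start, stop) pair would become one segment boundary
    starts = [t for t, lbl in zip(event_times, event_labels)
              if lbl == start_label]
    stops = [t for t, lbl in zip(event_times, event_labels)
             if lbl == stop_label]
    return list(zip(starts, stops))
```

This would make segment boundaries explicit and independent of timestamp-gap heuristics, though it relies on the events file being present and complete.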