[Open] TRuikes opened this issue 4 years ago
Hi @TRuikes! Thanks for the very extensive investigation and list of references. Regarding the individual points:
- `input_inverted`: the signal is only inverted when this flag is set. However, if you think this is unintuitive, feel free to change it, since I implemented this rather on a hunch and not with a specific use case in mind.
- `DspDelayCompensation`: this is currently ignored completely. Same here, if you have a file exhibiting this feature we can investigate how to deal with this.

Hi @TRuikes. Thank you for this long analysis. Feel free to make proposals with PRs.
Julia has better knowledge of the format specification than I do.

About the inverted input: do you mean it is a bug, or a feature? Or would you like a kwarg to control it, e.g. `compensate_inverted=True/False`?

About the regexp for datetime: we can have `datetime=None` when it fails, I agree.

About `DspDelayCompensation` and `DspFilterDelay`: I don't know, I'll let Julia answer.

About gaps: I am OK with a kwarg that controls the behavior when a gap occurs, like `gap_mode='multi_seg' / 'fill_nan'`. It is a lot of work, though. Keep in mind that this is a neo rawio, so it has to be lazy: nothing is loaded while parsing the header, and every load happens on the fly. This is why we need to keep the memmap system intact.

To answer both at the same time: I think we can have both behaviors, but that leads to more complicated code.
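For illustration, this is roughly how such a kwarg could look from the user side. This is only a sketch: `gap_mode` is the proposal from this thread, not an existing neo argument, and the dirname is made up.

```python
from neo.rawio import NeuralynxRawIO

# 'gap_mode' is hypothetical (proposed above), not an existing neo kwarg:
# 'multi_seg' would keep the current behavior (one segment per gap),
# 'fill_nan' would return a single segment with gaps filled at read time.
reader = NeuralynxRawIO(dirname='path/to/session', gap_mode='multi_seg')
reader.parse_header()  # lazy: reads headers/timestamps only, no signal data
sigs = reader.get_analogsignal_chunk(block_index=0, seg_index=0)  # on-the-fly load
```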
Thanks for the responses. I'll upload examples when I can.
`input_inverted`: I would see it as a feature; it's nice functionality for sure. But what happened now is that the researcher told me he recorded his data with 'inverted' on, then I used this IO to load the data and generate a `.dat` file for klusta (for spike sorting), and in the parameters I asked klusta to look for negative peaks. For me a line in the docstring would be enough, so I would be aware of this. If this is the default behaviour of all IOs in neo, you can probably mention it somewhere in the documentation as well.

The DSP issue will become clearer with the data. Briefly summarized: if there is no compensation but there is filtering, Cheetah filters the raw data and saves the data points at the timestamp of recording, instead of compensating for the filter lag (i.e. saving the data at timestamp + filter lag).
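To make the misalignment concrete, here is a hedged sketch of what compensating such a lag could look like after the fact; the delay value and the trace are invented for illustration.

```python
import numpy as np

fs = 32_000                  # Hz, a typical Neuralynx sampling rate
delay_us = 484               # hypothetical DspFilterDelay_µs from the header
shift = int(round(delay_us * 1e-6 * fs))   # filter lag in samples (~15 here)

trace = np.random.randn(fs)                # stand-in for 1 s of filtered data
aligned = np.empty_like(trace)
aligned[:-shift] = trace[shift:]           # move samples earlier by the lag
aligned[-shift:] = 0                       # tail is undefined after the shift
```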
For me the main problem with solving it via segments is that you end up with 6 segments for one `.ncs` file and 4 for another. This breaks the API we built around neo, because we assume 1 segment (having to deal with a variable number of segments of different lengths doesn't seem great), and I think this is not the behaviour you would like either. I wouldn't fill the gaps with NaNs either; I'd prefer zeros or the signal mean. I just wanted to start by pointing out that we may need to fill in data somewhere.
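As a toy example of filling a gap with the signal mean instead of NaNs (all numbers invented):

```python
import numpy as np

seg_a = np.array([1.0, 2.0, 3.0])          # samples before the gap
seg_b = np.array([2.5, 1.5])               # samples after the gap
gap_len = 4                                # missing samples in between

fill_value = np.concatenate([seg_a, seg_b]).mean()
continuous = np.concatenate([seg_a, np.full(gap_len, fill_value), seg_b])
```

Unlike NaNs, a zero or mean fill keeps the data representable as integers, so no conversion to float is forced.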
I haven't thought much further than this yet, but the current behaviour could be extended with:

- `patchdata=False`, `patchpath=dirname` <- path where to store patched `.ncs` files
- `IO.read_ncs`: if `patchdata=True` and the data is broken (and has not been patched before), call `IO.patch_data`, which writes the patched data into `patchpath`.

This way you'd have to run the patch functions only once per dataset; after that the data should be compatible with the current IO.
I'm not sure if broken data like this is a recurring thing, and worth the effort of being properly dealt with in neo. I think it would be useful, and perhaps a selling point (right now I could also patch the data with FieldTrip in Matlab and then start using neo).
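A minimal sketch of the flow proposed above; `patchdata`, `patchpath` and `patch_data` are hypothetical names from this thread, not an existing neo API.

```python
import os
import shutil
import numpy as np

def patch_data(src, dst):
    """Write a corrected copy of a broken .ncs file (placeholder logic)."""
    shutil.copyfile(src, dst)  # a real version would realign/fill samples here

def read_ncs(filename, patchdata=False, patchpath=None):
    if patchdata and patchpath is not None:
        patched = os.path.join(patchpath, os.path.basename(filename))
        if not os.path.exists(patched):      # patch only once per dataset
            patch_data(filename, patched)
        filename = patched
    # current behavior stays intact: lazy access through a memmap
    return np.memmap(filename, dtype=np.uint8, mode='r')
```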
Hi @TRuikes, I've been working with Neuralynx for about 5 years now, in both Matlab and Python. I'm not a maintainer; however, I've studied the neo interface thoroughly and I can make some clarifications.
- Input is automatically inverted. I was not aware this was happening, but it is important to know.

A: `input_inverted` is a reasonable feature when working with positive/negative fields, like reversals. So having it set by default is being agnostic to the experimenter's setup.
- NCS files within one session can have different data lengths, causing this to break. When `DspFilterDelay_µs > 0` and `DspDelayCompensation == Enabled`, there seems to be padding of the data by Cheetah (I'm not 100% confident about how). For a tetrode, someone could have 3 channels saved with `DspFilterDelay == 0` but one being bandpassed (`DspFilterDelay > 0`), yielding patches before and after the signal if the DSP delay compensation is on.
- If `DspDelayCompensation == Disabled` and `DspFilterDelay > 0`, this is not dealt with, and one would have channels with improper alignment.
A: Non-uniform data alignment is a feature specific to Cheetah, as few other ephys systems allow it. In our lab we advise against recording with configurations that allow it, or with non-uniform sampling rates (a feature we've heard of, but don't use either). Cases of non-uniform alignment are subject to manual processing, as it depends on how exactly you want to fill the gaps. The default FieldTrip approach of filling with NaNs can be extremely RAM-hungry: say you recorded only 15 s of every minute, with the other 45 s as a pause, for 100 trials or so in one recording. FieldTrip would then quadruple the RAM needed for your recording. Now imagine 100 channels at 32 kHz... my PC just blew up. The gap-to-segment strategy is straightforward and consistent with other recording systems that allow pausing.
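A quick back-of-the-envelope version of that calculation (a sketch with the numbers from above; `.ncs` samples are int16 on disk, while a NaN fill forces a float representation):

```python
n_trials, trial_s, recorded_s = 100, 60.0, 15.0   # 15 s recorded per 60 s trial
n_channels, fs = 100, 32_000                      # channels, sampling rate (Hz)

recorded_gb = n_trials * recorded_s * n_channels * fs * 2 / 1e9  # int16 on disk
nan_filled_gb = n_trials * trial_s * n_channels * fs * 4 / 1e9   # float32 after fill

print(f"recorded:   {recorded_gb:.1f} GB")    # 9.6 GB
print(f"NaN-filled: {nan_filled_gb:.1f} GB")  # 76.8 GB, 8x more in RAM
```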
- Ring buffer gaps are detected as gaps and translated into different segments. In short, if the buffer overflows during recording, Cheetah drops the data, making the recorded values discontinuous. Cheetah still saves the timestamps for the signal, so it is solvable. In fact you are already reading those gaps with this line, but they are treated as segments instead of faulty data. FieldTrip has a nice implementation to deal with this, `ft_read_neuralynx_interp`, wherein the gaps are filled with NaNs. This ensures, for instance, that when generating `.dat` files the data is properly aligned.
A: I've partially made this happen. When I bumped into this problem with lost samples, I found reconstructing the signal from raw timestamps to be quite ambiguous. The thing is that you're always reading from handy memmaps and then reshaping the blocks into a continuous trace. Doing this with interpolation or masked traces would be time- and memory-consuming, and considering that buffering errors are quite rare, it is easier to process them as individual segments and then concatenate them on the user side, as I eventually did.
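For reference, the user-side concatenation can be as simple as the following sketch (the dirname is hypothetical; note that plain concatenation discards the gap durations):

```python
import numpy as np
from neo.io import NeuralynxIO

reader = NeuralynxIO(dirname='path/to/session')   # hypothetical session folder
block = reader.read_block()

# each ring-buffer gap currently shows up as its own Segment;
# stitch the traces back together, dropping the gap durations
traces = [seg.analogsignals[0].magnitude for seg in block.segments]
continuous = np.concatenate(traces, axis=0)
```

If alignment against other channels matters, filling the gaps with zeros or the mean (as discussed above) is preferable to plain concatenation.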
`input_inverted`: I would see it as a feature; it's nice functionality for sure. But what happened now is that the researcher told me he recorded his data with 'inverted' on, then I used this IO to load the data and generate a `.dat` file for klusta (for spike sorting), and in the parameters I asked klusta to look for negative peaks. For me a line in the docstring would be enough, so I would be aware of this. If this is the default behaviour of all IOs in neo, you can probably mention it somewhere in the documentation as well.
A: Klusta works pretty well with this parameter ON by default; that's why it is there. You shouldn't have to worry about how an experimenter set up his config, inverted or not. In addition, SpyKING CIRCUS uses this parameter the same way when working with `.ncs` files, and it works well: https://github.com/spyking-circus/spyking-circus/blob/8efc51b357ec04cb924bcdb5601eeda8a6bfcb31/circus/files/neuralynx.py#L250
Hi @sin-mike, good that you joined the discussion. I'm fairly inexperienced, I'd say, so it's good to have more input from experimentalists. I'll comment on your remarks below, and after this I'll continue on the PR: #820. I also uploaded examples of my breaking data. Reading back, I realized I didn't mention that the issues above cause the IO to break for me. Let me recap the 3 remaining issues:

All of these issues can then be solved simultaneously by implementing a flow along these lines:
@sin-mike, if you do this:

you have only 15 secs recorded of a minute with the other 45 secs as a pause, for 100 trials or so in one record

it is recorded in the events file, right? Then you can define your segments by the recording starts and stops. @JuliaSprenger, is there a specific reason why you check for gaps in the timestamps to define segments? Why not use the recording start and stop signals to define segments?
Hi! While working with NeuralynxIO on different datasets I came across a few bugs/missing features; addressing them would be useful (necessary) additions, I think:
- One of my file headers contains the line `## Time Closed File was not closed properly`; we don't know how it got there, but the data itself seems intact, and it breaks the datetime parsing.
- When `DspFilterDelay_µs > 0` and `DspDelayCompensation == Enabled`, there seems to be padding of the data by Cheetah (I'm not 100% confident about how). For a tetrode, someone could have 3 channels saved with `DspFilterDelay == 0` but one being bandpassed (`DspFilterDelay > 0`), yielding patches before and after the signal if the DSP delay compensation is on.
- If `DspDelayCompensation == Disabled` and `DspFilterDelay > 0`, this is not dealt with, and one would have channels with improper alignment.

The header thing will be solved easily, but the delay and buffer errors are a bit tougher; you would have to restructure how `.ncs` files are read. Currently the first file is read and used to determine the gaps and the data size/shape, and it is ignorant of any DSP delays. Maybe this workflow would solve it:
I can give this a try for a pull request if you want. The only thing I'm not sure how to solve is how to patch the data, as there's `np.memmap` being used.
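One possible answer, sketched under assumptions (the filenames are hypothetical): since the memmap on the original file is opened read-only, patching would mean writing a corrected copy once and memmapping that copy afterwards.

```python
import numpy as np

raw = np.memmap('session/CSC1.ncs', dtype=np.uint8, mode='r')  # original untouched

patched = np.memmap('session/CSC1_patched.ncs', dtype=np.uint8,
                    mode='w+', shape=raw.shape)
patched[:] = raw[:]     # copy first, then fix gaps/filter delays in the copy
# ... shift samples by DspFilterDelay, zero-fill ring-buffer gaps, etc. ...
patched.flush()         # persist the patched file so later reads can memmap it
```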