catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Feature] Handle all Intan modes and formats and add documentation #789

h-mayorquin opened this issue 5 months ago (status: open)

h-mayorquin commented 5 months ago

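The files from Intan come in a variety of formats:

1. Traditional Intan File Format: a single .rhd file with the header and all the streams.
2. One File Per Signal Type: an info.rhd header plus one .dat file per signal type.
3. One File Per Channel: an info.rhd header plus one .dat file per channel.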

For the last two, we recently added support in Neo: https://github.com/NeuralEnsemble/python-neo/pull/1402

I am propagating this to SpikeInterface: https://github.com/SpikeInterface/spikeinterface/pull/2630

And then to Neuroconv.

A further complication is that, for the first format (the one we already support), the files might be split across time. That is, a file with all the streams and all their channels is produced every x minutes, and to get the whole session you have to concatenate them. See this issue and especially this comment.
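A minimal sketch of the concatenation step, assuming each time chunk has already been read into a NumPy array of shape (n_samples, n_channels) and that chunk file names end in a `_YYMMDD_HHMMSS.rhd` timestamp (an assumption about the naming, not something stated in this thread). The helpers `chunk_start_time`, `order_chunks` and `concatenate_chunks` are hypothetical, not Neo or SpikeInterface API. Sorting on the parsed timestamp, rather than on directory listing order, guards against chunks being stitched out of order:

```python
import re
from datetime import datetime

import numpy as np

# Assumed naming scheme: "<prefix>_YYMMDD_HHMMSS.rhd"
TIMESTAMP_PATTERN = re.compile(r"_(\d{6}_\d{6})\.rhd$")


def chunk_start_time(file_name: str) -> datetime:
    """Parse the start time encoded in a chunk's file name."""
    match = TIMESTAMP_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"No timestamp found in {file_name!r}")
    return datetime.strptime(match.group(1), "%y%m%d_%H%M%S")


def order_chunks(file_names: list[str]) -> list[str]:
    """Return the chunk file names in chronological order."""
    return sorted(file_names, key=chunk_start_time)


def concatenate_chunks(chunks: list[np.ndarray]) -> np.ndarray:
    """Stack (n_samples, n_channels) chunks along the time axis."""
    channel_counts = {chunk.shape[1] for chunk in chunks}
    if len(channel_counts) != 1:
        raise ValueError("All chunks must have the same number of channels")
    return np.concatenate(chunks, axis=0)
```

This treats the chunks as contiguous; the parsed start times could also be compared against the chunk durations to detect gaps. Once each file is loaded as a recording, SpikeInterface's `concatenate_recordings` is likely the natural replacement for the NumPy step.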

Regarding documentation, I think we could have a tutorial (maybe in the conversion gallery) that explains how to handle the three cases and what to do if the files are split across time.
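As a starting point for such a tutorial, here is a sketch of telling the three cases apart from a folder listing. The file names it checks (`info.rhd`, `amplifier.dat`, `digitalin.dat`, the `amp-`/`board-` prefixes, and so on) are assumptions about how Intan names its outputs and should be checked against the RHX user guide; `IntanFormat` and `detect_intan_format` are hypothetical names, not part of any library:

```python
from enum import Enum


class IntanFormat(Enum):
    TRADITIONAL = "one .rhd file with header and all data"
    ONE_FILE_PER_SIGNAL_TYPE = "info.rhd plus one .dat file per signal type"
    ONE_FILE_PER_CHANNEL = "info.rhd plus one .dat file per channel"


# Assumed file names for the one-file-per-signal-type layout.
SIGNAL_TYPE_FILES = {
    "amplifier.dat", "auxiliary.dat", "supply.dat",
    "analogin.dat", "digitalin.dat", "digitalout.dat",
}
# Assumed name prefixes for the one-file-per-channel layout.
CHANNEL_FILE_PREFIXES = ("amp-", "aux-", "vdd-", "board-")


def detect_intan_format(file_names: list[str]) -> IntanFormat:
    """Guess the Intan save format from the file names in a session folder."""
    names = set(file_names)
    dat_files = {name for name in names if name.endswith(".dat")}
    has_header = "info.rhd" in names
    if has_header and dat_files & SIGNAL_TYPE_FILES:
        return IntanFormat.ONE_FILE_PER_SIGNAL_TYPE
    if has_header and any(name.startswith(CHANNEL_FILE_PREFIXES) for name in dat_files):
        return IntanFormat.ONE_FILE_PER_CHANNEL
    # Traditional sessions may contain several time-split .rhd files.
    if names and not has_header and all(name.endswith(".rhd") for name in names):
        return IntanFormat.TRADITIONAL
    raise ValueError("Unrecognized Intan folder layout")
```

Only the traditional case can also be split across time, which is where the concatenation question comes in.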

There is a lot of good information in this thread, which I link here for provenance: https://github.com/SpikeInterface/spikeinterface/issues/2620

[EDIT] I am adding a screenshot of the saving menu, which I think is very useful information to take into account:

[Image: screenshot of the Intan RHX data-saving menu]

Taken from page 19 of the user guide: https://intantech.com/files/Intan_RHX_user_guide.pdf

zm711 commented 5 months ago

Related to this:

All of Neo's code for the .rhs format is stuck at version 1.0, when we should update it to version 3.x (the current one). So everything you've listed needs to be doubled: once for .rhd AND once for .rhs. :)

h-mayorquin commented 5 months ago

Thanks for jumping in @zm711.

I think that .rhs should be sort of straightforward (fingers crossed!), but we would need some test files for that.

zm711 commented 5 months ago

I can make some (I still haven't set up my GIN G-Node account, so if I share them, would you put them on there? :)). I think we've done most of the work with the .rhd; it would just be a matter of propagating it. I might also try to share an Intan merged file so we can get to the bottom of why merging with Intan causes Neo to fail. I'm pretty busy this week doing analysis, but I can share the files with you this weekend.

h-mayorquin commented 5 months ago

OK, send them over when you have a chance. Considering the Neo release schedule, and that we now have some resources for improving Intan support, this is a good time to do it.

I still owe you the performance analysis, which I will run on data from the lab I am working with. This will be important for them, and maybe we can add further improvements before the next release.

Thanks again for jumping in!

CodyCBakerPhD commented 5 months ago

still haven't set up my gin g-node account so if I share them would you put them on there

@zm711 I don't know if @h-mayorquin mentioned this to you in passing yet, but for a while now we've wanted to have a GitHub LFS copy (or a complete migration) of the example datasets instead of GIN, mainly because we get lots of complaints from users and team members that GIN is really hard to contribute to (and even download from). No one on our side has had the time or energy to make the jump, so if you have any interest in taking that on, we'd be very grateful 😁

zm711 commented 5 months ago

Happy to at least bring it up at the next meeting, because I too dislike the GIN workflow and haven't gotten DataLad to work on my personal laptop yet.

h-mayorquin commented 5 months ago

One thing that is not clear, and that would be good to discuss with both of you (@CodyCBakerPhD and @zm711), is how to handle the different signal types.

We have:


stream_type_to_name = {
    0: "RHD2000 amplifier channel",
    1: "RHD2000 auxiliary input channel",
    2: "RHD2000 supply voltage channel",
    3: "USB board ADC input channel", 
    4: "USB board digital input channel",
    5: "USB board digital output channel",
}

Plus temperature.

For the first iteration of the interface, I intend this to work for the amplifier channels, as they are fairly straightforward for NWB. They should be written as an ElectricalSeries in the standard way.

However, what about the others? Some channels would be for syncing, and I think @CodyCBakerPhD has an idea of what to do with them: https://github.com/catalystneuro/neuroconv/issues/519 https://github.com/catalystneuro/neuroconv/pull/520

Not sure how to handle the rest.

zm711 commented 5 months ago

So, from my experience:

1. Auxiliary inputs are voltages that are typically supplied by an optional accelerometer attached to the headstage. These channels are on by default, and the user has to go in and turn them off. They only matter if you have the accelerometer, but since they are opt-out, most users will have this data even though it will be useless to them.

2. Not a clue. I've never used nor checked this :)

3. In the default setup, I believe this is up to 2 channels, which will be labeled 1, 2, 3, etc. But Intan sells an extender board to get up to (I believe) 16 of these. Basically, these are voltages that vary from -10 to 10 V. They are good for something like a mouse steering wheel, where you record the wheel's angle based on the voltage.

4. This one is complicated. It is a single vector that stores all digital-in channels (again, with the extender there are 16 options). So the question is whether we want to store just the vector or do the bit-shift operation to return each individual channel as 1s and 0s.

5. This is the same as above, but specifically for RHS, where the digital signal is synced up to the analog stim channel. I would set this one aside until we understand the RHS format better.

These are just based on my reading, though, so I'm happy to debate. I use amplifier + digital-in + ADC all the time, so I'm used to those the most. My schema would be:

amplifier -> DATA
ADC + DIGITAL-IN -> Behavior/event tracking signals
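To make the proposed routing concrete, here is a sketch that maps the stream type ids from the `stream_type_to_name` dictionary earlier in the thread to a destination. The destination strings and the `has_accelerometer` flag are illustrative assumptions summarizing the discussion, not neuroconv API; `None` marks streams that would be skipped or deferred:

```python
from typing import Optional

# Stream type ids, in the order of the stream_type_to_name mapping above.
AMPLIFIER, AUX_INPUT, SUPPLY_VOLTAGE, BOARD_ADC, BOARD_DIGITAL_IN, BOARD_DIGITAL_OUT = range(6)


def nwb_destination(stream_type: int, has_accelerometer: bool = False) -> Optional[str]:
    """Map an Intan stream type to a proposed NWB destination (None = skip or defer)."""
    if stream_type == AMPLIFIER:
        return "ElectricalSeries (neural data)"
    if stream_type == AUX_INPUT:
        # Aux inputs only carry useful data when an accelerometer is attached.
        return "TimeSeries (accelerometer)" if has_accelerometer else None
    if stream_type == BOARD_ADC:
        return "TimeSeries (behavior/analog signal)"
    if stream_type == BOARD_DIGITAL_IN:
        return "events (TTL on/off times)"
    # Supply voltage: relevance unclear. Digital out: deferred until RHS is understood.
    return None
```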

CodyCBakerPhD commented 5 months ago

Sounds like 2 is a mystery; I don't know the answer for supply voltage either. My instinct is that it's not relevant.

From Zach's comments on 1 and 3, they could both be candidates for an analog signal (a continuously varying time series whose frames are synchronized with the amp channels, since everything is feeding into the same board). This can either be mapped as a separate non-neural ElectricalSeries in volts, with a rich description of how it differs from the other raw data, OR, if you know how to convert the volts into the real-world scientific unit (for example speed, velocity, acceleration, temperature, and so on), just write it to a TimeSeries with an appropriate name and conversion factors attached to translate the int16 values to floats.

For digital signals, it depends. If it is just a simple TTL, it seems a waste to include the entire raw electrical trace; just parse the on/off times and store those in a table or with ndx-events. The main reason to store the raw trace for digital channels is provenance, in case of errors in parsing, which have occurred in the past for complicated digital words in the presence of noise in the voltage signal (citing memory of the Feldman project from years ago). There, the digitization scheme was not just on/off: the magnitude of the voltage rise had to be estimated, and an inadequate filter was applied that threw off the final mapping onto the word encoding (which was trial numbers). So in that case it might actually have been a good idea to keep the original trace, because it was more complicated. But it's also a judgment call.
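A minimal sketch of the "conversion factors" idea. NWB's TimeSeries stores a `conversion` (and, in recent schema versions, an `offset`) so that a reader recovers `data * conversion + offset` in the declared unit while the int16 samples stay untouched. The volts-per-bit value and the steering-wheel calibration below are hypothetical placeholders, not Intan's actual scaling:

```python
import numpy as np


def raw_to_volts(raw: np.ndarray, volts_per_bit: float, offset_volts: float = 0.0) -> np.ndarray:
    """Apply the NWB-style rule: data * conversion + offset."""
    return raw.astype(np.float64) * volts_per_bit + offset_volts


def volts_to_wheel_angle(
    volts: np.ndarray,
    v_min: float = -10.0,
    v_max: float = 10.0,
    angle_min: float = -180.0,
    angle_max: float = 180.0,
) -> np.ndarray:
    """Hypothetical linear calibration from the -10..10 V ADC range to degrees."""
    fraction = (volts - v_min) / (v_max - v_min)
    return angle_min + fraction * (angle_max - angle_min)


raw = np.array([-32768, 0, 32767], dtype=np.int16)
volts = raw_to_volts(raw, volts_per_bit=10 / 32768)  # assumed full-scale of +/-10 V
angles = volts_to_wheel_angle(volts)
```

Because both steps are linear, they can be folded into a single `conversion`/`offset` pair on a TimeSeries whose unit is degrees, which keeps the stored data compact while still giving readers the scientific unit.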
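A minimal sketch of parsing on/off times from a clean TTL trace, assuming the trace is already a boolean array (for example, one bit of the digital-in word) sampled at a known rate; the returned arrays are what would go into an events table or ndx-events. `ttl_on_off_times` is a hypothetical helper. It does not report an onset for a trace that is already high at sample 0, and it does nothing about noisy multi-level words like the Feldman case, which is exactly where keeping the raw trace pays off:

```python
import numpy as np


def ttl_on_off_times(ttl: np.ndarray, sampling_frequency: float):
    """Return (on_times, off_times) in seconds for a boolean TTL trace."""
    transitions = np.diff(ttl.astype(np.int8))
    on_samples = np.flatnonzero(transitions == 1) + 1    # 0 -> 1 rising edges
    off_samples = np.flatnonzero(transitions == -1) + 1  # 1 -> 0 falling edges
    return on_samples / sampling_frequency, off_samples / sampling_frequency


ttl = np.array([0, 0, 1, 1, 0, 1, 0], dtype=bool)
on_times, off_times = ttl_on_off_times(ttl, sampling_frequency=10.0)
```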

h-mayorquin commented 5 months ago

Thanks guys, this is very useful.

I have some use cases from the current conversion that can add to the debate. They are for signal 3 (USB board ADC input channel, with files named board-ANALOG) and signal 4 (USB board digital input channel, with files named board-DIGITAL-IN).

zm711 commented 5 months ago

Your USB board ADC input (3) is exactly what I was thinking of for ADC. The thing I've noticed with ADC is that the baseline wasn't very steady (for my purposes), so I'm curious whether yours was.

For board-DIGITAL-IN, we should just return a vector. There are two options, and I'd be curious how you want to deal with them. If you know the digital-in channel, you can do this:

import numpy as np
data_of_ones_and_zeros = np.not_equal(np.bitwise_and(raw_digital_data, 1 << value), 0)  # True where bit `value` is set

where you just plug in `value` for each digital channel to get the TTL signal and then detect the rising edges. If you wanted to be more robust, you could iterate through all values in range(0, 16), and the rows of the resulting matrix that contain 1s are the digital channels with information.

So really, to @CodyCBakerPhD's point, there is the raw-raw digital data of shape (n_samples, 1), which the code above converts into 1s and 0s as a matrix of shape (n_samples, n_channels), which in turn can be reduced to TTL-style information. So you could store the raw-raw data, which is a compact vector, plus the TTL information. Someone can then recreate the full trace pretty easily.
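A vectorized sketch of the representations described above, assuming the packed word arrives as a 1-D integer array carrying 16 digital-in lines (the function names are hypothetical). Column `value` of the unpacked matrix matches the per-channel one-liner earlier in this comment, and packing the matrix back recovers the original vector, which is why storing only the compact vector loses nothing:

```python
import numpy as np

N_DIGITAL_CHANNELS = 16  # digital-in lines packed into each word (with the extender board)


def unpack_digital_word(raw_digital_data: np.ndarray) -> np.ndarray:
    """Expand a packed (n_samples,) word into an (n_samples, 16) boolean matrix."""
    bits = np.uint16(1) << np.arange(N_DIGITAL_CHANNELS, dtype=np.uint16)
    words = raw_digital_data.astype(np.uint16)[:, np.newaxis]
    return np.bitwise_and(words, bits) != 0


def active_channels(bit_matrix: np.ndarray) -> np.ndarray:
    """Indices of the digital channels (columns) that contain at least one 1."""
    return np.flatnonzero(bit_matrix.any(axis=0))


def pack_digital_word(bit_matrix: np.ndarray) -> np.ndarray:
    """Rebuild the compact packed vector from the boolean matrix."""
    weights = np.uint16(1) << np.arange(bit_matrix.shape[1], dtype=np.uint16)
    return (bit_matrix.astype(np.uint16) * weights).sum(axis=1, dtype=np.uint16)


raw = np.array([0, 1, 5, 32768, 0], dtype=np.uint16)
bit_matrix = unpack_digital_word(raw)
```

Each active column can then be passed through a rising/falling-edge detector to get the TTL times.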

zm711 commented 5 months ago

[Image: the Intan recording hardware as the end user sees it, with the BNC analog and digital connections]

I think a picture might be helpful. This is what the end user sees when using Intan recording equipment. Analog and digital signals are connected via BNC, and the allowed voltages are listed below the BNC hookups. The amplifier, supply voltage, and aux input channels all come from the headstage, which is connected via SPI.

h-mayorquin commented 5 months ago

Side question: Do people ever use the amplifier channels for anything other than electrodes from a headstage? Can they be used?

zm711 commented 5 months ago

As far as I know, the amplifier channels need a headstage to be engaged. For example, if I turn on Intan without a headstage connected, it only gives me access to digital-in and analog-in.

There are some setups for EMG (and, I think, EEG) that use their headstages, so the data don't have to come from MEAs.