NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
322 stars 247 forks source link

Question-How are streams determined from files? #1507

Closed AbhiSwamiUConn closed 3 months ago

AbhiSwamiUConn commented 3 months ago

For example, I have a plexon file that contains data grouped into 3 catagories. The wideband, the field potential of the wideband, and the spikes from the wideband. When reading the file and accessing a specific stream, the field potential is grouped into one stream and the wideband as well as the spikes are grouped into another stream. How are these streams determined and is it possible to separate the wideband and spikes into separate streams so I am able to only access one or the other? The wideband and spikes both have a sampling frequency of 40,000 hertz while the field potential has a sampling frequency of 10,000 hertz?

zm711 commented 3 months ago

plexon 1 or plexon 2? (plx for pl2 files). The docs explain the general idea here.

But basically if the sampling frequencies are different they have to be different streams. I'm surprised Wideband and spikes are in the same stream. Once you let me know which plexon I'll dig into more :)

AbhiSwamiUConn commented 3 months ago

It is plexon 1. In a different program called neuroexplorer, I can clearly see there are 16 field potential channels, 16 spike channels, and 16 wideband channels. However, when reading the file without any stream arguments to see what their names are, I only see two streams. One contains the field potential and the other seems to contain both the wideband and spikes as when I try to find how many channels are there, it lists 32.

zm711 commented 3 months ago

I see. yeah that might be a little bit of poor organization on our part. It seems like with the current setup you would just have to do a channel slice to get the correct channel for your purposes instead. Could you do the following and put it here.

from neo.rawio import PlexonRawIO

reader = PlexonRawIO(file_name)

reader.parse_header()

print(reader)
print(reader.header)

Once I double check how that displays I could check to see if there is a way to split those into separate streams. I might need a test file though.

AbhiSwamiUConn commented 3 months ago

Thank you. But how would I know which set of channels correspond to spikes and which correspond to the wideband? Do I assume the first 16 are the wideband?

zm711 commented 3 months ago

That's why I'm having you run the print reader and print reader.header. If we aren't labeling the channels right, then it would be impossible, but often we give the channel's "names" which you could use to do the slice. Could you do the script I posted above so we can see if it is possible or whether we need to patch that right away ?

AbhiSwamiUConn commented 3 months ago

This is what I get:

Parsing data blocks: 100%|█████████▉| 2197108832/2197169064 [01:30<00:00, 24268626.82it/s]
Finalizing data blocks:   0%|          | 0/3 [00:00<?, ?it/s]
Finalizing data blocks for type 1: 100%|██████████| 16/16 [00:00<?, ?it/s][A

Finalizing data blocks for type 4:   0%|          | 0/43 [00:00<?, ?it/s]
Finalizing data blocks for type 4: 100%|██████████| 43/43 [00:03<00:00, 11.44it/s]
Finalizing data blocks:  67%|██████▋   | 2/3 [00:03<00:01,  1.88s/it]
Finalizing data blocks for type 5:   0%|          | 0/80 [00:00<?, ?it/s]
Finalizing data blocks for type 5:   1%|▏         | 1/80 [00:03<04:36,  3.50s/it]
Finalizing data blocks for type 5:   2%|▎         | 2/80 [00:09<06:05,  4.69s/it]
Finalizing data blocks for type 5:   4%|▍         | 3/80 [00:14<06:35,  5.13s/it]
Finalizing data blocks for type 5:   5%|▌         | 4/80 [00:20<06:43,  5.31s/it]
Finalizing data blocks for type 5:   6%|▋         | 5/80 [00:25<06:41,  5.35s/it]
Finalizing data blocks for type 5:   8%|▊         | 6/80 [00:31<06:49,  5.54s/it]
Finalizing data blocks for type 5:   9%|▉         | 7/80 [00:37<06:42,  5.51s/it]
Finalizing data blocks for type 5:  10%|█         | 8/80 [00:42<06:41,  5.58s/it]
Finalizing data blocks for type 5:  11%|█▏        | 9/80 [00:48<06:33,  5.54s/it]
Finalizing data blocks for type 5:  12%|█▎        | 10/80 [00:53<06:26,  5.52s/it]
Finalizing data blocks for type 5:  14%|█▍        | 11/80 [00:59<06:30,  5.66s/it]
Finalizing data blocks for type 5:  15%|█▌        | 12/80 [01:05<06:29,  5.73s/it]
Finalizing data blocks for type 5:  16%|█▋        | 13/80 [01:11<06:26,  5.76s/it]
Finalizing data blocks for type 5:  18%|█▊        | 14/80 [01:17<06:26,  5.86s/it]
Finalizing data blocks for type 5:  19%|█▉        | 15/80 [01:22<06:13,  5.75s/it]
Finalizing data blocks for type 5:  20%|██        | 16/80 [01:28<06:04,  5.69s/it]
Finalizing data blocks for type 5:  21%|██▏       | 17/80 [01:33<05:54,  5.62s/it]
Finalizing data blocks for type 5:  22%|██▎       | 18/80 [01:39<05:43,  5.54s/it]
Finalizing data blocks for type 5:  24%|██▍       | 19/80 [01:45<05:55,  5.83s/it]
Finalizing data blocks for type 5:  25%|██▌       | 20/80 [01:51<05:50,  5.85s/it]
Finalizing data blocks for type 5:  26%|██▋       | 21/80 [01:57<05:39,  5.75s/it]
Finalizing data blocks for type 5:  28%|██▊       | 22/80 [02:03<05:40,  5.87s/it]
Finalizing data blocks for type 5:  29%|██▉       | 23/80 [02:09<05:30,  5.79s/it]
Finalizing data blocks for type 5:  30%|███       | 24/80 [02:14<05:19,  5.70s/it]
Finalizing data blocks for type 5:  31%|███▏      | 25/80 [02:20<05:17,  5.78s/it]
Finalizing data blocks for type 5:  32%|███▎      | 26/80 [02:26<05:08,  5.72s/it]
Finalizing data blocks for type 5:  34%|███▍      | 27/80 [02:31<05:02,  5.70s/it]
Finalizing data blocks for type 5:  35%|███▌      | 28/80 [02:37<05:00,  5.77s/it]
Finalizing data blocks for type 5:  36%|███▋      | 29/80 [02:42<04:45,  5.59s/it]
Finalizing data blocks for type 5:  38%|███▊      | 30/80 [02:48<04:48,  5.76s/it]
Finalizing data blocks for type 5:  39%|███▉      | 31/80 [02:54<04:40,  5.73s/it]
Finalizing data blocks for type 5:  40%|████      | 32/80 [03:00<04:37,  5.77s/it]
Finalizing data blocks for type 5:  41%|████▏     | 33/80 [03:02<03:35,  4.59s/it]
Finalizing data blocks for type 5:  42%|████▎     | 34/80 [03:03<02:49,  3.67s/it]
Finalizing data blocks for type 5:  44%|████▍     | 35/80 [03:05<02:19,  3.10s/it]
Finalizing data blocks for type 5:  45%|████▌     | 36/80 [03:07<01:57,  2.67s/it]
Finalizing data blocks for type 5:  46%|████▋     | 37/80 [03:08<01:42,  2.38s/it]
Finalizing data blocks for type 5:  48%|████▊     | 38/80 [03:10<01:30,  2.15s/it]
Finalizing data blocks for type 5:  49%|████▉     | 39/80 [03:12<01:23,  2.04s/it]
Finalizing data blocks for type 5:  50%|█████     | 40/80 [03:13<01:16,  1.91s/it]
Finalizing data blocks for type 5:  51%|█████▏    | 41/80 [03:15<01:09,  1.79s/it]
Finalizing data blocks for type 5:  52%|█████▎    | 42/80 [03:16<01:03,  1.68s/it]
Finalizing data blocks for type 5:  54%|█████▍    | 43/80 [03:18<01:02,  1.68s/it]
Finalizing data blocks for type 5:  55%|█████▌    | 44/80 [03:20<01:02,  1.73s/it]
Finalizing data blocks for type 5:  56%|█████▋    | 45/80 [03:22<00:59,  1.70s/it]
Finalizing data blocks for type 5:  57%|█████▊    | 46/80 [03:23<00:58,  1.71s/it]
Finalizing data blocks for type 5:  59%|█████▉    | 47/80 [03:25<00:57,  1.75s/it]
Finalizing data blocks for type 5: 100%|██████████| 80/80 [03:26<00:00,  2.59s/it]
Finalizing data blocks: 100%|██████████| 3/3 [03:30<00:00, 70.22s/it]
Parsing signal channels: 100%|██████████| 80/80 [00:00<00:00, 2167.98it/s]
Parsing spike channels: 0it [00:00, ?it/s]
Parsing event channels: 100%|██████████| 43/43 [00:00<?, ?it/s]
PlexonRawIO: C:\Users\User\Desktop\Abhinav\example files\vrm2905s13u3875f1.plx
nb_block: 1
nb_segment:  [1]
signal_streams: [Signals 0 (chans: 16), Signals 1 (chans: 32)]
signal_channels: [WB01, WB02, WB03, WB04 ... FP13 , FP14 , FP15 , FP16]
spike_channels: []
event_channels: [EVT01, EVT02, EVT03, EVT04 ... KBD5 , KBD6 , KBD7 , KBD8]

{'nb_block': 1, 'nb_segment': [1], 'signal_streams': array([('Signals 0', '0'), ('Signals 1', '1')],
      dtype=[('name', '<U64'), ('id', '<U64')]), 'signal_channels': array([('WB01', '0', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB02', '1', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB03', '2', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB04', '3', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB05', '4', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB06', '5', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB07', '6', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB08', '7', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB09', '8', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB10', '9', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB11', '10', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB12', '11', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB13', '12', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB14', '13', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB15', '14', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('WB16', '15', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC01', '16', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC02', '17', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC03', '18', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC04', '19', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC05', '20', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC06', '21', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC07', '22', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC08', '23', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC09', '24', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC10', '25', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC11', '26', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC12', '27', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC13', '28', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC14', '29', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC15', '30', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('SPKC16', '31', 40000., 'int16', '', 0.00019488, 0., '1'),
       ('FP01', '32', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP02', '33', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP03', '34', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP04', '35', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP05', '36', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP06', '37', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP07', '38', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP08', '39', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP09', '40', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP10', '41', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP11', '42', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP12', '43', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP13', '44', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP14', '45', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP15', '46', 10000., 'int16', '', 0.00019488, 0., '0'),
       ('FP16', '47', 10000., 'int16', '', 0.00019488, 0., '0')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('sampling_rate', '<f8'), ('dtype', '<U16'), ('units', '<U64'), ('gain', '<f8'), ('offset', '<f8'), ('stream_id', '<U64')]), 'spike_channels': array([],
      dtype=[('name', '<U64'), ('id', '<U64'), ('wf_units', '<U64'), ('wf_gain', '<f8'), ('wf_offset', '<f8'), ('wf_left_sweep', '<i8'), ('wf_sampling_rate', '<f8')]), 'event_channels': array([('EVT01', '1', b'event'), ('EVT02', '2', b'event'),
       ('EVT03', '3', b'event'), ('EVT04', '4', b'event'),
       ('EVT05', '5', b'event'), ('EVT06', '6', b'event'),
       ('EVT07', '7', b'event'), ('EVT08', '8', b'event'),
       ('EVT09', '9', b'event'), ('EVT10', '10', b'event'),
       ('EVT11', '11', b'event'), ('EVT12', '12', b'event'),
       ('EVT13', '13', b'event'), ('EVT14', '14', b'event'),
       ('EVT15', '15', b'event'), ('EVT16', '16', b'event'),
       ('EVT17', '17', b'event'), ('EVT18', '18', b'event'),
       ('EVT19', '19', b'event'), ('EVT20', '20', b'event'),
       ('EVT21', '21', b'event'), ('EVT22', '22', b'event'),
       ('EVT23', '23', b'event'), ('EVT24', '24', b'event'),
       ('EVT25', '25', b'event'), ('EVT26', '26', b'event'),
       ('EVT27', '27', b'event'), ('EVT28', '28', b'event'),
       ('EVT29', '29', b'event'), ('EVT30', '30', b'event'),
       ('EVT31', '31', b'event'), ('EVT32', '32', b'event'),
       ('Strobed', '257', b'event'), ('Start', '258', b'event'),
       ('Stop', '259', b'event'), ('KBD1', '101', b'event'),
       ('KBD2', '102', b'event'), ('KBD3', '103', b'event'),
       ('KBD4', '104', b'event'), ('KBD5', '105', b'event'),
       ('KBD6', '106', b'event'), ('KBD7', '107', b'event'),
       ('KBD8', '108', b'event')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('type', 'S5')])}
zm711 commented 3 months ago

You see that your channels are named either WB or SPKC to indicate the wideband or the spikes. So you should be able to just slice based on name. In spikeinterface you can use names as ids for most neo readers. So that is an option or you can see the channel_id or your data is currently organized by its index. So channel_index 0 is WB01.

Does that make sense or would you like a bit more help in working with your file? We can tag some of the people who've worked on plexon and see if there is interest in splitting them.

@h-mayorquin do you guys have any desire for us to work on splitting the wideband and spike channels into distinct streams or do you just always slice them? (Maybe you can tag someone from Neuroconv who has worked with a lot of plexon 1 files)

EDIT: fixed a typo with WB channel number :)

AbhiSwamiUConn commented 3 months ago

Sure, that helps. So I would just select the first 16 of the 32 in the stream? Currently what I am doing is using the ChannelSliceRecording method to select the channels based on their names, which are just listed as numbers when I print them.

Channels: ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '10' '11' '12' '13' '14' '15'
 '16' '17' '18' '19' '20' '21' '22' '23' '24' '25' '26' '27' '28' '29'
 '30' '31']
zm711 commented 3 months ago

Spikeinterface defaults to the channel_ids

so above we have ('WB01', '0',), so channel_name 'WB01' is channel_id '0' and channel_index 0. So yes for a ChannelSliceRecording in this case you slice from '0' - '15' to only get the WB channels.

zm711 commented 3 months ago

It might be helpful if we make the channel_ids more meaningful in this case since we are losing WB vs spike information when going from name -> id which could cause mistakes for end users. I'll wait for some neuroconv people to weigh in, but I think if we can guarantee that the names are unique then we could use the names as the ids as well.

h-mayorquin commented 3 months ago

Two things: 1) At spikeinterface level there is an option to change names as ids that should be used for plexon. I think this will make it easier. I will open a PR. 2) @AbhiSwamiUConn I have a question, are you referring to the channels called SPKC as spikes? According to a plexon engineer that contribute here recently says that is just a filtered version of the wideband . That's the reason they are the same stream, they have the same dtype, they come from the same file and they have the same sampling rate.

I think that a stream is a very unfortunate name that has already expanded to other parts of the ecosystem and has lead to a lot of confusion. The thing is that stream is just an implementation detail of neo. It is the part of the format data that cean be thought as a single block and put into a numpy or memmap array. That's why it is defined by having the same dtype, sampling_frequency, shape and coming from the same file. Nothing more.

The problem is that we expose this implementation detail at the user level interface by making users access the data through streams. Users then get confused thinking that stream must mean something like "a type of qualitatively different data" and in some cases it makes sense (neuropixel has a stream for ap and lf bands for example) but in some others it does not.

See here my own confusion in the past: https://github.com/NeuralEnsemble/python-neo/issues/1133/

I don't know if this should be made more prominent in the docs. As I mention it has lead to confusion in others parts of the ecosystem: https://github.com/catalystneuro/roiextractors/issues/245

AbhiSwamiUConn commented 3 months ago

Yes, I am refering to the SPKC which is the filtered wideband. I just thought spikes would be an easier to understand. I have another question. Would it be possible to use the names of the channels to get all the ones named WB for instance?

h-mayorquin commented 3 months ago

Thanks for the clarification: In spikeinterface it should be easy if we merge this PR: https://github.com/SpikeInterface/spikeinterface/pull/3193

zm711 commented 3 months ago

Thanks for clarifying both. Sorry yeah spikes in neo are suppose to be their own thing, but if spkc if just filtered than in Neo parlance it would absolutely be in the same stream.

I agree we should probably find a better explanation of stream for our docs!

h-mayorquin commented 3 months ago

OK, now that https://github.com/SpikeInterface/spikeinterface/pull/3193 is merged on spikeinterface I wrote the asnwer to your question in #3196 @AbhiSwamiUConn

I think that we can close this @zm711 or maybe you want to leave it open until we write better documentation for this? The thing is that I am not sure the stream is ever exposed on neo directly, right?

h-mayorquin commented 3 months ago

Actually, final question @AbhiSwamiUConn. Do you happen to know what are the types of filering that is implemented for the "FP" and "SPKC" channels? It would be good for us to know more about this metadata.

zm711 commented 3 months ago

I think that we can close this @zm711 or maybe you want to leave it open until we write better documentation for this? The thing is that I am not sure the stream is ever exposed on neo directly, right?

They are exposed. here for example. The stream index gives us the sample rate, the gain, and the offset for the data so we can convert it and is necessary for the user functions. So I think we do need to document it. But stream itself is not exposed, just the indices of the streams. Is that your point?

h-mayorquin commented 3 months ago

My question was not well phrased. I guess the core of my question? Do users of neo need to know about streams the same way that users of spikeinterface need to know about streams?

In spikeinterface you need to know the streams to access the data but as I am not a userr of neo I don't know how users interact with the concept of stream and if this type of confusion can arise at the neo level.

AbhiSwamiUConn commented 3 months ago

Actually, final question @AbhiSwamiUConn. Do you happen to know what are the types of filering that is implemented for the "FP" and "SPKC" channels? It would be good for us to know more about this metadata.

It just appears to use a combination of highpass and lowpass filtering.

zm711 commented 3 months ago

Do users of neo need to know about streams the same way that users of spikeinterface need to know about streams?

Yes.

reader = neo.RawIO()
reader.parse_header()

in this example the reader will have the memmap for all streams, but to actually pull out the data you need to specify which stream of data you want. I would actually say that spikeinterface is nicer in that it can be fed with name, whereas all the functions at the rawio level expect the stream index. So you run into the issue that if you don't know the IO it isn't clear how to get what you want

ie how do you get digital data for intan -> you need stream_index = 3. So you basically need to look in the header to see which stream goes where. Again Intan has good names in the header, but it seems like Plexon says streams are Signal 0 and Signal 1 which aren't super helpful.

h-mayorquin commented 3 months ago

Yes, we probably should have better signal streams in plexon but I we will need someone who actually has experimental experience with the format to venture in that direction.

h-mayorquin commented 3 months ago

Actually, final question @AbhiSwamiUConn. Do you happen to know what are the types of filering that is implemented for the "FP" and "SPKC" channels? It would be good for us to know more about this metadata.

It just appears to use a combination of highpass and lowpass filtering.

Thanks, I wonder if the values of the bandpass are fixed or something that people decide at run time.

zm711 commented 3 months ago

I'll close this after I have a documentation issue open for us to better explain stream and I although we have an example explaining channel stuff I think we need to find a way to bring that front and center.

zm711 commented 3 months ago

Closing now that I have a docs issue open.