NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Using NEO with Spikeinterface (se.read_maxwell) to read MaxTwo .h5 files: "Unable to synchronously open object (object 'rec0001' doesn't exist)" #1427

Closed: sofiapuvogelvittini closed this 7 months ago

sofiapuvogelvittini commented 7 months ago

Describe the bug

Hello,

I'm attempting to use se.read_maxwell() from SpikeInterface to extract Maxwell data; it calls NEO code to retrieve information from the .h5 file. The MaxTwo .h5 file contains data from several wells, and with se.read_maxwell() I can specify the well using stream_id='' and the recording with rec_name=''. Although I specify stream_id='well006' and rec_name='rec0000', NEO appears to ignore these parameters and attempts to read rec0001, which does not exist in my data. Consequently, I get the error: "Unable to synchronously open object (object 'rec0001' doesn't exist)".

Attached is a screenshot showing the pointerToData inside the .h5 file; it indicates that rec0001 does not exist for well006.

pointerRoData.pdf

To Reproduce

full_raw_rec = se.read_maxwell('/mnt/data/spike_sorting/nikki/prep_data/nikki_data.raw.h5', stream_id='well006', rec_name='rec0000')

Output

KeyError                                  Traceback (most recent call last)
Cell In[33], line 1
----> 1 full_raw_rec = se.read_maxwell('/mnt/data/spike_sorting/nikki/prep_data/nikki_data.raw.h5',
      2                                stream_id='well006', rec_name='rec0000')

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/maxwell.py:58, in MaxwellRecordingExtractor.__init__(self, file_path, stream_id, stream_name, block_index, all_annotations, rec_name, install_maxwell_plugin)
     55 self.install_maxwell_plugin()
     57 neo_kwargs = self.map_to_neo_kwargs(file_path, rec_name)
---> 58 NeoBaseRecordingExtractor.__init__(
     59     self,
     60     stream_id=stream_id,
     61     stream_name=stream_name,
     62     block_index=block_index,
     63     all_annotations=all_annotations,
     64     **neo_kwargs,
     65 )
     67 self.extra_requirements.append("h5py")
     69 # well_name is stream_id

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py:187, in NeoBaseRecordingExtractor.__init__(self, stream_id, stream_name, block_index, all_annotations, use_names_as_ids, neo_kwargs)
    158 def __init__(
    159     self,
    160     stream_id: Optional[str] = None,
    (...)
    165     neo_kwargs: Dict[str, Any],
    166 ) -> None:
    167     """
    168     Initialize a NeoBaseRecordingExtractor instance.
    169     (...)
    185     """
--> 187 _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
    189 kwargs = dict(all_annotations=all_annotations)
    190 if block_index is not None:

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py:27, in _NeoBaseExtractor.__init__(self, block_index, neo_kwargs)
     23 def __init__(self, block_index, **neo_kwargs):
     25     # Avoids double initiation of the neo reader if it was already done in the __init__ of the child class
     26     if not hasattr(self, "neo_reader"):
---> 27         self.neo_reader = self.get_neo_io_reader(self.NeoRawIOClass, **neo_kwargs)
     29     if self.neo_reader.block_count() > 1 and block_index is None:
     30         raise Exception(
     31             "This dataset is multi-block. Spikeinterface can load one block at a time. "
     32             "Use 'block_index' to select the block to be loaded."
     33         )

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py:66, in _NeoBaseExtractor.get_neo_io_reader(cls, raw_class, neo_kwargs)
     64 neoIOclass = getattr(rawio_module, raw_class)
     65 neo_reader = neoIOclass(**neo_kwargs)
---> 66 neo_reader.parse_header()
     68 return neo_reader

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/neo/rawio/baserawio.py:185, in BaseRawIO.parse_header(self)
    172 def parse_header(self):
    173     """
    174     This must parse the file header to get all stuff for fast use later on.
    (...)
    184     """
--> 185 self._parse_header()
    186 self._check_stream_signal_channel_characteristics()
    187 self.is_header_parsed = True

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/neo/rawio/maxwellrawio.py:109, in MaxwellRawIO._parse_header(self)
    107     self._channel_slice = ids
    108 elif int(version) > 20160704:
--> 109     settings = h5["wells"][stream_id][self.rec_name]["settings"]
    110     sr = settings["sampling"][0]
    111     if "lsb" in settings:

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/h5py/_hl/group.py:357, in Group.__getitem__(self, name)
    355     raise ValueError("Invalid HDF5 object reference")
    356 elif isinstance(name, (bytes, str)):
--> 357     oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    358 else:
    359     raise TypeError("Accessing a group is done with bytes or str, "
    360                     "not {}".format(type(name)))

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5o.pyx:189, in h5py.h5o.open()

KeyError: "Unable to synchronously open object (object 'rec0001' doesn't exist)"


Additional context

Could you give me some advice? Thanks for your help!
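
A quick way to confirm which recording names actually exist under each well is to inspect the file directly with h5py. This is only a minimal sketch; the wells/<well>/<rec> group layout is taken from the traceback above.

import h5py

# List every recording group under every well of the MaxTwo file.
# The path is the one from the "To Reproduce" step above.
with h5py.File('/mnt/data/spike_sorting/nikki/prep_data/nikki_data.raw.h5', 'r') as f:
    for well_name, well_group in f['wells'].items():
        print(well_name, '->', list(well_group.keys()))

For this file the output should show well006 -> ['rec0000'] and well014 -> ['rec0001'], which is why asking for rec0001 under well006 fails.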

zm711 commented 7 months ago

Thanks @sofiapuvogelvittini. There's overlap between the projects :), but we just need to keep the problems separate. I'm not familiar with the Maxwell system. I'll cc @alejoe and @samuelgarcia, but they are both super occupied for roughly the next week; we can ping them again soon. The other option would be to share the files so I can try to work on it from my end, but they know this much better than I do, so it would probably be better to wait for them.

sofiapuvogelvittini commented 7 months ago

Dear zm711, Thank you for your prompt response. I would love to share the files so that you, alejoe, or samuelgarcia could assist me. Thanks! I have included the data (.h5), the mapping, and the parameters file in the following Google Drive link: https://drive.google.com/drive/folders/1In8mopU-QxraccEFxjVmqH_s2FlfDGlY?usp=sharing

The h5 file contains recordings from two wells, but I recommend focusing on well006, as it exhibits better activity.

To give you more context: to generate the mapping file, I used the following code (attached): generate_mapping.zip, How_to_use_generate_mapping.txt

I'm uncertain about the reason, but generate_mapping.py seems to recognize well006 as 0000 (probably because this was our first recording). Consequently, I've added well = 0000 in the parameters file.

Thank you very very much. Best regards, Sofia

zm711 commented 7 months ago

Great. I'm a little booked for the next couple of days, but I'll take a peek ASAP, and if Alessio and Sam figure it out before me, that's cool too. We will let you know soon, Sofia!

alejoe91 commented 7 months ago

Hi @sofiapuvogelvittini

Thanks for sharing the data! I'll take a look at what's going on :)

alejoe91 commented 7 months ago

@sofiapuvogelvittini

It seems that you have different recording names for the different wells.

Did you record different wells in different recordings?

This is currently not supported, mainly because I thought that you can't switch wells between recordings. Is that possible during a recording?

sofiapuvogelvittini commented 7 months ago

Hi alejoe, thanks for your help! Looking in MATLAB, I see it the other way around: well006 is rec0000 and well014 is rec0001 (screenshots attached). DataStructureh5.pdf

I don't know why the data is named this way, but we actually did only one recording and selected two wells (well006 and well014). The machine scans and registers the plate by row. Given that these two wells are in different rows, the machine first scans the row with well006 and then scans the row with well014. Maybe this is why it gave a different recording name to each well? However, everything was done in the same assay (or recording).

alejoe91 commented 7 months ago

I see. So are the two recordings simultaneous?

This makes things tricky, because I think the Maxwell software allows one to perform multiple recordings in sequence (e.g. an axon tracking assay) and saves them as separate rec*. This makes the output very ambiguous. Could you check with the Maxwell support team what the actual "rule" is? Then we can use that instead of reverse-engineering the file :)

sofiapuvogelvittini commented 7 months ago

I have just double-checked the information with another h5 file generated with MaxTwo using a 24-well plate. Within a single "recording instance", the machine creates one recording per row. Therefore, for a 24-well plate with 4 rows and 6 columns (24 wells in total), we get "4 recordings", but all are generated in the same "recording instance", and the information is therefore stored in the same h5 file.

Here you can find the data of a 24-well plate: https://drive.google.com/drive/folders/1K5w14eePS9WZeWtbJPJemHjMw51BMR98?usp=sharing

For this recording instance, we did not register well000. well001 through well005 are associated with rec0000, well006 through well011 with rec0001, well012 through well017 with rec0002, and well018 through well023 with rec0003.

Attached is a screenshot of the data structure: 24_wells_plate_data_structure.pdf

Therefore, how the wells are distributed within the plate is relevant for the rec_name parameter.
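
Rather than hard-coding this layout, the well-to-recording association can be read from the file itself. This is a minimal sketch under the same wells/<well>/<rec> structure as above; the file name is a placeholder:

import h5py

# Print which rec name(s) each well is associated with, so the matching
# rec_name can be passed together with each stream_id.
with h5py.File('24_wells_recording.raw.h5', 'r') as f:  # placeholder path
    for well_name in sorted(f['wells']):
        print(well_name, '->', sorted(f['wells'][well_name]))

For the plate described above, this would print e.g. well001 -> ['rec0000'] up through well018 -> ['rec0003'].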

alejoe91 commented 7 months ago

Thanks, this is helpful. I'll try to patch this by the end of next week.

sofiapuvogelvittini commented 7 months ago

And yes, the recordings are "simultaneous". I mean, all the wells of the plate are registered together in the same "recording instance". Nevertheless, the machine scans each row independently, resulting in the assignment of a distinct rec_name to each row.

sofiapuvogelvittini commented 7 months ago

Hi! Is there any update on this :)? Thanks!

alejoe91 commented 7 months ago

Hey,

I looked into this, but something is unclear. In the first recording you shared, you have well006/rec0000 and well014/rec0001, but according to

"For this recording instance, we did not register well000. well001 through well005 are associated with rec0000, well006 through well011 with rec0001, well012 through well017 with rec0002, and well018 through well023 with rec0003."

it should be well006/rec0001 and well014/rec0002. Which one is correct?

sofiapuvogelvittini commented 7 months ago

Apologies for the oversight; I overlooked some relevant information here. In the initial recording, data were only collected from two wells: well006 (located in the second row) and well014 (in the third row). As the machine can only register one row at a time, it began with the second row (where well006 is situated) and generated rec0000. It then proceeded to the third row to register well014, resulting in rec0001. Therefore, the mapping of a row to a recording number is not fixed and depends on the specific experimental setup.

In the second dataset, we covered at least one well per row. Consequently, the machine traversed all four rows of the plate, resulting in four recordings (rec0000, rec0001, rec0002, and rec0003). Is it possible for NEO to extract this information directly from the h5 file? Alternatively, the user could provide an additional file indicating the association between wells and recordings for the particular experiment.

alejoe91 commented 7 months ago

Thanks, that makes sense. In my opinion, the easiest solution would be to ask the user to pass the full "path", e.g. well006/rec0000, etc.

What do you think?
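
For illustration, the convention could be as simple as splitting the combined path on the separator. This is a hypothetical sketch of the proposal, not the actual NEO implementation:

# Hypothetical helper: resolve a combined "well/rec" path into the two
# pieces the reader needs (the well, i.e. stream_id, and the rec name).
def split_well_rec(full_path):
    stream_id, rec_name = full_path.split('/')
    return stream_id, rec_name

stream_id, rec_name = split_well_rec('well006/rec0000')  # ('well006', 'rec0000')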

sofiapuvogelvittini commented 7 months ago

Yes, that will work! Thanks!

sofiapuvogelvittini commented 7 months ago

Thank you very much, I think it worked well. I am getting a different error now, but it seems related to SpikeInterface and the reading of the mapping file. I have attached a screenshot... Do you think it could be related to the modification of the NEO code or not? Thanks for all your help!

error2.pdf

alejoe91 commented 7 months ago

I see! Yeah, that's related to probeinterface... I'll take a look!

alejoe91 commented 7 months ago

@sofiapuvogelvittini there was indeed a bug in my NEO PR; I just pushed a fix!

Can you pull, re-install, and try again? I was able to load both wells in SpikeInterface!

sofiapuvogelvittini commented 7 months ago

It is working! Thank you so much!

sofiapuvogelvittini commented 7 months ago

I am sorry... but I found another problem later in the workflow :( ... but again, I am not completely sure whether it is related to the change in maxwellrawio.py. I am trying to use si.run_sorter(sorter_name="spykingcircus2"...) with the MaxTwo recording I just loaded with SpikeInterface using the new maxwellrawio.py (rec_name='rec0000', stream_id='well006'). It seems that, right after using maxwellrawio.py, it looks for an HDF5 plugin that cannot be located because it is pointing to an incorrect path. Previously, I was able to use si.run_sorter(sorter_name="spykingcircus2"...) with MaxOne data without getting this error, and I can see that hdf5 is installed in my conda si_env (which I created using the SpikeInterface virtual environment). Do you have any thoughts on this? Thanks a lot.

error3.pdf

alejoe91 commented 7 months ago

@sofiapuvogelvittini it works on my side. Can you open an issue on the SpikeInterface end, so we keep them separated?

sofiapuvogelvittini commented 7 months ago

It is working now. I created the directory "....hdf5/plugin", downloaded the Maxwell HDF5 plugin from https://share.mxwbio.com/d/4742248b2e674a85be97/, and placed it in "....hdf5/plugin/". Thank you very much for all your help!
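
For anyone hitting the same problem: the standard HDF5 mechanism behind this fix is the HDF5_PLUGIN_PATH environment variable, which tells the HDF5 library where to look for compression plugins. This is a minimal sketch; the directory and file name are placeholders for wherever the downloaded Maxwell plugin file was saved:

import os

# Must be set before the HDF5 library first tries to decompress the data,
# i.e. before opening the file.
os.environ['HDF5_PLUGIN_PATH'] = '/path/to/hdf5/plugin'  # placeholder

import spikeinterface.extractors as se
rec = se.read_maxwell('nikki_data.raw.h5',  # placeholder file name
                      stream_id='well006', rec_name='rec0000')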

sofiapuvogelvittini commented 2 weeks ago

Hello, this was working well; however, for some datasets I am now getting the following error:

KeyError                                  Traceback (most recent call last)
Cell In[9], line 1
----> 1 se.read_maxwell(h5_data, rec_name=rec_si, stream_id=well_si)

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/maxwell.py:62, in MaxwellRecordingExtractor.__init__(self, file_path, stream_id, stream_name, block_index, all_annotations, rec_name, install_maxwell_plugin, use_names_as_ids)
     59 self.install_maxwell_plugin()
     61 neo_kwargs = self.map_to_neo_kwargs(file_path, rec_name)
---> 62 NeoBaseRecordingExtractor.__init__(
     63     self,
     64     stream_id=stream_id,
     65     stream_name=stream_name,
     66     block_index=block_index,
     67     all_annotations=all_annotations,
     68     use_names_as_ids=use_names_as_ids,
     69     **neo_kwargs,
     70 )
     72 self.extra_requirements.append("h5py")
     74 # well_name is stream_id

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py:188, in NeoBaseRecordingExtractor.__init__(self, stream_id, stream_name, block_index, all_annotations, use_names_as_ids, neo_kwargs)
    158 def __init__(
    159     self,
    160     stream_id: Optional[str] = None,
    (...)
    165     neo_kwargs: Dict[str, Any],
    166 ) -> None:
    167     """
    168     Initialize a NeoBaseRecordingExtractor instance.
    169     (...)
    186     """
--> 188 _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
    190 kwargs = dict(all_annotations=all_annotations)
    191 if block_index is not None:

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py:27, in _NeoBaseExtractor.__init__(self, block_index, neo_kwargs)
     23 def __init__(self, block_index, **neo_kwargs):
     25     # Avoids double initiation of the neo reader if it was already done in the __init__ of the child class
     26     if not hasattr(self, "neo_reader"):
---> 27         self.neo_reader = self.get_neo_io_reader(self.NeoRawIOClass, **neo_kwargs)
     29     if self.neo_reader.block_count() > 1 and block_index is None:
     30         raise Exception(
     31             "This dataset is multi-block. Spikeinterface can load one block at a time. "
     32             "Use 'block_index' to select the block to be loaded."
     33         )

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py:66, in _NeoBaseExtractor.get_neo_io_reader(cls, raw_class, neo_kwargs)
     64 neoIOclass = getattr(rawio_module, raw_class)
     65 neo_reader = neoIOclass(**neo_kwargs)
---> 66 neo_reader.parse_header()
     68 return neo_reader

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/neo/rawio/baserawio.py:185, in BaseRawIO.parse_header(self)
    172 def parse_header(self):
    173     """
    174     This must parse the file header to get all stuff for fast use later on.
    (...)
    184     """
--> 185 self._parse_header()
    186 self._check_stream_signal_channel_characteristics()
    187 self.is_header_parsed = True

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/neo/rawio/maxwellrawio.py:145, in MaxwellRawIO._parse_header(self)
    143 gain_uV = 3.3 / (1024 * gain) * 1e6
    144 mapping = settings["mapping"]
--> 145 sigs = h5file["wells"][stream_id][self.rec_name]["groups"]["routed"]["raw"]
    147 channel_ids = np.array(mapping["channel"])
    148 electrode_ids = np.array(mapping["electrode"])

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/anaconda3/envs/si_env/lib/python3.11/site-packages/h5py/_hl/group.py:357, in Group.__getitem__(self, name)
    355     raise ValueError("Invalid HDF5 object reference")
    356 elif isinstance(name, (bytes, str)):
--> 357     oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    358 else:
    359     raise TypeError("Accessing a group is done with bytes or str, "
    360                     "not {}".format(type(name)))

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5o.pyx:189, in h5py.h5o.open()

KeyError: "Unable to synchronously open object (object 'routed' doesn't exist)"

I am not very sure what it means, and it is interesting that for other recordings with the same setup I don't get any error.

Here you can find the respective dataset and the Jupyter notebook I am using ("SC2_all_wells_latest.ipynb"): https://drive.google.com/drive/folders/1CoqJTnrn4_qIQEU_rZEc3MuNhaxgHln3?usp=sharing

I've also added another folder, 'data_that_works', which contains a recording of the same plate at a later time point. This one can be read.

Could you please help me figure out what may be going wrong here? Thanks a lot!
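
In the meantime, here is a minimal diagnostic sketch (group names taken from the traceback above) to check, for every well/rec pair, whether the groups/routed/raw branch that maxwellrawio.py expects is actually present:

import h5py

with h5py.File('problem_recording.raw.h5', 'r') as f:  # placeholder path
    for well_name, well_group in f['wells'].items():
        for rec_name, rec_group in well_group.items():
            has_routed = 'groups' in rec_group and 'routed' in rec_group['groups']
            print(well_name, rec_name, 'routed present:', has_routed)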

zm711 commented 2 weeks ago

Two things, @sofiapuvogelvittini! First, it is usually best to open a new issue; since this issue is closed, we can forget to come back to it. Could you copy your message over to a new issue?

Second, that seems to be an h5py problem. Maybe @alejoe91 has seen it before, so once you open the new issue I'll ping Alessio to comment when he has time!

sofiapuvogelvittini commented 2 weeks ago

Okay, I thought it might be related to the previous problem, which is why I included it here. I've now copied it to a new issue. Thanks!