catalystneuro / roiextractors

Python-based module for extracting from, converting between, and handling optical imaging data from several file formats. Inspired by SpikeInterface.
https://roiextractors.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

Fix Bruker plane specification implementation through file_paths #345

Open h-mayorquin opened 5 months ago

h-mayorquin commented 5 months ago

[Draft]

I discussed this today with @weiglszonja and we converged on this solution. I am writing the context here for provenance and as a to-do list.

The main problem is a clash between the plane specification and the channel specification, which are both passed as stream_name to BrukerTiffSinglePlaneImagingExtractor. For the test examples we have, this works because the channel names are always part of the file name (and are therefore found by the string matching in the implementation); see here:

https://github.com/catalystneuro/roiextractors/blob/29c3491913f4be558a950262fd854a445a6b2621/src/roiextractors/extractors/tiffimagingextractors/brukertiffimagingextractor.py#L167-L175

This matching is applied to XML elements that look like this:

      <File channel="2" channelName="Ch2" filename="NCCR32_2022_11_03_IntoTheVoid_t_series-005_Cycle00001_Ch2_000001.ome.tif" />

Note that the channelName ("Ch2") is part of the file name. However, this assumption does not hold for data from the Clandinin lab or for the data in this other issue, https://github.com/catalystneuro/roiextractors/issues/341, so we need to decouple the two.
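To make the current assumption concrete, here is a minimal sketch of substring-based channel selection. It is not the roiextractors implementation: the helper `select_by_substring` is hypothetical, and the second file name (one that does not contain the channel name) is made up for illustration.

```python
def select_by_substring(file_names, stream_name):
    """Keep the file names that contain the stream name (hypothetical helper)."""
    return [name for name in file_names if stream_name in name]


# File name from the XML element above: the channel name "Ch2" is embedded in it.
bruker_style = ["NCCR32_2022_11_03_IntoTheVoid_t_series-005_Cycle00001_Ch2_000001.ome.tif"]
# Hypothetical file name that does not embed the channel name.
other_style = ["TSeries-001_Cycle00001_000001.ome.tif"]

print(select_by_substring(bruker_style, "Ch2"))  # the single matching file
print(select_by_substring(other_style, "Ch2"))   # [] -> the channel cannot be found
```

The second call is the failure mode described above: the selection silently returns nothing once the channel name is no longer part of the file name.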

A first attempt was done here: https://github.com/catalystneuro/roiextractors/pull/343

This works for BrukerTiffSinglePlaneImagingExtractor because we can look up the file_paths that correspond to a channel name and get a very specific list of files. Unfortunately, it breaks the use case where BrukerTiffMultiPlaneImagingExtractor passes plane streams to BrukerTiffSinglePlaneImagingExtractor here:

https://github.com/catalystneuro/roiextractors/blob/29c3491913f4be558a950262fd854a445a6b2621/src/roiextractors/extractors/tiffimagingextractors/brukertiffimagingextractor.py#L239-L242

That code path also relied on the assumption that the plane specification is contained in the file name.

So what gives?

What we can do programmatically is find all the file_paths that correspond to a channel. What we cannot do programmatically, yet, is find a simple and user-friendly specification to select a single plane.
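Finding the files for a channel does not need the file name at all if we read the `channelName` attribute from the XML. Below is a sketch using the standard-library `xml.etree.ElementTree`. The trimmed XML, its file names (which deliberately omit the channel name), and the helper `group_files_by_channel` are hypothetical, and the nesting of `<File>` inside `<Frame>` inside `<Sequence>` is an assumption about the Bruker layout.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed XML shaped like the <File> element quoted above.
# The filenames deliberately do NOT contain the channel name.
XML_TEXT = """
<PVScan>
  <Sequence cycle="1">
    <Frame index="1">
      <File channel="1" channelName="Ch1" filename="scan_000001_a.ome.tif" />
      <File channel="2" channelName="Ch2" filename="scan_000001_b.ome.tif" />
    </Frame>
    <Frame index="2">
      <File channel="1" channelName="Ch1" filename="scan_000002_a.ome.tif" />
      <File channel="2" channelName="Ch2" filename="scan_000002_b.ome.tif" />
    </Frame>
  </Sequence>
</PVScan>
"""


def group_files_by_channel(root):
    """Map each channelName attribute to its filenames, in document order (hypothetical helper)."""
    files_per_channel = defaultdict(list)
    for file_element in root.iter("File"):
        channel_name = file_element.attrib["channelName"]
        files_per_channel[channel_name].append(file_element.attrib["filename"])
    return dict(files_per_channel)


root = ET.fromstring(XML_TEXT.strip())
print(group_files_by_channel(root))
```

Because the grouping key comes from the XML attribute, the file-naming convention no longer matters for channel selection; it does not, by itself, say anything about planes.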

weiglszonja commented 5 months ago

Yeah, we need the number of planes as well. If we have the list of file_paths, maybe we can do something similar to what is done here: https://github.com/catalystneuro/roiextractors/blob/29c3491913f4be558a950262fd854a445a6b2621/src/roiextractors/extractors/tiffimagingextractors/brukertiffimagingextractor.py#L158 I think in that example we have two planes: "..._Ch2_000002.ome.tif" and "..._Ch2_000001.ome.tif". Maybe we could split each file path at the last `_` to determine how many unique files we have, and that should be the number of planes? But again, this assumption would only hold for volumetric data; in this example there is a single plane.
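The file-name heuristic can be sketched as follows. The file names and the helper `count_unique_suffixes` are hypothetical, and the sketch assumes the part after the last `_` is a plane index, which, as noted above, only holds for volumetric data.

```python
def count_unique_suffixes(file_names):
    """Count distinct parts after the last underscore, e.g. '000001.ome.tif' (hypothetical helper)."""
    return len({name.rsplit("_", 1)[-1] for name in file_names})


# Hypothetical volumetric series: 2 cycles (volumes) x 2 planes, one channel.
volumetric = [
    f"TSeries_Cycle{cycle:05d}_Ch2_{plane:06d}.ome.tif"
    for cycle in (1, 2)
    for plane in (1, 2)
]
print(count_unique_suffixes(volumetric))  # 2

# Hypothetical single-plane series in which the suffix indexes time points instead.
single_plane = [f"TSeries_Cycle00001_Ch2_{frame:06d}.ome.tif" for frame in (1, 2, 3)]
print(count_unique_suffixes(single_plane))  # 3, which is a count of time points, not planes
```

The second case shows why the heuristic needs to know beforehand whether the data is volumetric.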

h-mayorquin commented 5 months ago

Yes, to get the number of planes, I think we can count Sequence elements in the xml. What do you think?
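Counting `Sequence` elements is short with `xml.etree.ElementTree`. The trimmed XML and both helpers are hypothetical. The sketch also reports the number of `<Frame>` elements per `<Sequence>`, because whether planes correspond to `Sequence` or to `Frame` elements depends on how the acquisition is laid out in the XML, which should be checked against real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed XML: 3 <Sequence> elements with 2 <Frame> elements each.
XML_TEXT = """
<PVScan>
  <Sequence cycle="1">
    <Frame index="1"><File channel="2" channelName="Ch2" filename="c1_000001.ome.tif" /></Frame>
    <Frame index="2"><File channel="2" channelName="Ch2" filename="c1_000002.ome.tif" /></Frame>
  </Sequence>
  <Sequence cycle="2">
    <Frame index="1"><File channel="2" channelName="Ch2" filename="c2_000001.ome.tif" /></Frame>
    <Frame index="2"><File channel="2" channelName="Ch2" filename="c2_000002.ome.tif" /></Frame>
  </Sequence>
  <Sequence cycle="3">
    <Frame index="1"><File channel="2" channelName="Ch2" filename="c3_000001.ome.tif" /></Frame>
    <Frame index="2"><File channel="2" channelName="Ch2" filename="c3_000002.ome.tif" /></Frame>
  </Sequence>
</PVScan>
"""


def count_sequence_elements(root):
    """Number of <Sequence> elements anywhere below the root (hypothetical helper)."""
    return len(root.findall(".//Sequence"))


def frames_per_sequence(root):
    """Number of direct <Frame> children of each <Sequence>, in document order (hypothetical helper)."""
    return [len(sequence.findall("Frame")) for sequence in root.iter("Sequence")]


root = ET.fromstring(XML_TEXT.strip())
print(count_sequence_elements(root))  # 3
print(frames_per_sequence(root))      # [2, 2, 2]
```

Keeping the two counts separate avoids conflating "how many Sequence blocks" with "how many frames per block" before we confirm which one maps to planes.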

For choosing a single plane, though, we still need some sort of regular expression, as you were doing.
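A regular-expression selection of one plane could look like this. The pattern assumes, hypothetically, that the plane index is the six-digit number right before `.ome.tif`. `PLANE_PATTERN`, `select_plane`, and the file names are illustrative only, and real data may follow a different naming scheme, which is the crux of this issue.

```python
import re

# Hypothetical pattern: the plane index is the 6-digit number just before ".ome.tif".
PLANE_PATTERN = re.compile(r"_(\d{6})\.ome\.tif$")


def select_plane(file_names, plane_index):
    """Return the file names whose trailing 6-digit index equals plane_index (1-based)."""
    selected = []
    for name in file_names:
        match = PLANE_PATTERN.search(name)
        if match is not None and int(match.group(1)) == plane_index:
            selected.append(name)
    return selected


# Hypothetical volumetric series: 2 cycles x 2 planes, one channel.
file_names = [
    f"TSeries_Cycle{cycle:05d}_Ch2_{plane:06d}.ome.tif"
    for cycle in (1, 2)
    for plane in (1, 2)
]
print(select_plane(file_names, 2))
# ['TSeries_Cycle00001_Ch2_000002.ome.tif', 'TSeries_Cycle00002_Ch2_000002.ome.tif']
```

Files that do not match the pattern are skipped rather than raising, so a caller would still need to check for an empty result.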