Arcadia-Science / readlif

Leica Image Format (LIF) file reader for Python
GNU General Public License v3.0

Feature request: read parent/child relation for scene names #28

Closed: psobolewskiPhD closed this issue 3 years ago

psobolewskiPhD commented 3 years ago

This may be related to the existing issue #19 (https://github.com/nimne/readlif/issues/19): I'm also using LAS X Navigator and multi-well tile scans, which can be continuous (mosaic merges) or not. I originally posted this to aicsimageio, because that's what I was using to get my image data into Python, but I now understand it's a better fit here. Here's the original issue: https://github.com/AllenCellModeling/aicsimageio/issues/277

With Navigator it's easy to run multi-well experiments, but on import the scene names aren't read in properly, which makes everything confusing.

Here's a link to a representative LIF: https://www.dropbox.com/s/jk9se8i1kpzqvsn/20210428_24w_L929_Ho_B2C3.lif?dl=0

As an example, look at the last section: B is row B, and B/n/ are the wells in row B. R1-R5 are actually mosaic merges, but that's not material to this issue. The same goes for C/. At the end is B/3/R1-R60, which ideally would be able to be imported together.

ReadMyLIFs by @mutterer does this perfectly:

[Screenshot of ReadMyLIFs output]

It's easy to import all the regions together (this is like a mosaic, except the fields are scattered across a well). I know that tool uses Bio-Formats, but perhaps the same is possible here, so the structure can be passed on to aicsimageio?

At the moment aicsimageio ignores the structure of the project, so I get many scenes named R1, etc., and as a result I can't call scenes by name. I can use indexes instead, so it works, but it's less than ideal, and without the well context everything is trickier.

Here's the XML for this LIF: big-lif.xml.zip

Here's the start of the last section of images:

      <Attributes><Attribute>ExtendedMemoryBlock</Attribute><Attribute>_DCSERVER_PREVENT_MEMKILL</Attribute></Attributes>
      <Memory Size="2764800" MemoryBlockID="MemBlock_1050" />
      <Children />
    </Element>
    <Element Name="B3 60 10x" Visibility="1" CopyOption="1" UniqueID="a6f972e2-a82c-11eb-bc72-a4bb6dca3902">
      <Data><Collection ChildTypeTest="AcceptAll"><ChildTypeList /></Collection></Data>
      <Memory Size="0" MemoryBlockID="MemBlock_1319" />
      <Children>
        <Element Name="B" Visibility="1" CopyOption="1" UniqueID="a6f972e3-a82c-11eb-bc72-a4bb6dca3902">
          <Data><Collection ChildTypeTest="AcceptAll"><ChildTypeList /></Collection></Data>
          <Memory Size="0" MemoryBlockID="MemBlock_1320" />
          <Children>
            <Element Name="3" Visibility="1" CopyOption="1" UniqueID="a6f972e4-a82c-11eb-bc72-a4bb6dca3902">
              <Data><Collection ChildTypeTest="AcceptAll"><ChildTypeList /></Collection></Data>
              <Memory Size="0" MemoryBlockID="MemBlock_1321" />
              <Children>
                <Element Name="R1" Visibility="1" CopyOption="1" UniqueID="a6f972e5-a82c-11eb-bc72-a4bb6dca3902">
                  <Data><Image TextDescription=""><Attachment Name="TileScanInfo" Application="LAS AF" FlipX="0" FlipY="0" SwapXY="0"><Tile FieldX="0" FieldY="0" PosX="0.0534542067" PosY="0.0256040830" />

You can see <Element Name="B3 60 10x" Visibility="1" CopyOption="1" UniqueID="a6f972e2-a82c-11eb-bc72-a4bb6dca3902">, and then its nested Children:

    <Children><Element Name="B" Visibility="1" CopyOption="1" UniqueID="a6f972e3-a82c-11eb-bc72-a4bb6dca3902">
      <Children><Element Name="3" Visibility="1" CopyOption="1" UniqueID="a6f972e4-a82c-11eb-bc72-a4bb6dca3902">
        <Children><Element Name="R1" Visibility="1" CopyOption="1" UniqueID="a6f972e5-a82c-11eb-bc72-a4bb6dca3902">
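The nesting above is enough to rebuild a path for every image. A minimal sketch using only Python's standard xml.etree.ElementTree: it walks Element/Children recursively and joins the Name attributes with "/". The trimmed HEADER string and the image_paths helper are hypothetical illustrations, not readlif's code; the "is this an image" test (a Dimensions node under Data/Image/ImageDescription) mirrors the check in readlif's _recursive_image_find.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed header mirroring the nesting shown above:
# "B3 60 10x" -> "B" -> "3" -> R1, R2 (only two regions kept for brevity).
HEADER = """
<Root>
  <Element Name="B3 60 10x">
    <Children>
      <Element Name="B">
        <Children>
          <Element Name="3">
            <Children>
              <Element Name="R1"><Data><Image><ImageDescription><Dimensions/></ImageDescription></Image></Data></Element>
              <Element Name="R2"><Data><Image><ImageDescription><Dimensions/></ImageDescription></Image></Data></Element>
            </Children>
          </Element>
        </Children>
      </Element>
    </Children>
  </Element>
</Root>
"""


def image_paths(node, prefix=""):
    """Yield 'parent/child/.../name' for every image element below node."""
    # Top level uses ./Element; deeper levels nest under ./Children/Element.
    children = node.findall("./Children/Element") or node.findall("./Element")
    for elem in children:
        name = elem.attrib["Name"]
        path = f"{prefix}/{name}" if prefix else name
        if elem.find("./Data/Image/ImageDescription/Dimensions") is not None:
            yield path
        # Recurse so that folders (and images with children) are explored too.
        yield from image_paths(elem, path)


root = ET.fromstring(HEADER)
print(list(image_paths(root)))
# ['B3 60 10x/B/3/R1', 'B3 60 10x/B/3/R2']
```

On a full LIF header, the file-level element would become the first path component.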

I'd love to be able to see those parent names, so something like "B3 60 10x B3 R1". In an ideal world, because R1 has the "TileScanInfo" attachment, the scene would be "B3 60 10x B3", and all of R1-R60 would be part of it as M values, akin to a mosaic even though the fields are not touching.

I think readlif is already reading that info? See https://github.com/nimne/readlif/blob/581ce8be5a4717e56c44c91b2cec6ea31ee65978/readlif/reader.py#L534 But I'm a complete Python noob and can't quite figure out what's going on. I'm willing to help, hack, or test, but I need a bit of guidance. Thanks for making a great package! 😍

nimne commented 3 years ago

Thanks for the detailed report!

> I think readlif is already reading that info?

All of the metadata is indeed read (and returnable as a big ElementTree), but not all of it is used. This probably touches on two existing issues: how channels are read, identified in #19 (there is a warning for this), and access to additional metadata, identified in #18.

Creating an explicit tree-like structure for these files may be a difficult problem, but would it be sufficient to build image 'names' based on the 'path'? For example, set the name to be something like B3/60/10x/B3/R1?

> At the end is B/3/R1-R60, which ideally would be able to be imported together.

It should be possible to accomplish this thanks to some excellent work by the aicsimageio project to import things in a consistent, easy-to-use format. A design limitation of this project is that it only returns a single 2D plane at a time; loading batches of planes into an nD image happens through iterators or other wrappers. I'll take a look at the example images you've linked and try to figure out why they are not working correctly as a 'mosaic'.
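To illustrate the plane-at-a-time design, here is a minimal sketch of such a wrapper: a generator that requests one 2D plane per z position, plus a helper that collects the planes into a z-stack. The get_frame(z, t, c, m) signature is an assumption modelled on readlif's LifImage.get_frame; FakeImage is a hypothetical stand-in so the sketch runs without a LIF file, and planes are plain nested lists rather than arrays.

```python
from typing import Callable, Iterator, List

Plane = List[List[int]]  # one 2D image plane (rows of pixel values)


class FakeImage:
    """Hypothetical stand-in for a reader that returns one 2D plane per call."""

    def __init__(self, nz: int, height: int = 2, width: int = 3):
        self.nz, self.height, self.width = nz, height, width

    def get_frame(self, z: int = 0, t: int = 0, c: int = 0, m: int = 0) -> Plane:
        # Fill the plane with its z index so the stacking order is easy to check.
        return [[z] * self.width for _ in range(self.height)]


def iter_z(get_frame: Callable[..., Plane], nz: int,
           t: int = 0, c: int = 0, m: int = 0) -> Iterator[Plane]:
    """Yield the planes of one z-stack, one 2D plane at a time."""
    for z in range(nz):
        yield get_frame(z=z, t=t, c=c, m=m)


def stack_z(get_frame: Callable[..., Plane], nz: int, **index: int) -> List[Plane]:
    """Collect the iterator's planes into a (z, y, x) nested list."""
    return list(iter_z(get_frame, nz, **index))


img = FakeImage(nz=4)
stack = stack_z(img.get_frame, img.nz, t=0, c=0, m=0)
print(len(stack), len(stack[0]), len(stack[0][0]))  # 4 2 3
```

With NumPy available, numpy.stack(stack) would turn the same list into a (z, y, x) array.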

Fortunately, I'll have some time soon to clean up a few open issues here, though it may be a few weeks!

psobolewskiPhD commented 3 years ago

> Creating an explicit tree-like structure for these files may be a difficult problem, but would it be sufficient to build image 'names' based on the 'path'? For example, set the name to be something like B3/60/10x/B3/R1?

This would certainly be fine; it would let me import what I need more easily, much like the ReadMyLIFs solution. In an ideal world, all related TileScans (R1-R#) would be imported together. There is no functional difference between 60 touching/overlapping fields and 60 scattered ones, so the "scene" or "series" divider would sit above the Rs. But if I learn some Python, this may already be possible, as long as I can call or refer to the proper elements (so no dozen orphan R1s in the LIF). Thank you for your time. If I can help, please let me know.
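Once names carry their parent path, the "divider above the Rs" can be approximated on the user side. A minimal sketch, assuming slash-separated names like those produced later in this thread: it splits each name at its last "/", keeps only leaves of the form R<number> (a hypothetical pattern, which skips the R1_Merged images), and sorts each group numerically so that list position can play the role of an M index. The group_regions helper and the sample names are illustrative only.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical pattern for Navigator region names: "R" followed by digits only.
REGION = re.compile(r"R(\d+)")


def group_regions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each parent path to its R<n> regions, sorted by region number."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        parent, _, leaf = name.rpartition("/")
        if REGION.fullmatch(leaf):
            groups[parent].append(leaf)
    for leaves in groups.values():
        # Numeric sort so R10 comes after R2, not after R1.
        leaves.sort(key=lambda leaf: int(leaf[1:]))
    return dict(groups)


names = ["B3 60 10x/B/3/R2", "B3 60 10x/B/3/R10", "B3 60 10x/B/3/R1",
         "TileScan 2/B/2/R1", "TileScan 2/B/2/R1_Merged", "B4_no_dye"]
for parent, regions in group_regions(names).items():
    print(parent, regions)
# B3 60 10x/B/3 ['R1', 'R2', 'R10']
# TileScan 2/B/2 ['R1']
```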

psobolewskiPhD commented 3 years ago

OK, so I was brave and took another, more careful look at readlif/reader.py. It certainly reads the full information for parents/children. The key seems to be line 648 (https://github.com/nimne/readlif/blob/581ce8be5a4717e56c44c91b2cec6ea31ee65978/readlif/reader.py#L648). I changed it to "name": appended_path. Now it's a bit verbose, but it works!

>>> img = AICSImage("../Documents/Leica DMi8/20210428/20210428_24w_L929_Ho_B2C3.lif", reconstruct_mosaic=False, chunk_dims=["S"])
>>> img.scenes
('20210428_24w_L929_Ho_B2C3/Image002', '20210428_24w_L929_Ho_B2C3/Image003', '20210428_24w_L929_Ho_B2C3/Image004', '20210428_24w_L929_Ho_B2C3/Image005', '20210428_24w_L929_Ho_B2C3/Image006', '20210428_24w_L929_Ho_B2C3/Image 1', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R1', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R4', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R5', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R2', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R3', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R1_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R4_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R5_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R2_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 2/B/2/R3_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R1', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R4', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R5', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R2', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R3', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R1_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R4_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R5_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R2_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 3/B/3/R3_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R1', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R4', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R5', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R2', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R3', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R1_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R4_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R5_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R2_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 4/C/2/R3_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R1', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R4', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R5', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R2', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R3', 
'20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R1_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R4_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R5_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R2_Merged', '20210428_24w_L929_Ho_B2C3/TileScan 5/C/3/R3_Merged', '20210428_24w_L929_Ho_B2C3/B4_no_dye', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R1', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R2', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R3', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R4', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R5', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R6', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R7', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R8', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R9', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R10', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R11', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R12', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R13', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R14', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R15', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R16', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R17', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R18', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R19', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R20', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R21', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R22', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R23', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R24', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R25', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R26', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R27', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R28', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R29', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R30', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R31', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R32', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R33', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R34', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R35', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R36', 
'20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R37', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R38', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R39', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R40', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R41', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R42', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R43', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R44', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R45', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R46', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R47', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R48', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R49', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R50', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R51', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R52', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R53', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R54', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R55', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R56', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R57', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R58', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R59', '20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R60')
>>> img.set_scene("20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R51")

I guess the recursive nature of the function means it picks up the full path rather than just the last bits... But for the time being, it's certainly progress!

psobolewskiPhD commented 3 years ago

OK, I don't know if it's pretty, but it does seem to work. I edited the function at: https://github.com/nimne/readlif/blob/581ce8be5a4717e56c44c91b2cec6ea31ee65978/readlif/reader.py#L534

    def _recursive_image_find(self, tree, return_list=None, path="", enc_folder2="", enc_folder=""):
        """Creates list of images by parsing the XML header recursively"""

        if return_list is None:
            return_list = []

        children = tree.findall("./Children/Element")
        if len(children) < 1:  # Fix for 'first round'
            children = tree.findall("./Element")
        for item in children:
            folder_name = item.attrib["Name"]

            if path == "":
                # Top level: start with an empty two-folder window
                enc_folder2 = ""
                enc_folder = ""
                appended_path = folder_name
            else:
                appended_path = path + "/" + folder_name
            has_sub_children = len(item.findall("./Children/Element")) > 0
            is_image = (
                len(item.findall("./Data/Image/ImageDescription/Dimensions")) > 0
            )

            if has_sub_children:
                # Shift the window down one level: enc_folder becomes
                # enc_folder2, and this folder's name becomes enc_folder
                self._recursive_image_find(item, return_list, appended_path, enc_folder, folder_name)

            # ... rest of the original function (including the is_image branch
            # that builds each image's "name") not shown

And then I changed the old line 648 (https://github.com/nimne/readlif/blob/581ce8be5a4717e56c44c91b2cec6ea31ee65978/readlif/reader.py#L648, now line 651) to:

        "name": enc_folder2 + "/" + enc_folder + "/" + folder_name,

which yields:

>>> img = AICSImage("../Documents/Leica DMi8/20210428/20210428_24w_L929_Ho_B2C3.lif", reconstruct_mosaic=False, chunk_dims=["S"])
>>> img.scenes
('/20210428_24w_L929_Ho_B2C3/Image002', '/20210428_24w_L929_Ho_B2C3/Image003', '/20210428_24w_L929_Ho_B2C3/Image004', '/20210428_24w_L929_Ho_B2C3/Image005', '/20210428_24w_L929_Ho_B2C3/Image006', '/20210428_24w_L929_Ho_B2C3/Image 1', 'B/2/R1', 'B/2/R4', 'B/2/R5', 'B/2/R2', 'B/2/R3', 'B/2/R1_Merged', 'B/2/R4_Merged', 'B/2/R5_Merged', 'B/2/R2_Merged', 'B/2/R3_Merged', 'B/3/R1', 'B/3/R4', 'B/3/R5', 'B/3/R2', 'B/3/R3', 'B/3/R1_Merged', 'B/3/R4_Merged', 'B/3/R5_Merged', 'B/3/R2_Merged', 'B/3/R3_Merged', 'C/2/R1', 'C/2/R4', 'C/2/R5', 'C/2/R2', 'C/2/R3', 'C/2/R1_Merged', 'C/2/R4_Merged', 'C/2/R5_Merged', 'C/2/R2_Merged', 'C/2/R3_Merged', 'C/3/R1', 'C/3/R4', 'C/3/R5', 'C/3/R2', 'C/3/R3', 'C/3/R1_Merged', 'C/3/R4_Merged', 'C/3/R5_Merged', 'C/3/R2_Merged', 'C/3/R3_Merged', '/20210428_24w_L929_Ho_B2C3/B4_no_dye', 'B/3/R1', 'B/3/R2', 'B/3/R3', 'B/3/R4', 'B/3/R5', 'B/3/R6', 'B/3/R7', 'B/3/R8', 'B/3/R9', 'B/3/R10', 'B/3/R11', 'B/3/R12', 'B/3/R13', 'B/3/R14', 'B/3/R15', 'B/3/R16', 'B/3/R17', 'B/3/R18', 'B/3/R19', 'B/3/R20', 'B/3/R21', 'B/3/R22', 'B/3/R23', 'B/3/R24', 'B/3/R25', 'B/3/R26', 'B/3/R27', 'B/3/R28', 'B/3/R29', 'B/3/R30', 'B/3/R31', 'B/3/R32', 'B/3/R33', 'B/3/R34', 'B/3/R35', 'B/3/R36', 'B/3/R37', 'B/3/R38', 'B/3/R39', 'B/3/R40', 'B/3/R41', 'B/3/R42', 'B/3/R43', 'B/3/R44', 'B/3/R45', 'B/3/R46', 'B/3/R47', 'B/3/R48', 'B/3/R49', 'B/3/R50', 'B/3/R51', 'B/3/R52', 'B/3/R53', 'B/3/R54', 'B/3/R55', 'B/3/R56', 'B/3/R57', 'B/3/R58', 'B/3/R59', 'B/3/R60')
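Why do the top-level images get a leading "/", and why can two different folders both produce "B/3/R1"? Below is a minimal, hypothetical re-creation of only the naming logic from the edit above, not the full readlif function. Each recursive call shifts enc_folder into enc_folder2 and the current folder name into enc_folder, so an image is named after at most its two nearest ancestors. The hand-written TREE dict is an assumed stand-in for the LIF XML.

```python
from typing import Dict, List, Optional

# Hypothetical folder tree standing in for the LIF XML:
# a dict is a folder, None marks an image.
TREE = {
    "20210428_24w_L929_Ho_B2C3": {
        "Image002": None,
        "TileScan 3": {"B": {"3": {"R1": None}}},
        "B3 60 10x": {"B": {"3": {"R1": None}}},
    }
}


def window_names(tree: Dict, enc_folder2: str = "", enc_folder: str = "",
                 out: Optional[List[str]] = None) -> List[str]:
    """Name each image enc_folder2/enc_folder/name, like the edited function."""
    if out is None:
        out = []
    for name, sub in tree.items():
        if sub is None:
            # An image: only its two nearest ancestors survive in the name.
            out.append(enc_folder2 + "/" + enc_folder + "/" + name)
        else:
            # A folder: shift the window down one level before recursing.
            window_names(sub, enc_folder, name, out)
    return out


print(window_names(TREE))
# ['/20210428_24w_L929_Ho_B2C3/Image002', 'B/3/R1', 'B/3/R1']
```

The two "B/3/R1" entries match the duplicates in the output above (one from TileScan 3, one from B3 60 10x), which is why a name built from the full relative path is less ambiguous.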
nimne commented 3 years ago

Hello! I think that changing the name attribute to include the relative path (without the .lif name) makes sense - I've added a fix to the name branch: https://github.com/nimne/readlif/tree/name
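As a rough illustration of the "relative path" idea (a hypothetical helper, not the code in the name branch), dropping the file-level component from a full path such as the earlier appended_path output gives the shorter form:

```python
def relative_name(full_path: str) -> str:
    """Drop the leading file-level component from a slash-separated path."""
    _, sep, rest = full_path.partition("/")
    # A path without any "/" has no parent folder to strip; keep it unchanged.
    return rest if sep else full_path


print(relative_name("20210428_24w_L929_Ho_B2C3/B3 60 10x/B/3/R51"))  # B3 60 10x/B/3/R51
print(relative_name("20210428_24w_L929_Ho_B2C3/Image002"))  # Image002
```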

Can you let me know if this has the expected behavior?

psobolewskiPhD commented 3 years ago

Thanks for looking into this! Noob question: how can I test that branch?

nimne commented 3 years ago

You'll need to download the branch:

    git clone --branch name https://github.com/nimne/readlif

Then you'll need to install it. I would recommend creating a virtual environment to test this if you can!

    cd readlif
    python setup.py install

Alternatively, you can test by applying the change I made on line 655 to your own copy!

psobolewskiPhD commented 3 years ago

Got it! At first I thought it didn't work, but I got it installed into my existing env (I'm brave?) and it works perfectly:

 'B3 60 10x/B/3/R56',
 'B3 60 10x/B/3/R57',
 'B3 60 10x/B/3/R58',
 'B3 60 10x/B/3/R59',
 'B3 60 10x/B/3/R60')

That's better than my hack above because it grabs the top-level name, B3 60 10x, which I set in LAS X, and then the by-well naming generated by LAS X Navigator! Awesome! Thank you!

nimne commented 3 years ago

Great! That's good news. I'll get this pushed up as a new version on PyPI soon!

nimne commented 3 years ago

This fix has been added to the 0.6.3 release that is now up on PyPI!

psobolewskiPhD commented 3 years ago

I emoji'd too early? 0.6.3 does not work for me with AICSImageIO to import the LIF from the OP. The branch worked fine, but the new release does not. With a LIF containing only scenes with nested (parent/child) names, I get an error that the reader is missing. With a LIF with mixed scenes, I only get the ones with simple names (not parent/child). This happens with AICSImageIO 4.0.0, 4.1.0, and 4.2.0, so it's related to the new readlif release, and specific to it, because it worked perfectly with the branch.

EDIT: I checked: if I go back to 0.6.2 from the 'name' branch, everything works, but with 0.6.3 the scenes with parent/child names don't show up.

nimne commented 3 years ago

Uh-oh! I'll try and see what went wrong.

nimne commented 3 years ago

I definitely broke reading files with folders in 0.6.3 - hopefully that is resolved in 0.6.4 (up now). Thanks for helping me test this out and keep the bugs to a minimum!
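Since 0.6.3 dropped the nested scenes and 0.6.4 is meant to restore them, code that relies on folder-based names may want to check the installed version first. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); readlif_supports_folders is a hypothetical helper, and comparing only the first three numeric parts is a simplification (the packaging library's Version class is the robust option).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def as_tuple(v: str) -> Tuple[int, ...]:
    """Compare on the first three numeric parts of a version string.

    Simplification: suffixes such as 'rc1' or '.dev0' are ignored.
    """
    return tuple(int(part) for part in re.findall(r"\d+", v)[:3])


def readlif_supports_folders(min_version: str = "0.6.4") -> bool:
    """Hypothetical guard: is the installed readlif at least min_version?"""
    try:
        installed = version("readlif")
    except PackageNotFoundError:  # readlif is not installed
        return False
    return as_tuple(installed) >= as_tuple(min_version)


print(readlif_supports_folders())
```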

I'll certainly need to find some small test files with multiple folders for testing so this doesn't happen again.

I'll leave this issue open for now.

psobolewskiPhD commented 3 years ago

Resolved on my end! Thanks!