AllenCellModeling / aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
https://allencellmodeling.github.io/aicsimageio

Support reading mosaic LIF #174

Closed oeway closed 3 years ago

oeway commented 3 years ago

System and Software

Description

Reading a LIF file with multiple FOVs fails at this line with the error: ValueError: cannot reshape array of size 104857600 into shape (2048,2048).

It appears that it failed to extract the number of FOVs, which should be 25 (i.e. 2048 x 2048 x 25 = 104857600).

I added a line before where it fails to print out some extra information:

print('===debug===>', typed_array.shape, x_size, y_size, selected_ranges)

And here is the output:

===debug===> (104857600,) 2048 2048 {'S': range(0, 40), 'T': range(0, 1), 'C': range(0, 4), 'Z': range(0, 10)}
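The size arithmetic above can be checked with a scaled-down NumPy sketch. It assumes nothing about aicsimageio or readlif internals: the tile size is shrunk to 4 x 4 so the example runs instantly, and the variable names are illustrative only. It shows that a flat buffer holding 25 planes cannot be reshaped into one plane, while adding a leading dimension makes the element count work. It does not establish how the planes are actually ordered in the file.

```python
import numpy as np

# Full-size numbers from the report: 25 planes of 2048 x 2048 pixels.
FULL_Y, FULL_X, N_FOV = 2048, 2048, 25
full_buffer_size = FULL_Y * FULL_X * N_FOV  # the size quoted in the ValueError

# Scaled-down stand-in for typed_array (4 x 4 planes instead of 2048 x 2048).
y_size, x_size = 4, 4
flat = np.arange(y_size * x_size * N_FOV, dtype=np.uint16)

# Reshaping into a single plane fails, as in the report.
try:
    flat.reshape(y_size, x_size)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    error_message = str(err)

# Letting NumPy infer a leading dimension recovers the plane count.
planes = flat.reshape(-1, y_size, x_size)
print(full_buffer_size, reshape_failed, planes.shape)
```

The inferred leading dimension is 25, which matches the expected number of FOVs per well.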

Expected Behavior

The sample image should have 40 wells, 25 FOVs per well, 10 z-slices per FOV, and 4 channels.

Reproduction

A minimal example that exhibits the behavior.

Unfortunately, we cannot provide the sample file. But if you have any idea about the cause of this bug, happy to test it.

Thank you.

evamaxfield commented 3 years ago

Hey @oeway

Sorry to hear you are having some difficulty, hopefully I can help resolve.

Could you post a bit more of the script that fails to run? I don't know what typed_array is for example.

Are the 40 wells saved as scenes in the file? Or more specifically, what dimension shape are you expecting in return?

S: x, T: x, C: x, Z: x, Y: x, X: x? From my parsing it appears that the whole "25 FOV per well" dimension is missing, which I would assume is a mosaic type of dimension. How are you saving those? The rest of the dimensions match up (S: 40 (wells), T: 1, C: 4, Z: 10, Y: 2048, X: 2048).

Without any sample file it may be a bit hard to debug. Any chance you could make or request a sample file that has scenes and however you are saving the "FOV per well" dimension, but with smaller dimensions (i.e. 2 wells, 3 FOV per well, 2 channels, etc.)? I only ask because most of the institute is now on break for the holiday, so it would be hard for me to request a test file right now.

Current thinking on my part is that it's failing because of this unknown dimension (the FOVs per well), or something is going wrong with the dask array constructor (this file must be massive, so I assume you are using dask).

oeway commented 3 years ago

Hi, thanks for looking into this!

It fails when I call imread; I ended up debugging directly in your Python module. The typed_array is here: https://github.com/AllenCellModeling/aicsimageio/blob/3ef373312a5ee21bed600837d3dd8f1c29c4be6c/aicsimageio/readers/lif_reader.py#L407 I added the print line just before the reshape call, which is what fails.

Unfortunately, I do not know which dimension the 25 FOVs belong to. I got the file from someone else, who told me that it should contain 40 wells with 25 FOVs each. I am not sure which dimension the software saved them to. I will ask tomorrow to see if we have a chance to get a small test file.

No, I am not using dask, it was a quick test, I simply import the library and call imread to read the file. The file size is around 300GB.

evamaxfield commented 3 years ago

Ahhh my mistake! Haven't looked at the LifReader code in a while. Good to know exactly where in our library it is coming from.

Can you try using the imread_dask function? If the serialized file is 300GB I wouldn't be surprised if the in-memory footprint is upwards of 1TB of memory and hard to imagine imread (or your machine) would like that. I am basically saying with such a large file I would argue that it is better to read the exact chunk you want instead of the whole file.

oeway commented 3 years ago

Ok, thanks for the tip and I will try that! But it should not resolve this issue, correct?

evamaxfield commented 3 years ago

I don't believe it will fully resolve the issue, and it may even still fail, but I think you may have a higher chance of getting back a dask array with the S, T, C, Z, Y, X dimensions (with one of the 25 FOVs selected, though I don't know how or which one).

I will keep looking into what could be going wrong but without knowing how the dimension is being stored it will be a bit difficult so please do let me know when you hear back.

oeway commented 3 years ago

I can send a file header with the first X bytes, do you think that will be helpful for you?

evamaxfield commented 3 years ago

Could you provide the metadata as an XML file?

from aicsimageio import AICSImage
import xml.etree.ElementTree as ET

img = AICSImage("your-file.lif")
with open("big-lif.xml", "wb") as f:
    f.write(ET.tostring(img.metadata, "utf-8", xml_declaration=True))

oeway commented 3 years ago

Here you go: big-lif.xml.zip

I had to change the code a bit:

from aicsimageio import AICSImage
import xml.etree.ElementTree as ET

img = AICSImage("./myfile.lif")
with open("big-lif.xml", "wb") as f:
    f.write(ET.tostring(img.metadata, encoding='utf8', method='xml'))

oeway commented 3 years ago

There is a dimension 10 with 25 elements:

<Dimensions>
   <DimensionDescription BitInc="0" BytesInc="2" DimID="1" Length="6.652750e-004" NumberOfElements="2048" Origin="0.000000e+000" Unit="m" />
   <DimensionDescription BitInc="0" BytesInc="4096" DimID="2" Length="6.652750e-004" NumberOfElements="2048" Origin="0.000000e+000" Unit="m" />
   <DimensionDescription BitInc="0" BytesInc="8388608" DimID="3" Length="1.349253e-005" NumberOfElements="10" Origin="0.000000e+000" Unit="m" />
   <DimensionDescription BitInc="0" BytesInc="335544320" DimID="10" Length="2.400000e+001" NumberOfElements="25" Origin="0.000000e+000" Unit="" />
</Dimensions>
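
The BytesInc values in this block can be read as byte strides, and a short sketch makes the arithmetic explicit. It rests on assumptions not confirmed in this thread: that DimID 1, 2, 3 and 10 correspond to X, Y, Z and the mosaic tile dimension, that BytesInc is the byte step between consecutive elements of a dimension, and that the file holds 40 scenes (taken from the S range in the debug output). Only the attributes needed for the calculation are copied from the XML above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Dimensions> block from the metadata above.
DIMENSIONS_XML = """
<Dimensions>
   <DimensionDescription BytesInc="2" DimID="1" NumberOfElements="2048" />
   <DimensionDescription BytesInc="4096" DimID="2" NumberOfElements="2048" />
   <DimensionDescription BytesInc="8388608" DimID="3" NumberOfElements="10" />
   <DimensionDescription BytesInc="335544320" DimID="10" NumberOfElements="25" />
</Dimensions>
"""

root = ET.fromstring(DIMENSIONS_XML.strip())
dims = {
    int(d.get("DimID")): {
        "count": int(d.get("NumberOfElements")),
        "stride": int(d.get("BytesInc")),
    }
    for d in root.findall("DimensionDescription")
}

bytes_per_pixel = dims[1]["stride"]                   # step between X samples
row_bytes = dims[1]["count"] * dims[1]["stride"]      # should equal the Y stride
plane_bytes = dims[2]["count"] * dims[2]["stride"]    # should equal the Z stride
z_stack_bytes = dims[3]["count"] * dims[3]["stride"]  # one channel's Z stack

# The tile stride is larger than one Z stack; the ratio is the number of
# Z stacks (presumably channels) stored per tile.
stacks_per_tile = dims[10]["stride"] // z_stack_bytes

scene_bytes = dims[10]["count"] * dims[10]["stride"]  # all 25 tiles of one scene
N_SCENES = 40                                         # assumption: S range above
total_bytes = N_SCENES * scene_bytes

print(bytes_per_pixel, stacks_per_tile, total_bytes)
```

Under these assumptions the strides are self-consistent: 2 bytes per pixel, 4 Z stacks per tile (matching the 4 channels), and about 335.5 GB of raw pixel data in total, which is in line with the reported file size of around 300GB.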

evamaxfield commented 3 years ago

Yep! I see that as well.

I also see exactly 1000 Tile Elements in the whole document. (25 FOV per Well * 40 Wells = 1000)

This to me says they are being saved as a mosaic dimension. Will look into how to read them!

evamaxfield commented 3 years ago

Looks like outside of python-bioformats (which has a Java backing), no one (or at least no Python library known to me) supports reading mosaic / tiled LIF images in pure Python. We (aicsimageio) wrap the readlif library, which has an open issue on this exact topic.

So unfortunately, it would seem the options right now are to convert the file to some other file type using bioformats, or to use python-bioformats to read the file. I will add a comment to the open readlif issue asking if there is a progress update, but sorry to say I am not sure how much progress we can get in a short amount of time, especially over a holiday break.

evamaxfield commented 3 years ago

In the meantime, I am curious if you were able to get imread_dask working at all. Just curious to see the interaction / how it reacts to a mosaic file.

oeway commented 3 years ago

> In the meantime, I am curious if you were able to get imread_dask working at all. Just curious to see the interaction / how it reacts to a mosaic file.

Nope, the exact same error occurs.

toloudis commented 3 years ago

Maybe we can just catch this case early and do our own warning if there's a way to make it better than this exception. At least we might be able to give a message like "this is an unsupported LIF image due to ____"
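
One way such an early check could look, as a sketch only: the helper name reshape_plane and its messages are hypothetical and are not taken from aicsimageio or readlif. It compares the buffer size with the expected plane size before reshaping, and raises a more descriptive error when the buffer holds a whole number of extra planes.

```python
import numpy as np


def reshape_plane(flat: np.ndarray, y_size: int, x_size: int) -> np.ndarray:
    """Reshape a flat pixel buffer into one (Y, X) plane, with a clearer error.

    Hypothetical helper for illustration; not part of any library.
    """
    expected = y_size * x_size
    if flat.size == expected:
        return flat.reshape(y_size, x_size)

    if flat.size % expected == 0:
        # The buffer holds several planes, e.g. tiles of a mosaic image.
        n_planes = flat.size // expected
        raise NotImplementedError(
            f"Buffer holds {n_planes} planes of shape ({y_size}, {x_size}); "
            "this may be an unsupported mosaic / tiled LIF image."
        )

    raise ValueError(
        f"Buffer of size {flat.size} does not match plane shape "
        f"({y_size}, {x_size})."
    )


# Usage with scaled-down sizes: one plane works, 25 planes raise a clear error.
single = reshape_plane(np.zeros(16, dtype=np.uint16), 4, 4)
try:
    reshape_plane(np.zeros(16 * 25, dtype=np.uint16), 4, 4)
    message = ""
except NotImplementedError as err:
    message = str(err)
print(single.shape, message)
```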

evamaxfield commented 3 years ago

@toloudis, the author of readlif is likely going to release a patch to do just that. Better to handle it upstream.

evamaxfield commented 3 years ago

Correction, already released a new version (0.3.1) to do that

toloudis commented 3 years ago

> Correction, already released a new version (0.3.1) to do that

Perfect!

evamaxfield commented 3 years ago

Hey @oeway I am planning to handle support for mosaic LIFs in our 4.x release, which may be a couple of months out unfortunately. I just don't have the free time to support them for this release. Will ping this thread when we release 4.x.

evamaxfield commented 3 years ago

Hey @oeway This has just been resolved in the 4.x branches. I will get out a dev release later today that you can test out. The full 4.0 release is (hopefully) about 2 weeks out if you would rather wait. The API for this functionality is as follows:

from aicsimageio import AICSImage

img = AICSImage("mosaic-file.lif")  # already has the fully stitched LIF image
img.dims  # returns the normal dims of "TCZYX" where YX have been stitched together already
img.reader  # returns the base reader where there is an "M" dimension in addition to the other dimensions where "M" is each individual tile