AllenCellModeling / aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
https://allencellmodeling.github.io/aicsimageio

Conversion from CZI to OME TIFF #301

Closed MosGeo closed 1 year ago

MosGeo commented 3 years ago

System and Software

Description

This is my first time trying to save files using aicsimageio and dealing with ome.tiff. I am still getting my head around it. So maybe I am missing something.

When running

from aicsimageio import AICSImage

filename = "./my_czi_file.czi"
# reconstruct_mosaic=False keeps each tile separate (as the M dimension)
img = AICSImage(filename, reconstruct_mosaic=False)
img.save("my_tiff_file.ome.tiff")

I receive the following error:

  File "C:\Users\Mustafa\Desktop\CZIConvert\main.py", line 6, in <module>
    img.save("my_file.ome.tiff")
  File "C:\Users\Mustafa\Desktop\CZIConvert\.venv\lib\site-packages\aicsimageio\aics_image.py", line 797, in save
    OmeTiffWriter.save(
  File "C:\Users\Mustafa\Desktop\CZIConvert\.venv\lib\site-packages\aicsimageio\writers\ome_tiff_writer.py", line 214, in save
    ome_xml = OmeTiffWriter.build_ome(
  File "C:\Users\Mustafa\Desktop\CZIConvert\.venv\lib\site-packages\aicsimageio\writers\ome_tiff_writer.py", line 590, in build_ome
    ome_dimension_order, is_rgb = OmeTiffWriter._resolve_OME_dimension_order(
  File "C:\Users\Mustafa\Desktop\CZIConvert\.venv\lib\site-packages\aicsimageio\writers\ome_tiff_writer.py", line 328, in _resolve_OME_dimension_order
    raise ValueError(
ValueError: Data array has unexpected number of dimensions: is_rgb = True and shape is (1102, 1, 7, 1, 1200, 1600, 3)

Note that the image is: 1) tiled, 2) multi-channel, 3) RGB. It is also pyramidal, but as I understand it, aicsimageio will just take level 0 for now.
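The seven numbers in the error line up with the dimensions of an unreconstructed CZI mosaic. Below is a minimal sketch, assuming the axis order is M (tile), T, C, Z, Y, X, S (RGB samples), which matches the sizes described above. label_shape and iter_tiles are hypothetical helpers, and a small zero-filled array stands in for the real data. Taking one tile at a time drops M and leaves a 6D TCZYXS array per tile.

```python
import numpy as np

# Assumed axis order for a CZI read with reconstruct_mosaic=False:
# M = tile index, S = RGB samples (last axis).
DIMS = "MTCZYXS"

def label_shape(shape, dims=DIMS):
    """Pair each axis name with its size, e.g. {'M': 1102, ...}."""
    if len(shape) != len(dims):
        raise ValueError(f"expected {len(dims)} axes, got {len(shape)}")
    return dict(zip(dims, shape))

def iter_tiles(data):
    """Yield (tile_index, 6D TCZYXS array) for each tile along M."""
    for m in range(data.shape[0]):
        yield m, data[m]

# The shape from the traceback, labeled axis by axis.
print(label_shape((1102, 1, 7, 1, 1200, 1600, 3)))

# Small stand-in with the same axis layout (2 tiles instead of 1102).
fake = np.zeros((2, 1, 7, 1, 4, 5, 3), dtype=np.uint8)
for m, tile in iter_tiles(fake):
    # Each tile has shape (1, 7, 1, 4, 5, 3): T, C, Z, Y, X, S.
    print(m, tile.shape)
```

Whether a particular writer accepts a 6D TCZYXS array should be confirmed in its documentation. The sketch only shows where the extra seventh axis comes from.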

Expected Behavior

ome.tiff is saved.

Reproduction

See description

Environment

Clean virtual environment with only aicsimageio installed.

evamaxfield commented 3 years ago

We currently do not support saving OME-TIFF with tiles, just the whole reconstructed YX planes.

The OME model itself only "recently" adopted Tiles (and other dims) as supported so it's just a "tooling needs to catch up" problem (See their docs on tiling metadata: https://docs.openmicroscopy.org/ome-files-cpp/0.5.0/ome-model/manual/html/developers/6d-7d-and-8d-storage.html).

I see that you have 1102 tiles of 1200 x 1600 (YX), which is massive, so I understand why you wouldn't use reconstruct_mosaic=True (#274). For now, it seems we may not have a solution for you. That said, I think our current OmeTiffWriter implementation would segfault on this data under the hood, because it tries to pull in the whole data block at once (#214).
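To put "massive" in numbers, here is a back-of-the-envelope estimate of the in-memory size of the full level-0 data. It assumes one byte (uint8) per RGB sample, which is what a 24-bit RGB channel implies. This is only arithmetic on the shape from the traceback, not a measurement of the file.

```python
from math import prod

def uncompressed_bytes(shape, bytes_per_sample=1):
    """Size in bytes of a dense array with the given shape."""
    return prod(shape) * bytes_per_sample

# (M, T, C, Z, Y, X, S) from the traceback; uint8 is assumed.
shape = (1102, 1, 7, 1, 1200, 1600, 3)
n = uncompressed_bytes(shape)
print(f"{n:,} bytes = {n / 1e9:.1f} GB = {n / 2**30:.1f} GiB")
# 44,432,640,000 bytes = 44.4 GB = 41.4 GiB
```

Under the same assumption a single tile is about 40 MB (1200 x 1600 x 7 x 3 bytes). Processing tile by tile or in chunks is therefore feasible, while a writer that loads the whole block at once would need the full ~44 GB.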

Basically, there is a lot going on here. All of it is known, but we don't have a solution for it yet.

MosGeo commented 3 years ago

Thanks for the info @JacksonMaxfield. Yes, we are dealing with very large datasets; this example is actually a small file. I am thinking of converting the files to a more open format, such as ome.tiff or even ome.zarr, which will simplify viewing and analyzing them with different tools in the future (including napari).

Yes, reconstructing the mosaic in aicsimageio right now is practically impossible for these files.

Nicholas-Schaub commented 2 years ago

If you're looking for something that can convert czi to ome tiff, you should check out our bfio utility. It is not as fully featured as AICSImageIO, but we are in the process of creating bindings for our reading and writing tools for AICSImageIO. We built our tools to be lazy loading/writing by default to handle exactly this issue.

One of the examples we provide in our documentation is how to create a scalable tiled tiff converter: https://bfio.readthedocs.io/en/master/Examples/Converter.html#an-efficient-scalable-tiled-tiff-converter

MosGeo commented 2 years ago

@Nicholas-Schaub Thanks for the suggestion. I just did a quick test. Unfortunately, I couldn't convert my file using bfio. The files that I am dealing with are tiled, multi-channel, and RGB (each channel is RGB), so it is very complicated. The current error for a test image with 30 tiles and 7 (RGB) channels is below.

I'll add that, as you can see from the error below, bfio did not detect the RGB channels; it just lumped everything together (7 channels * 3 RGB samples = 21).

03-Mar-22 13:14:37 - bfio.backends.PythonWriter - WARNING  - The BioWriter only writes single channel images, but the metadata has 21 channels. Setting the number of channels to 1.
03-Mar-22 13:14:37 - bfio.bfio.BioWriter - WARNING  - The BioWriter only writes single image files, but the metadata has 5 images. Setting the number of images to 1.
br.shape: (5373, 5013, 1, 21, 1)
br.dtype: <class 'numpy.uint8'>
Traceback (most recent call last):
  File "C:\Users\Mustafa\Desktop\CziConversion\test.py", line 28, in <module>
    original_image = br[:]
  File "C:\Users\Mustafa\Desktop\CziConversion\.venv\lib\site-packages\bfio\bfio.py", line 176, in __getitem__
    return self.read(**ind)
  File "C:\Users\Mustafa\Desktop\CziConversion\.venv\lib\site-packages\bfio\bfio.py", line 235, in read
    self._backend.read_image([X_tile_start, X_tile_end],
  File "C:\Users\Mustafa\Desktop\CziConversion\.venv\lib\site-packages\bfio\base_classes.py", line 627, in read_image
    self._read_image(*args)
  File "C:\Users\Mustafa\Desktop\CziConversion\.venv\lib\site-packages\bfio\backends.py", line 663, in _read_image
    image = image.reshape(self.frontend.c,y_range,x_range)
ValueError: cannot reshape array of size 50331648 into shape (21,4096,4096)
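The numbers in the final ValueError are worth checking. A quick calculation, using only the values in the message, shows the buffer holds exactly 3 values per pixel of a 4096 x 4096 tile, while bfio tried to reshape it into 21 channels. That is consistent with a single interleaved RGB plane being read where 21 separate channels were expected. This is an inference from the numbers, not from bfio's code; values_per_pixel is a hypothetical helper.

```python
def values_per_pixel(buffer_size, y, x):
    """How many values each pixel contributes to a flat buffer of buffer_size."""
    per_pixel, remainder = divmod(buffer_size, y * x)
    if remainder:
        raise ValueError("buffer is not a whole number of YX planes")
    return per_pixel

buffer_size = 50331648      # "array of size 50331648"
target = (21, 4096, 4096)   # "into shape (21,4096,4096)"

print(values_per_pixel(buffer_size, 4096, 4096))  # 3
print(target[0] * 4096 * 4096)                    # 352321536 values needed
```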

evamaxfield commented 2 years ago

Honestly, gotta give credit to ya @MosGeo. You really do have a crazy dataset :joy:

I hope one of the various projects can support your data soon.


I'll add that, as you can see from the error below, bfio did not detect the RGB channels; it just lumped everything together (7 channels * 3 RGB samples = 21).

That is actually how OME stores the data. OME doesn't have an RGB dimension.
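Since OME has no separate RGB axis, a C x S layout has to be flattened into C*S OME channels. Below is a minimal numpy sketch, assuming a (C, Y, X, S) array with the interleaved samples last (the layout AICSImage reports as C: 7, S: 3). The channel ordering c * S + s is one natural choice, not necessarily the one bfio or Bio-Formats uses, and rgb_to_ome_channels is a hypothetical helper.

```python
import numpy as np

def rgb_to_ome_channels(data):
    """(C, Y, X, S) -> (C*S, Y, X); output channel c*S + s is sample s of channel c."""
    c, y, x, s = data.shape
    # Move S next to C, giving (C, S, Y, X), then merge the two axes.
    return np.moveaxis(data, -1, 1).reshape(c * s, y, x)

# 7 RGB channels on a tiny 2x2 plane; each value encodes (channel, sample).
C, S = 7, 3
data = np.empty((C, 2, 2, S), dtype=np.uint8)
for c in range(C):
    for s in range(S):
        data[c, :, :, s] = 10 * c + s

flat = rgb_to_ome_channels(data)
print(flat.shape)     # (21, 2, 2)
print(flat[5, 0, 0])  # 12 -> channel 1, sample 2
```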

Nicholas-Schaub commented 2 years ago

Ahhh, yeah that's a bug we are actually in the process of fixing. bfio doesn't handle interleaved channels properly. I'll let you know when that's fixed. Should be within the next week.

We use bfio to convert czi files that are 30,000x50,000x55x1,000x1, so it should be able to handle your data once we get that interleaved channel thing fixed.

Nicholas-Schaub commented 2 years ago

Also, I'm not trying to steal the thunder from AICSImageIO. The only reason I'm here is because we are writing a Reader/Writer for bfio.

evamaxfield commented 2 years ago

Oh I don't mind. I would rather have @MosGeo's issues solved than not :joy:

Excited for a reader tho!

Nicholas-Schaub commented 2 years ago

Just submitted a PR for the tiled tiff reader.

Here is a link for the previously reported issue that we are currently working to resolve. I think this is the same issue @MosGeo ran into.

https://github.com/PolusAI/bfio/issues/10

Nicholas-Schaub commented 2 years ago

Also, regarding our writer (I should probably open a separate discussion): my thought is that you should be able to submit a delayed array. We can then do chunked writing with our writer inside the Writer.save function.

Nicholas-Schaub commented 2 years ago

@MosGeo , a thought just occurred to me. Can you just do chunked reading of your czi file using img.dask_array and feed the chunks into the BioWriter instead of using a BioReader to read the data? Just remember that our BioWriter only supports single channel images at the moment.
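The chunked read-and-write loop suggested here can be sketched independently of any particular library: walk the YX plane in fixed-size blocks and copy one block at a time, so only one block needs to be in memory. In this sketch, plain numpy arrays stand in for both the lazy source (such as img.dask_array) and the writer's destination. iter_blocks, copy_in_blocks, and the default block size of 1024 are illustrative assumptions, and a real TIFF writer may additionally require blocks aligned to its own tile size. For a single-channel writer, src would be one YX plane per output file.

```python
import numpy as np

def iter_blocks(height, width, block=1024):
    """Yield (y0, y1, x0, x1) bounds covering a height x width plane."""
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            yield y0, min(y0 + block, height), x0, min(x0 + block, width)

def copy_in_blocks(src, dst, block=1024):
    """Copy src into dst one YX block at a time (the last two axes are Y, X)."""
    height, width = src.shape[-2:]
    for y0, y1, x0, x1 in iter_blocks(height, width, block):
        # With a lazy source, only this block is materialized by np.asarray.
        dst[..., y0:y1, x0:x1] = np.asarray(src[..., y0:y1, x0:x1])

src = np.arange(5 * 7, dtype=np.uint16).reshape(5, 7)
dst = np.zeros_like(src)
copy_in_blocks(src, dst, block=3)
print(np.array_equal(src, dst))         # True
print(len(list(iter_blocks(5, 7, 3))))  # 6
```

With a dask source, calling np.asarray on each slice computes just that block, which is what keeps memory bounded.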

Nicholas-Schaub commented 2 years ago

We just fixed the interleaved channel bug. @MosGeo, would you be able to share an example file so we can test against it? I know we can handle interleaved channels, but you have multichannel interleaved RGB. It should work, but I'd like to test it if you're willing to share.

MosGeo commented 2 years ago

@Nicholas-Schaub please check your gmail. Thanks! I will test the latest version in a bit.

I am now considering the following for manual conversion:

As you can see, I don't really need the tiles, so I can lose them. In fact, the analysis code will be easier without the tiles. I still have not looked into BioWriter, but single-channel images would be a deal breaker.

Nicholas-Schaub commented 2 years ago

Ah, well, if single-channel images are a deal breaker, then the BioWriter probably will not be for you. We have plans to support multi-channel images, but for now we keep it to single-channel images because of the platform we support.

Note, the current version of the BioReader does not have the fix yet. We are doing some final testing. I did receive your image and will be doing some final testing with it. I'll let you know how it goes.

Nicholas-Schaub commented 2 years ago

@MosGeo I just tested the file you sent me using our latest development version, and it looks like the issues are resolved. Let me know how it works. You should be able to use the existing code you have created; you will just need to install the development version of bfio from PyPI:

pip install bfio==2.3.0-dev0

Nicholas-Schaub commented 2 years ago

@MosGeo We have released bfio==2.3.0. We just opened a PR for the writer implementation (#396) which should permit semi-scalable writing if you pass in a dask array into the writer. Can you test?

MosGeo commented 2 years ago

@Nicholas-Schaub @JacksonMaxfield

Sorry for the late reply. I seem to only be able to pick up this effort intermittently because of other work commitments. Thanks for all the work. Here are the results of my tests:

Test image info:

This is the info from Zen Zeiss. It is a tiled image with multiple channels. You will also note that each channel is 24-bit, which means that each channel is RGB (three 8-bit samples).

Reading test:

You can see the code and output below. Notes:

  1. Both BioReader and AICSImage picked up the actual size of the image, which matches exactly what Zen Zeiss reports. Overall, each is doing what is expected of it. For example, aicspylibczi is reporting the tiles (M). So far so good.
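    from aicspylibczi import CziFile
    from aicsimageio import AICSImage
    from aicsimageio.readers import CziReader
    from bfio import BioReader
    image_filename = "test_image.czi"  # placeholder path for the test CZI

    bfio_reader = BioReader(image_filename)
    ai_reader   = AICSImage(image_filename, reader=CziReader)
    czi_reader  = CziFile(image_filename)
    print(bfio_reader._DIMS)            # bfio (private attr): X, Y, Z, C, T sizes
    print(ai_reader.dims)               # aicsimageio: includes S for RGB samples
    print(czi_reader.get_dims_shape())  # aicspylibczi: raw CZI dims, incl. M (tiles)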

    The output is

    {'X': 3054, 'Y': 1205, 'Z': 1, 'C': 21, 'T': 1}
    <Dimensions [T: 1, C: 7, Z: 1, Y: 1205, X: 3054, S: 3]>
    [{'A': (0, 3), 'X': (0, 1600), 'Y': (0, 1200), 'C': (0, 7), 'M': (0, 2), 'S': (0, 1)}]

    bfio metadata

    Let's start by looking at bfio_reader.metadata. You will note that it actually picked up metadata for three images. This is because the CZI file actually has two attachments (a barcode image and a macro image of the sample). Cool. I don't think I can retrieve the data for those extra attachments, though. bfio_reader.channel_names also reports the 7 channel names correctly, and the channel info is correct.

    OME(
    experimenters=[<1 Experimenters>],
    images=[<3 Images>],
    instruments=[<1 Instruments>],
    structured_annotations=[<2810 Structured_Annotations>],
    )

Writing tests (bfioreader->bfiowriter->bfioreader)

Let's see what we can do with writing, starting with bfio, where we pass the reader metadata as input. bfio warns: "WARNING - The BioWriter only writes single image files, but the metadata has 3 images. Setting the number of images to 1." This means that we will lose the metadata of the other images. This is fine for the time being.

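from pathlib import Path
from bfio import BioReader, BioWriter

out_filename = Path(r"../output/out.ome.tif")
# Write the pixel data with the reader's metadata (loads the whole image into memory)
bfio_writer = BioWriter(out_filename, metadata=bfio_reader.metadata)
bfio_writer[:] = bfio_reader[:]
bfio_writer.close()
# Read the written file back and check its dimensions (not the original reader's)
bfio_out_reader = BioReader(out_filename)
print(bfio_out_reader._DIMS)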

Now, reading back the images using a new bfioreader, we are able to retrieve the image. Some notes:

Writing tests (bfioreader->bfiowriter->AICSImage)

Following the same bfio writing procedure outlined above and then attempting to read the file with AICSImage, there is an error when querying the shape: "conflicting sizes for dimension 'C': length 21 on the data but length 7 on coordinate 'C'". This is because the metadata holds 7 channels (correctly), but 21 channels are actually saved, because each channel is an RGB image.
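When the layout is known, the 21-versus-7 mismatch can be undone on the reading side. Below is a minimal numpy sketch, assuming the 21 stored channels are ordered c * 3 + s, i.e. the three samples of each channel are stored consecutively. That ordering is an assumption to verify against the actual file, and ome_channels_to_rgb is a hypothetical helper. Under that assumption the planes regroup into 7 RGB channels, matching the 7 channel names in the metadata.

```python
import numpy as np

def ome_channels_to_rgb(planes, samples=3):
    """(C*S, Y, X) -> (C, Y, X, S), assuming channel index c*S + s."""
    cs, y, x = planes.shape
    if cs % samples:
        raise ValueError(f"{cs} channels is not a multiple of {samples}")
    # Split the channel axis into (C, S), then move S to the end.
    return np.moveaxis(planes.reshape(cs // samples, samples, y, x), 1, -1)

planes = np.arange(21, dtype=np.uint8).reshape(21, 1, 1)  # plane i holds value i
rgb = ome_channels_to_rgb(planes)
print(rgb.shape)     # (7, 1, 1, 3)
print(rgb[1, 0, 0])  # [3 4 5]
```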

Writing tests (AICSImage/CZI -> AICSImage/ome.tiff -> AICSImage/ome.tiff)

Everything is working as expected. It loads the data correctly. Some notes:

Writing tests (AICSImage/CZI -> AICSImage/ome.tiff -> bfiowriter/ome.tiff)

To be continued... need to go now :)

evamaxfield commented 2 years ago

Thanks for the write up @MosGeo !! Seems like we are covering a lot of the bases now thanks to all the contributions from bfio and others :heart:

SeanLeRoy commented 1 year ago

Seems like we are covering a lot of the bases now thanks to all the contributions from bfio and others ❤️

@evamaxfield Does this seem ready to close then (with the potential to be re-opened if uncovered use cases arise)?

evamaxfield commented 1 year ago

I think there are still some issues that need to be addressed. Can we talk about this issue on Monday?

SeanLeRoy commented 1 year ago

Closing this due to inactivity. Feel free to re-open with a file we can test with, if still desired!