Mosaic image dimension mismatch

jo-mueller commented 3 years ago

System and Software

aicsimageio Version: 4.0.5
Python Version: 3.8.10
Operating System: Windows 10

Description

Using aicsimageio.AICSImage to read a tiled CZI file gives X and Y image dimensions that differ from those reported by other image readers.

Reproduction

I use AICSImage to read a CZI image in the following way:

from aicsimageio import AICSImage

img = AICSImage(path)
img.shape
Out[51]: (1, 3, 1, 12223, 11349)

If I instead read the image with aicspylibczi, for instance, I get:

import aicspylibczi

czi = aicspylibczi.CziFile(path)
C0 = czi.read_mosaic(C=0, scale_factor=1)
C0.shape
Out[54]: (1, 12231, 11367)

You can download the example file here (PW: ExampleData).

Expected behavior

Opening the same image in Fiji shows that the correct image shape is indeed (12231, 11367) (never mind the channel dimension here). I assume that AICSImage chunks up the image data in some optimal fashion, (12223, 11349) = (3 x chunksizeX, 3 x chunksizeY), but I find it problematic that the excess pixels are ignored.
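To put numbers on the discrepancy, the two shapes quoted above can be compared directly. This is only a small sketch using the values from this report; the variable names are illustrative and not part of either library.

```python
# Shapes quoted in this report, reduced to their last two entries.
aicsimageio_yx = (12223, 11349)  # from AICSImage(path).shape
libczi_yx = (12231, 11367)       # from read_mosaic(C=0, scale_factor=1).shape

# Number of pixels per axis that are missing from the AICSImage result.
missing = tuple(ref - got for ref, got in zip(libczi_yx, aicsimageio_yx))
print(missing)  # (8, 18)
```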

Any help is greatly appreciated!

Note: The reason why the image dimensions are so odd is that the images are usually acquired tilewise and stitched with the image acquisition software directly at the microscope.

evamaxfield commented 3 years ago

Hey @jo-mueller thanks for the report!

We have seen this behavior as well. Regarding:

"I assume that AICSImage chunks up the image data in some optimal fashion"

This is generally the answer. The longer answer is that aicspylibczi.CziFile.read_mosaic (aicspylibczi is the backing library for aicsimageio's CZI reading support) uses libCZI's own mosaic reader, which reads the full mosaic into memory as soon as it is called (no chunking or anything).

We went a different route for aicsimageio's implementation: the massive mosaic array is chunked with dask by default. That means we had to stitch the array ourselves, but we are accurate to the metadata at the very least. We simply use the mosaic tile bounding boxes found in the metadata and place each tile in the correct spot of the larger delayed dask array. (We do still need to speed up our stitching process, see #274.)
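The placement idea described above can be sketched roughly as follows. This is not the aicsimageio implementation: plain Python lists stand in for the delayed dask array, the bounding boxes are made-up values rather than CZI metadata, and `mosaic_extent` and `stitch` are hypothetical helper names. It does show that the final mosaic shape is determined entirely by the bounding boxes.

```python
# Hypothetical tile bounding boxes as (x, y, width, height) in mosaic pixel
# coordinates; real values would come from the CZI metadata.
tiles = [
    (0, 0, 4, 3),
    (3, 0, 4, 3),  # overlaps the first tile by one column
    (0, 3, 4, 3),
    (3, 3, 4, 3),
]


def mosaic_extent(boxes):
    """Return (x0, y0, height, width) of the smallest canvas covering all boxes."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return x0, y0, y1 - y0, x1 - x0


def stitch(boxes, fill_values):
    """Paint each tile's fill value into a canvas at its bounding-box position."""
    x0, y0, height, width = mosaic_extent(boxes)
    canvas = [[0] * width for _ in range(height)]
    for (x, y, w, h), value in zip(boxes, fill_values):
        for row in range(y - y0, y - y0 + h):
            for col in range(x - x0, x - x0 + w):
                canvas[row][col] = value  # later tiles overwrite overlaps
    return canvas


mosaic = stitch(tiles, fill_values=[1, 2, 3, 4])
print(len(mosaic), len(mosaic[0]))  # 6 7
```

If a bounding box were read with a slightly wrong origin or size, the computed extent would shrink or grow by the same amount, which is one way a difference of a few pixels could arise.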

That said, I don't believe Fiji (I assume you are using the Bio-Formats importer) uses libCZI, so I am curious how it stitches the tiles to get the exact dimensions.

In short, aicsimageio's mosaic stitching is a tiny bit different because we specifically wanted to be able to read and process very large mosaics with dask arrays. aicspylibczi uses the libCZI mosaic stitcher directly to read into memory. Other readers have other stitching patterns but generally can't read into a delayed dask array.

I hope some of this helps. If you find out anything more about Bio-Formats CZI tile stitching, or if you dig through our algorithm and see any issues, please let us know or make a PR. I will try to get to this soon, but it will likely be a while.

jo-mueller commented 3 years ago

Hi @JacksonMaxfield,

Thanks for the reply! I agree that it's a very reasonable approach to choose daskability (is that a word?) over exact pixel dimensions. The difference in dimensions in my case is also just a few pixels.

That being said, how are the dask chunks located with respect to the original array dimensions? I.e. is the (0,0) coordinate preserved?

I guess an alternative approach would be to allocate dummy dask chunks so that the complete chunked array is larger than the actual image. Accessing pixels beyond the defined image dimensions would then have to throw an index error.
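A rough sketch of that brainstormed idea, in plain Python rather than dask: the storage is rounded up to whole chunks, while lookups are validated against the true image extent. The `BoundedImage` class and the chunk size of 4096 are purely hypothetical and are not taken from aicsimageio.

```python
class BoundedImage:
    """Padded 2D storage that only exposes the true image extent."""

    def __init__(self, true_shape, chunk_shape):
        self.true_shape = true_shape
        # Round each axis up to a whole number of chunks (ceiling division).
        self.padded_shape = tuple(
            -(-size // chunk) * chunk
            for size, chunk in zip(true_shape, chunk_shape)
        )

    def check(self, y, x):
        """Return (y, x) if inside the image, else raise IndexError."""
        height, width = self.true_shape
        if not (0 <= y < height and 0 <= x < width):
            raise IndexError(
                f"({y}, {x}) is outside the image extent {self.true_shape}"
            )
        return y, x


image = BoundedImage(true_shape=(12231, 11367), chunk_shape=(4096, 4096))
print(image.padded_shape)  # (12288, 12288)
```

Here `image.check(12230, 11366)` succeeds, while `image.check(12231, 0)` raises an IndexError even though that row exists in the padded storage.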

Not sure if this is feasible - just brainstorming.

toloudis commented 3 years ago

I'd say that if AICSImage.shape returns (12223, 11349) but aicspylibczi (using libCZI to reconstruct) gives (12231, 11367), and the latter agrees with Bio-Formats, then we might have some problem with our reconstruction in AICSImage.

There's probably a fix to be found by looking at libCZI and Bio-Formats and comparing with the AICSImage reconstruction. Maybe we are missing some bit of metadata that would help, or maybe there's just an off-by-one error or something. Either way, a good unit test would be to compare the dims from the two cases above, AICSImage.shape and the shape coming from the aicspylibczi read_mosaic, using read_mosaic as ground truth.
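The comparison at the heart of such a test could look something like the sketch below. The helper names `yx_extent` and `assert_same_mosaic_extent` are hypothetical, and the shapes are hard-coded from this issue so the snippet runs without a CZI file; a real test would obtain them from AICSImage(path).shape and read_mosaic(C=0, scale_factor=1).shape on a test file.

```python
def yx_extent(shape):
    """Return the trailing (Y, X) entries of an array shape."""
    return tuple(shape[-2:])


def assert_same_mosaic_extent(candidate_shape, reference_shape):
    """Fail if the candidate's (Y, X) extent differs from the reference."""
    candidate = yx_extent(candidate_shape)
    reference = yx_extent(reference_shape)
    assert candidate == reference, (
        f"mosaic extent {candidate} != reference {reference}"
    )


# Shapes reported in this issue.
aics_shape = (1, 3, 1, 12223, 11349)  # AICSImage.shape
mosaic_shape = (1, 12231, 11367)      # read_mosaic(...).shape, the ground truth

try:
    assert_same_mosaic_extent(aics_shape, mosaic_shape)
except AssertionError as err:
    print(err)
```

Only the last two dimensions are compared because the two readers return a different number of leading dimensions.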

@JacksonMaxfield is there anything about the dask tile setup that would prevent overlapping tiles or disjoint tiles (tiles with empty pixel space between them)?

BrianWhitneyAI commented 1 year ago

We have added a new maintenance feature to clean up stale issues and PRs. Adding this comment to set a baseline for ‘Stale’

SeanLeRoy commented 1 year ago

@jo-mueller Hi! We are ready to take a look into this issue, but it looks like the download link has expired. If this is still something you would like investigated, please submit another link.

BrianWhitneyAI commented 1 year ago

I spent some time investigating this and could not replicate the small pixel difference in any of my local examples. I did see that multi-scene tiled images can produce different values for shape when using aicsimageio vs. aicspylibczi, because aicsimageio splits these dimensions up into scenes while aicspylibczi reads the whole image. It is possible that this is where the discrepancy shows up; however, without the example image it is hard to say. I am closing this for now. Feel free to reopen it with a new test image link.

AllenCellModeling / aicsimageio