AllenCellModeling / aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
https://allencellmodeling.github.io/aicsimageio

Writers streaming as data comes in #386

Closed ianhi closed 1 year ago

ianhi commented 2 years ago

(It's possible this is already possible and I didn't realize it!)

Use Case

I've been using pymmcore-plus to control microscopes. However, currently its MDA loop just sends out signals to the user rather than writing to a file (and it probably should stay that way). I want to use aicsimageio to load my images, so I would also like to be able to use it to write them to disk as they come in from the MDA.

So my request (if this isn't already possible) is to have aicsimageio writers capable of writing parts of a dataset and expanding it as more data comes in slowly.

In the meantime I've just implemented a simple zarr storage solution: https://github.com/ianhi/pymmcore-MDA-writers. But I'd prefer if that library just handled the interaction with the MDA loop and was not responsible for knowing how to properly write files to disk.
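For context, a workaround like that amounts to preallocating a store for the full acquisition up front and slice-assigning each frame as it arrives. A minimal sketch of the idea, using an in-memory NumPy array as a stand-in (a zarr array opened with `zarr.open` supports the same slice assignment against an on-disk store; the shapes and dtype here are illustrative, not from the linked repo):

```python
import numpy as np

# Illustrative acquisition dimensions (T, C, Z, Y, X).
shape = (4, 2, 3, 16, 16)
# A real workaround would use zarr.open(..., shape=shape, chunks=(1, 1, 1, 16, 16))
# so each frame lands in its own on-disk chunk.
store = np.zeros(shape, dtype=np.uint16)

def on_frame_ready(img, t, c, z):
    # Place the incoming 2D frame at its (t, c, z) position.
    store[t, c, z] = img

# Simulate frames arriving one at a time from the MDA loop.
for t in range(shape[0]):
    for c in range(shape[1]):
        for z in range(shape[2]):
            frame = np.full(shape[3:], t * 100 + c * 10 + z, dtype=np.uint16)
            on_frame_ready(frame, t, c, z)
```

Because every frame write only touches its own slice, the data already written survives a crash partway through the acquisition, which is the robustness property the in-memory alternative below lacks.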

Solution

Writers, in particular the zarr-writer (https://github.com/AllenCellModeling/aicsimageio/pull/381), allow streaming data into the file.

Then my MDA-writer object could have a method like this

@mmcore.events.frameReady.connect
def write_image(self, img, event):
    # interpret the event metadata to figure out where the frame belongs
    aics_writer.write(img, ...)

Alternatives

I hold all the data in memory and then use the aicsimageio writer at the end to write it to disk. This isn't great because it's not robust to program crashes.

Additional Context

Other discussion: https://github.com/tlambert03/pymmcore-plus/issues/13 https://github.com/tlambert03/pymmcore-plus/pull/29

toloudis commented 2 years ago

To see if I understand: You would like the zarr writer to have a usage pattern where you can call write_image repeatedly and it will fill different portions of images into the same store. I think that's definitely a pattern we should support. It's a matter of figuring out how to specify that "event" data that tells the writer where to put the data. Would it be enough to specify a Z,C, and T range for starters?
(This would also have possibly complicated interactions with any custom chunking parameters users are allowed to pass in.)
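One possible shape for such an API (entirely hypothetical — the `StreamingWriter` name, its constructor, and the keyword indices are assumptions for illustration, not aicsimageio API) is a writer that takes the target dimensions up front and accepts explicit T/C/Z placement on each call:

```python
class StreamingWriter:
    """Hypothetical writer that accepts frames in any order, keyed by (t, c, z).

    A real implementation would back this with a chunked zarr store plus
    OME metadata handling; a dict keyed by index keeps the sketch self-contained.
    """

    def __init__(self, size_t, size_c, size_z):
        self.shape = (size_t, size_c, size_z)
        self._frames = {}

    def write_image(self, img, t, c, z):
        # Validate placement against the shape declared at construction time.
        if not (0 <= t < self.shape[0]
                and 0 <= c < self.shape[1]
                and 0 <= z < self.shape[2]):
            raise IndexError(
                f"frame index {(t, c, z)} outside declared shape {self.shape}"
            )
        self._frames[(t, c, z)] = img

    @property
    def complete(self):
        # True once every (t, c, z) position has received a frame.
        return len(self._frames) == self.shape[0] * self.shape[1] * self.shape[2]

writer = StreamingWriter(size_t=2, size_c=1, size_z=2)
writer.write_image("frame-a", t=0, c=0, z=0)
writer.write_image("frame-b", t=1, c=0, z=1)
```

Chunking is where this gets complicated: for frame-at-a-time streaming, chunks of shape (1, 1, 1, Y, X) let each call write a whole chunk without read-modify-write, whereas user-supplied chunking that spans multiple frames would force partial-chunk updates.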

I'm not sure if our underlying writing library supports this type of mode. There will be some nontrivial work to handle ome-zarr metadata to support it as well.

ianhi commented 2 years ago

Yup exactly!

(This would also have possibly complicated interactions with any custom chunking parameters users are allowed to pass in.)

Indeed, this is part of why I opened this request. Centralizing that complication in one place would be really helpful downstream.

ianhi commented 2 years ago

It's a matter of figuring out how to specify that "event" data that tells the writer where to put the data. Would it be enough to specify a Z,C, and T range for starters?

For sure

tlambert03 commented 2 years ago

this would be dreamy 😄

Nicholas-Schaub commented 2 years ago

I am currently wrapping our bfio library, and the underlying saving feature does this by default for OME zarr and OME tiff files. My plan is to use this functionality to take advantage of a dask delayed object if it gets passed in so that arbitrarily large images should be able to be written to disk. I believe #301 is facing this exact issue in just trying to do image conversions.

I would be happy to be a part of any planning conversations for this feature. I didn't have plans to go beyond having our BioWriter implementation do this via a dask delayed object, but if it were implemented here it should be trivial to extend our writing wrappers to handle it.

toloudis commented 2 years ago

As an update here: the zarr writer branch now has a code path that can load an image using aicsimageio's get_image_dask_data and pass it along to the zarr writer, so that all the reads are delayed and optionally rechunked for zarr output. It's not quite as granular as what was discussed, but it's a path forward for large data.

That zarr writer branch currently depends on a forked pull request to ome-zarr-py that adds dask capability.

I still agree that it would be better to be able to make a series of more granular function calls to write pieces of the image.
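The delayed-read path described above can be sketched as follows. This is a minimal stand-in: `da.from_array` over an in-memory array plays the role of the lazy file-backed array that `get_image_dask_data` would return, and the per-plane target chunking is an assumption about what a zarr writer would want, not the branch's actual behavior:

```python
import dask.array as da
import numpy as np

# Stand-in for a lazily-read TCZYX image; get_image_dask_data would
# normally produce a dask array like this from file-backed reads.
lazy = da.from_array(
    np.zeros((2, 3, 4, 64, 64), dtype=np.uint16),
    chunks=(2, 3, 4, 64, 64),
)

# Rechunk to one chunk per 2D plane, a common layout for zarr output,
# so each plane can be written independently.
rechunked = lazy.rechunk((1, 1, 1, 64, 64))

# Nothing has been read or computed yet; something like
# da.to_zarr(rechunked, "out.zarr") would then drive the delayed
# reads and writes plane by plane.
```

The rechunk is itself lazy, so the whole pipeline stays out-of-core: each output chunk is materialized only when the final write triggers it.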

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.