Motivation and scope?

ctrueden commented 4 years ago

Is there a document discussing the motivation and scope for this project in detail, compared to other efforts in the community? In particular I'd like to know what this project is seeking to achieve that couldn't be done with SCIFIO and Bio-Formats via PyImageJ, python-bioformats, or similar. I understand the desire for pure Python with no JVM, but I'm looking for more detail than that:

Does this library intend to target all PFFs supported by Bio-Formats?
Will it support full metadata extraction and conversion to the OME data model, like Bio-Formats does?
Will it support translation between metadata models, like SCIFIO does?
Will it be extensible so that additional formats can be added as plugins?
What features does this library provide on top of "plain" Python imageio? Have you started a discussion with the imageio developers about it? Might it make sense to contribute functionality upstream?

Thanks for any and all insight! The more we communicate and work together as a community, the better!

evamaxfield commented 4 years ago

Hey @ctrueden! These are great questions and this is great timing!

We are writing a more verbose road map and supporting documentation but here are some brief answers:

Does this library intend to target all PFFs supported by Bio-Formats? Will it be extensible so that additional formats can be added as plugins?

We do not intend to implement every PFF but the architecture is such that anyone can contribute a Reader class for their favorite PFF.
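For readers unfamiliar with this kind of design, here is a minimal sketch of what a "contribute your own Reader class" architecture can look like. It is not aicsimageio's actual code: the names `Reader`, `register`, `reader_for`, and `FakeCziReader` are all hypothetical, and the pixel data is a nested list only to keep the example free of third-party dependencies.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Type


class Reader(ABC):
    """Hypothetical base class: one subclass per file format."""

    # File suffixes this reader claims, e.g. (".czi",)
    suffixes: tuple = ()

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    @abstractmethod
    def read(self) -> List[List[int]]:
        """Return pixel data (a nested list here, to stay stdlib-only)."""


# Registry mapping a lowercase suffix to the Reader subclass that handles it.
_REGISTRY: Dict[str, Type[Reader]] = {}


def register(reader_cls: Type[Reader]) -> Type[Reader]:
    """Class decorator that makes a Reader discoverable by suffix."""
    for suffix in reader_cls.suffixes:
        _REGISTRY[suffix.lower()] = reader_cls
    return reader_cls


def reader_for(path: str) -> Reader:
    """Pick the registered Reader whose suffix matches the file name."""
    suffix = Path(path).suffix.lower()
    try:
        return _REGISTRY[suffix](path)
    except KeyError:
        raise ValueError(f"no reader registered for {suffix!r}") from None


@register
class FakeCziReader(Reader):
    """Stand-in for a contributed reader; returns dummy pixels."""

    suffixes = (".czi",)

    def read(self) -> List[List[int]]:
        return [[0, 1], [2, 3]]
```

With this shape, supporting a new format means writing one subclass and decorating it; the dispatch code in `reader_for` does not change.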

Will it support full metadata extraction and conversion to the OME data model, like Bio-Formats does? Will it support translation between metadata models, like SCIFIO does?

We consider OME metadata to be the best common standard for now and we write out to OME-TIFF. We do not promise perfect or complete metadata extraction or conversion, but we always want to improve. We punt on this to some extent, but we do aim to provide a pluggable API so that implementers can improve existing converters from proprietary metadata to OME-XML, or add new ones. See czi-to-ome-xslt and the planned aicsimageio metadata module for details of how we are thinking about it.

What features does this library provide on top of "plain" Python imageio? Have you started a discussion with the imageio developers about it? Might it make sense to contribute functionality upstream?

A short list of advantages over base imageio would be:

metadata extraction and utility functions for common metadata
dask for delayed / chunked image reading and manipulation (imageio supports chunked / slice reading but is very particular about it imo)
image writing to OME standard
planned language agnostic metadata conversion module

That said, we haven't been in contact with the imageio team but do communicate with the napari and OME teams fairly regularly.

There have also been a couple of discussions on Twitter you may be interested in:

Discussing language agnostic metadata extraction and conversion: https://twitter.com/sofroniewn/status/1278001098503028740
Discussing AICSImageIO as a whole: https://twitter.com/openmicroscopy/status/1299327932591153154

Hope all of this helps and I am happy to answer more questions if you have them. I will try to answer all of these as we work on the road map and supporting documentation as well :slightly_smiling_face:

joshmoore commented 4 years ago

The more we communicate and work together as a community, the better!

Since I'm :100: behind that sentiment, I'm going to add a few thoughts to this thread even though they don't really fit under the banner of "documentation" for this repo.

There definitely seems to be a lot swirling around at the moment on these various topics. I'd even add one more Twitter thread to @JacksonMaxfield's list that it would be good to discuss in this context (minimally since I find following and engaging with branched Twitter threads difficult): https://twitter.com/DrAnneCarpenter/status/1277981620969115649

One of the key benefits of aicsimageio is to help make OME-TIFF a first class citizen in Python, and it's great to have more hands working on that! (cc @sebi06 re: https://github.com/apeer-micro/apeer-ometiff-library)

For bringing Bio-Formats itself more into the Python realm, the performance issue (incl. low overhead) will be something to discuss. Having a similar document for the plans/scope of pyimagej et al. would probably help all of us make decisions on where to invest our effort. (It occurs to me that it's probably also best to loop in @imaris re: https://github.com/imaris/imariswriter as well.)

I'd just finish with my strong conviction that whatever we all must do to achieve our immediate objectives, as a community I think we will be most rewarded for our efforts to put the PFF tower-of-babel behind us.

see: https://forum.image.sc/t/ome-s-position-regarding-file-formats/26952

evamaxfield commented 4 years ago

Responding to @joshmoore's statement:

One of the key benefits of aicsimageio is to help make OME-TIFF a first class citizen in Python, and it's great to have more hands working on that!

I can't speak for all the devs on aicsimageio, but I personally have had the statement "AICSImage should emulate an OME-Image in pure Python regardless of size, format, or location" in my head while working on the library.

sebi06 commented 4 years ago

Hi all, very valuable discussion. Some more comments from my side:

To be honest I am almost convinced that the apeer-ometiff-library actually is not really needed anymore, because there is now aicsimageio, which basically does the same thing. If you look at the code it already looks quite similar. But I also do not really know if it makes sense trying to merge it etc. (question of time) or how feasible it is (or if people would even accept it).

libCZI for reading CZI images, including aicspylibCZI (by the Allen Institute), is for us the way to go and we are actively working on adding more tools like this. There is nothing finalized yet, but using libCZI inside MATLAB is something we already tested, and making libCZIrw available (so far only used internally), including Python bindings, is something we might do.

Due to our requirements on the image acquisition and file writing speed etc. we will (most likely) always need CZI for the foreseeable future.

Despite being a big fan of OME-TIFF, I really hope that the efforts of everybody will go into the future file formats, where I obviously hope that CZI will play an important role.

toloudis commented 4 years ago

Here's a little more context from the Allen Institute side.

We absolutely needed a pythonic way of reading the file formats off the microscopes when we were getting our high throughput microscopy up and running in 2016-17. We tried python-bioformats early on but had problems with interop with other parts of our processing, conflicting Java VMs and other issues.

We also have an open science mission and early on agreed that OME-TIFF was the best file format for sharing our data with the outside world. We have confidence that TIFF reading capability is really widespread. Currently there's a chicken-and-egg problem with writing out new open file formats that not a lot of tools support. Our internal scientists are generally using either ImageJ, CellProfiler, or custom Python processing (that list is not guaranteed to be complete!).

aicsimageio also tries to have an API that makes the 80% case simple. I need to read a file into a numpy array. I need to read channel C at time T into a numpy array. I know my file is too big to fit in memory, so I want to use dask delayed to load the image chunk by chunk. I need to write my numpy data back out to OME-TIFF.
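To make the "channel C at time T" and "too big to fit in memory" cases concrete, here is a small stdlib-only sketch of on-demand plane reading. It is an illustration of the idea, not aicsimageio's API and not dask: the class `LazyPlanes`, its `get_plane` method, and the raw uint8 T, C, Y, X layout are all assumptions made for the example.

```python
import io
from typing import BinaryIO, List


class LazyPlanes:
    """Hypothetical reader for a raw uint8 stack stored in T, C, Y, X order."""

    def __init__(
        self, stream: BinaryIO, size_t: int, size_c: int, size_y: int, size_x: int
    ) -> None:
        self.stream = stream
        self.shape = (size_t, size_c, size_y, size_x)

    def get_plane(self, c: int, t: int) -> List[List[int]]:
        """Seek to one YX plane and read only its bytes."""
        size_t, size_c, size_y, size_x = self.shape
        if not (0 <= t < size_t and 0 <= c < size_c):
            raise IndexError(f"no plane at C={c}, T={t}")
        plane_bytes = size_y * size_x
        # Planes are contiguous, so the offset follows from the T, C order.
        self.stream.seek((t * size_c + c) * plane_bytes)
        raw = self.stream.read(plane_bytes)
        return [list(raw[row * size_x:(row + 1) * size_x]) for row in range(size_y)]


# Tiny in-memory "file": 2 timepoints x 3 channels x 2x2 pixels.
data = bytes(range(2 * 3 * 2 * 2))
img = LazyPlanes(io.BytesIO(data), size_t=2, size_c=3, size_y=2, size_x=2)
plane = img.get_plane(c=1, t=1)  # reads 4 bytes, not all 24
```

The rest of the stack is never touched, which is the property that chunked, delayed loading relies on when a file is larger than memory.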

Our metadata story is not complete yet but when we finish the CZI-->OME xml conversion code (currently on a branch in this repo) we will be showing a bioformats-like philosophy where the PFF is converted one-way to the open format. We are definitely interested in a generic python class that can deal with complete OME metadata at a higher api level. This is a key area of collaboration where we think there could be one shared thing that's good for everyone. We've discussed autogenerating one from the OME schema, but would still want to add higher level convenience operations like "remove channel and fixup all the tiffdata entries".
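As a rough idea of what a convenience operation like "remove channel and fixup all the tiffdata entries" involves, here is a sketch using only Python's standard `xml.etree.ElementTree`. The function name `remove_channel` is hypothetical, and the sketch makes simplifying assumptions: one `Image`, every `TiffData` entry carries an explicit `IFD` and is listed in IFD order, `Plane` elements are ignored, and the matching pixel planes are assumed to be dropped from the TIFF separately.

```python
import xml.etree.ElementTree as ET

OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"
NS = {"ome": OME_NS}


def remove_channel(ome_xml: str, channel_index: int) -> str:
    """Drop one channel from OME-XML and renumber the TiffData entries."""
    ET.register_namespace("", OME_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(ome_xml)
    pixels = root.find("ome:Image/ome:Pixels", NS)

    # Remove the Channel element at the requested position.
    # Channel IDs are left as they are; they only need to stay unique.
    channels = pixels.findall("ome:Channel", NS)
    pixels.remove(channels[channel_index])
    pixels.set("SizeC", str(len(channels) - 1))

    # Drop TiffData for the removed channel, shift higher channels down,
    # and reassign IFD numbers so they stay contiguous.
    next_ifd = 0
    for tiff_data in pixels.findall("ome:TiffData", NS):
        first_c = int(tiff_data.get("FirstC", "0"))
        if first_c == channel_index:
            pixels.remove(tiff_data)
            continue
        if first_c > channel_index:
            tiff_data.set("FirstC", str(first_c - 1))
        tiff_data.set("IFD", str(next_ifd))
        next_ifd += int(tiff_data.get("PlaneCount", "1"))

    return ET.tostring(root, encoding="unicode")


SAMPLE = f"""<OME xmlns="{OME_NS}">
  <Image ID="Image:0">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint8"
            SizeX="2" SizeY="2" SizeC="3" SizeZ="1" SizeT="1">
      <Channel ID="Channel:0:0" Name="DNA"/>
      <Channel ID="Channel:0:1" Name="Membrane"/>
      <Channel ID="Channel:0:2" Name="Structure"/>
      <TiffData FirstC="0" FirstZ="0" FirstT="0" IFD="0" PlaneCount="1"/>
      <TiffData FirstC="1" FirstZ="0" FirstT="0" IFD="1" PlaneCount="1"/>
      <TiffData FirstC="2" FirstZ="0" FirstT="0" IFD="2" PlaneCount="1"/>
    </Pixels>
  </Image>
</OME>"""

updated = remove_channel(SAMPLE, channel_index=1)
```

Even in this reduced form, three pieces of metadata (`Channel`, `SizeC`, `TiffData`) have to change together, which is the kind of bookkeeping a higher-level OME metadata class could hide.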

We're also very interested in the future of open and cloud-friendly file formats like Zarr et al. We are still producing ome-tiff for external publication; but as with many people, our data sets, and therefore our storage and processing requirements, are getting bigger fast.

evamaxfield commented 3 years ago

Just pinging this issue to say that our mission, roadmap, and more documentation were just merged into our release-4.0 branch.

You can view our:

A couple of notes to this:

I am noticeably absent from the Governance doc steering council. I am no longer at AICS, but I am still actively contributing to this library as noted by the following --
You can see an early implementation of the "read from any location" part of our mission in my recent, new and improved, DefaultReader PR that extends support for reading from any fsspec supported file system.

I am going to leave this issue open until release-4.0 is merged into master but I am calling this issue "Resolved" for now.

@toloudis @heeler please chime in if you feel the desire to.

AllenCellModeling / aicsimageio

Motivation and scope? #147