I drafted the README. We should discuss the scope, with the goal of reaching an MVP and starting to use the library in other repos. The library will evolve based on how it is used.
One question about this item from the README:
to ship data to deconvolution and DL pipelines.
Would a deep learning dataset class (i.e. a torch Dataset class) be part of this package? While I think it would be great to have shared deep-learning utilities, I'm concerned about bloating this package's dependencies.
I agree that this package should not provide a Dataset class. I was thinking of efficient thin wrappers around the data tree written by the library to enable succinct usage like this:
import iohub
import waveorder as wo
import dexp as dx
from microDL import trainer

# initialize reader and writer
LF, LS = iohub.reader(<path to zarr store or TIFF directory>, format='mantis')
reconstructions = iohub.writer(<path to reconstructions>)

# TCZYX order. The LF and LS stores are acquired in different coordinate systems and stored within subfolders.
# Reconstruct each time point and write.
for t in range(LF.shape[0]):
    phase, retardance = wo.reconstruct(LF[t, ::])      # 4 channels * XYZ
    nuclei = dx.deconvolve(dx.deskew(LS[t, 0, ::]))    # 1 channel * XYZ
    membrane = dx.deconvolve(dx.deskew(LS[t, 1, ::]))  # 1 channel * XYZ
    reconstructions[t, 0, ::] = phase
    reconstructions[t, 1, ::] = retardance
    reconstructions[t, 2, ::] = nuclei
    reconstructions[t, 3, ::] = membrane

# train a model
trainer(input=reconstructions[t, 0, ::], target=reconstructions[t, 2, ::], <config parameters or file>)
# reconstructions appear as zarr or dask objects to the DL pipeline.
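To make that last comment concrete, here is a minimal sketch of how a downstream deep-learning repo could consume the written reconstructions without iohub depending on torch: open the store lazily with dask and wrap it in that repo's own torch Dataset. The component path "0", the ReconPairs name, and the channel indices are illustrative assumptions, not part of any proposed iohub API.

# Hypothetical downstream usage, living in the DL repo rather than in iohub.
import dask.array as da
import numpy as np
import torch
from torch.utils.data import Dataset

# Lazily view the reconstructions written above (component path is an assumption).
recon = da.from_zarr(<path to reconstructions>, component="0")  # TCZYX

class ReconPairs(Dataset):
    """Yields (phase, nuclei) volume pairs per time point; channel order as in the sketch above."""

    def __init__(self, array):
        self.array = array  # dask or zarr array in TCZYX order

    def __len__(self):
        return self.array.shape[0]

    def __getitem__(self, t):
        phase = np.asarray(self.array[t, 0])   # channel 0: phase
        nuclei = np.asarray(self.array[t, 2])  # channel 2: nuclei
        return torch.from_numpy(phase), torch.from_numpy(nuclei)

This keeps torch, dask, and any training logic out of iohub's dependency tree while still giving DL pipelines chunked, lazy access to the data.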
@ziw-liu, @JoOkuma, @talonchandler I recommend reading this preprint. Its discussion of the chunked TIFF format is particularly relevant to the problem of passing data to analysis pipelines such as CellPose:
"Some success has been achieved with OME-TIFF, a 2D multi-resolution image format that captures acquisition metadata as OME-XML in the TIFF header 2,7,8. Reference software implementations are available in Java (https://github.com/ome/bioformats/), C++ (https://gitlab.com/codelibre/ome/ome-files-cpp) and Python (e.g., https://github.com/AllenCellModeling/aicsimageio, https://github.com/apeer-micro/apeer-ometiff-library, https://github.com/cgohlke/tifffile). OME-TIFF is supported by several commercial imaging companies (see https://www.openmicroscopy.org/commercial-partners/) and is the recommended format for public data projects like Image Data Resource (IDR) or Allen Institute of Cell Science, making their data available from https://open.quiltdata.com/b/allencell/.
As our and others’ use of existing tools for conversion to OME-TIFF grew, TIFF’s linear binary layout became a bottleneck. Larger files took increasingly long to write. This problem was most obvious in projects that required the conversion of large numbers of whole slide images from PFFs to OME-TIFF for use in data lakes that are used for AI training sets (https://pathlake.org/; https://icaird.com/). The need for a scalable conversion motivated our development of two tools, bioformats2raw (https://github.com/glencoesoftware/bioformats2raw) and raw2ometiff (https://github.com/glencoesoftware/raw2ometiff). Together they provide a parallel pipeline using Bio-Formats to convert any supported PFF into multi-resolution OME-TIFF. This is achieved by breaking images into atomic “chunks”, writing them independently to disk, and generating subresolutions from them when none are available, whereupon a second process can efficiently write these chunks into TIFF (Figure 1b)."
If we need to convert existing data to TIFF, we can write scripts that use some of the above tools and share them via iohub. When the user wants to write data into TIFF, we can rely on tifffile.
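As a rough illustration of both points, such a script could shell out to bioformats2raw/raw2ometiff for conversion and call tifffile directly for TIFF output. The helper name, file paths, and example data below are assumptions, and a recent tifffile release with OME support (plus the Glencoe CLIs on PATH) is assumed.

# Hypothetical conversion/writing helpers; not an existing iohub API.
import subprocess
import numpy as np
import tifffile

def convert_to_ome_tiff(input_path, zarr_path, tiff_path):
    """Proprietary file format -> chunked zarr -> OME-TIFF via bioformats2raw and raw2ometiff."""
    subprocess.run(["bioformats2raw", input_path, zarr_path], check=True)
    subprocess.run(["raw2ometiff", zarr_path, tiff_path], check=True)

# Direct OME-TIFF output with tifffile, e.g. a small TCZYX stack.
data = np.zeros((2, 4, 8, 256, 256), dtype=np.uint16)  # placeholder data
tifffile.imwrite("reconstructions.ome.tif", data, ome=True, metadata={"axes": "TCZYX"})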
I feel that we now have a clear path towards these goals. Closing in favor of specific issues.
After migrating the io module of waveorder (at https://github.com/mehta-lab/waveorder/commit/5f60f0ad27e05596a6f6cb09ef2310f6bc00f236) to this new repository, the next steps may include adding an iohub.Dataset class that offers an array-like interface and making sure the existing feature set works under it (#31 #40 #132). Each of these can be elaborated and debated upon in spin-off issues.
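For discussion, here is a minimal sketch of what "array-like" could mean here. The class name, constructor arguments, and zarr layout are assumptions rather than a committed iohub design: a thin wrapper that delegates slicing to the underlying zarr array so only the requested chunks are read.

# Hypothetical sketch of an array-like Dataset wrapper; not the final iohub API.
import zarr

class Dataset:
    """Thin array-like view over a zarr array written by iohub (illustrative only)."""

    def __init__(self, store_path, array_path="0"):
        self._group = zarr.open(store_path, mode="r")
        self._array = self._group[array_path]

    @property
    def shape(self):
        return self._array.shape  # e.g. (T, C, Z, Y, X)

    def __getitem__(self, index):
        # Delegate slicing to zarr so only the requested chunks are read from disk.
        return self._array[index]

With something like this, Dataset(<path to reconstructions>)[t, 0] would read a single phase volume without loading the rest of the store.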
@mattersoflight @JoOkuma @royerloic @talonchandler @Christianfoley please feel free to add to or modify these objectives.