JuliaImages / ImageBase.jl

MIT License
4 stars 2 forks source link

[Feature Request(s)] Pipeline for processing and plotting larger-than-memory groups of images #40

Open kescobo opened 2 months ago

kescobo commented 2 months ago

After extensive discussion with @timholy, @rafaqz, and @felixcremer on slack about a bunch of things I'd like to be able to do, I thought it would be worth writing up as a kind of Jul-I-EP (or julip - JuliaImages enhancement proposal). Many, if not all, of the individual pieces are already possible via Images.jl and related packages, Rasters.jl/DimensionalData.jl, PyramidScheme.jl, etc, but they don't compose well (or at least, it's not clear to me how they compose).

First, I will describe my specific use-case to make it clear where the requests are coming from, and to hopefully give enough context that more experienced folks can identify any x/y problems in my reasoning. Then, I will try to enumerate the specific steps that I would like to be able to take in processing / displaying this group of images. Finally, I will describe what I've currently found that works for each piece, relevant issues or PRs that seem related, and the problems with composition I've encountered.

Use-case

I am working with spatial transcriptomics data, which in the simplest case consists of a series of individual images (hereafter "fields of view" or "FOVs") encoded as uncompressed tiff files and some tables of transcript calls.

Individual FOVs are 4256px square, may be partially overlapping, and are arranged in grids of 4-12 images that represent a functional unit (typically, one tissue section). Many (25-40) of these grids (typically ~200 FOVs total) make up a full slide. There is a table that describes the absolute position of each FOV on the slide. Here's an example layout:

image

Then we have a table (~25 million rows for one slide) of transcript calls. Each of these rows have x/y coordinates relative to the FOV they're measured in, as well as their global position on the slide. Separately, we also have GeoJSON files that delineate polygons representing individual cells.

The thing I'd like to be able to do is to load a grid of images, overlay the cell segmentation and some subset of transcripts, and then be able to select regions of interest. This last part is pretty easy with Makie and MakieDraw (aside from this bug):

14d8fd10-96e9-4fbc-babe-0fa671d21d0e

But when trying to plot the dense images as well, things are getting pretty bogged down, memory-wise (4256 x 4256 x 3 channels x 12 fovs). And the ideal case would be to be able to see a whole slide at once, then dynamically display images + transcripts as we zoom in to certain spots.

Ideal functionality

  1. Data structure that holds a whole slide, with relative positions of images and transcripts
  2. Ability to process raw images in their (partially overlapping) grids (brightness correction / alignment / registration), assign colors to channels, etc
  3. Save to disk, with mapping of pixels to their gobal (slide-level) positions, ideally with a pyramid generated using restrict(), so that different resolutions are available for different levels of zoom.
  4. Plot the full slide, with lazy-loading of correct pyramid structure.

Current state

1. Data structure

This doesn't seem like something that JuliaImages needs to accommodate, except insofar as hooks for part 3 are required. I've already implemented a bunch of this for my specific data:

julia> ex
CosmxExperiment with 2 slides:
    mw_mus_p1_09
    mw_mus_p1_10

julia> sl1 = ex["mw_mus_p1_09"]
CosmxSlide mw_mus_p1_09 with 30 grids and 191 FOVs

julia> gr5 = sl1[5]
CosmxGrid with FOVs:
3×4 Matrix{Int16}:
 40  41  42  43
 44  45  46  47
 48  49  50  51

julia> gr5[46]
CosmxFOV 46 from grid 5 at data/20240909/RawFiles/mw_mus_p1_09/20240905_203432_S1/CellStatsDir/Morphology2D/20240905_203432_S1_C902_P99_N99_F00046.TIF

julia> gr5[2,3]
CosmxFOV 46 from grid 5 at data/20240909/RawFiles/mw_mus_p1_09/20240905_203432_S1/CellStatsDir/Morphology2D/20240905_203432_S1_C902_P99_N99_F00046.TIF

2. Ability to process raw images

This functionality is already present throughout the JuliaImages ecosystem. We can even use BlockArrays.jl to deal with overlapping images (though this doesn't really seem to work for the full NN5 array, I can do it for 1 layer at a time), see https://github.com/JuliaArrays/BlockArrays.jl/issues/414

3. Save to disk

Here's where things get dicey. Obviously, JuliaImages can save images to disk, but mapping them to global coordinates is not obvious. I looked into using Zarr.jl, which I think enables creating a 100k100k5 grid, which I could then fill in image-by-image. I think PyramidScheme.jl can then calculate pyramids, though I haven't been able to get this to work with multiple layers. I think this is starting to get worked on with https://github.com/JuliaDataCubes/PyramidScheme.jl/pull/42

I would be nicer to be able to use actual image files (eg tiff) directly since there are other applications that need to work from image data (eg segmentation). So duplicating Data in a Zarr and as Tiffs is currently required. Also, at present, PyramidScheme doesn't use restrict(), which @timholy has mentioned is important because it does anti-aliasing.

4. Plotting with different resolutions depending on zoom level

This is currently supported by PyramidScheme.jl really well, but there are several limitations, mostly stemming from the fact that it doesn't work directly from an Images-style array. I can use DimensionalArrays / Rasters for this (using the PR linked above), but the plotting functionality of PyramidScheme doesn't currently support 3-channel images (let alone stuff like MultiChannelColors.jl).

Current work-around

What I'm currently doing is the following:

Other pieces

asinghvi17 commented 3 days ago

Is the pixel grid shared among all FOVs, or are they different? I'm wondering whether this is purely an indexing problem or if it needs some kind of resampling / interpolation.