RGLab / cytolib

c++ library for representing and interacting the gated cytometry data structure
GNU Affero General Public License v3.0
12 stars 11 forks source link

General question about H5 backed gating #54

Closed schienstockd closed 2 years ago

schienstockd commented 2 years ago

Thanks for putting together this framework. We are currently trying to adopt flowWorkspace for image based gating.

Let me know if the following is technically not possible or a stupid idea.

At the moment we are converting an Anndata H5 file to a CytoFrame and then use a GatingSet to create populations.

In theory, would it be possible to use gating directly on the H5 backed Anndata file? I was thinking to adopt the CytoFrame class to make something compatible to working with Anndata.

If that was in theory possible, would you have a suggestion where to start?

mikejiang commented 2 years ago

Not familiar with "image based gating" and "Anndata", so cannot comment too much, would you be able to provide an concrete example data and work flow?

schienstockd commented 2 years ago

We have microscopy images like these and segment them. The idea is to apply flow cytometry gating to gate cells from these images. This is also called 'Histocytometry'. The idea is to link up gating and the images. We currently do that with a combination of R/shiny, plotly and python/napari.

im

I've put the example data into a zip file: gating_example.zip

In this example, there are two cell types (yellow and blue) stained with two separate fluorophores. The properties of the segmented labels are saved in an Anndata file (im.h5ad). There is an R version for this. All properties and the mean intensities of the channels are stored in X of the Anndata object. mean_intensity_0 is blue and mean_intensity_1 is yellow in this example.

library(anndata)

# read adata file
adata <- read_h5ad("im.h5ad")
adata$X

    mean_intensity_0 mean_intensity_1 mean_intensity_2 mean_intensity_3 mean_intensity_4 bbox_min_y bbox_min_x
0           72.32076         9.358491         66.64151      0.000000000        69.188683          0        550
1           77.27631        12.013158         65.34210      0.065789476        74.210526          0        675
2          109.15758        11.278788         34.43636      0.054545455       105.078789          6        655

We convert this file into an FCS file:

library(flowCore)

# convert to FCS file
x <- adata$X

# add metadata
metadata <- list(
  name = dimnames(x)[[2]],
  desc = paste(dimnames(x)[[2]]))

# Create FCS file metadata - ranges, min, and max settings
metadata$minRange <- apply(x, 2, min)
metadata$maxRange <- apply(x, 2, max)

# create flowframe
x.ff <- new(
  "flowFrame",
  exprs = as.matrix(x),
  description = metadata
  )

# save FCS
write.FCS(x.ff, "im.fcs", what = "double")

Then we create a GatingSet with a biexponential transformation:

library(flowWorkspace)
cs <- load_cytoset_from_fcs("im.fcs", alter.names = TRUE)

# transform cytoframe
transFun <- biexponentialTransform()
transList <- transformList(colnames(adata$X), transFun)
csTrans <- transform(cs, transList)

# create gating set
gs <- GatingSet(csTrans)

Then we utilise plotly to draw shapes. The coordinates get passed on to the GatingSet. The following is a simplified example of a rectangle. The coordinates would actually come from plotly:

# simple gate
gateCoords <- list(
  c(4, 5.5), c(2.5, 4)
)

# create matrix
mat <- matrix(
  unlist(gateCoords), ncol = 2,
  dimnames = list(
    c("min", "max"),
    c("mean_intensity_0", "mean_intensity_1")))

# make new rectangle gate
rg1 <- rectangleGate(.gate = mat)
flist <- list(rg1)
names(flist) <- sampleNames(gs)

# add pop to set
gs_pop_add(gs, flist,
           name = "Blue+",
           parent = "root")
recompute(gs)

Then we plot them back onto plotly and show these populations in the image viewer napari.

library(ggcyto)

autoplot(gs, "Blue+", "mean_intensity_0", "mean_intensity_1")

My question is whether we could eliminate the FCS file and directly work on the Anndata file with the GatingSet. We have a framework to add further measurements to this file such as cell interactions and spatial properties. Ideally we would like to also make these available for gating. At the moment we would have to generate a new FCS file and GatingSet?

Thanks for your help!

mikejiang commented 2 years ago

No need to convert FCS, just wrap the matrix data into flowFrame/flowSet, e.g.

> adata <- read_h5ad("../../gating_example/im.h5ad")
> mat = adata$X
> fr = flowFrame(mat)#use flowFrame constructor instead of "new" method
> fr
flowFrame object 'file14fc4a9793fe'
with 184 cells and 26 observables:
                  name              desc     range  minRange  maxRange
$P1   mean_intensity_0  mean_intensity_0       111         0       110
$P2   mean_intensity_1  mean_intensity_1       172         0       171
$P3   mean_intensity_2  mean_intensity_2       131         0       130
$P4   mean_intensity_3  mean_intensity_3        37         0        36
$P5   mean_intensity_4  mean_intensity_4       216         0       215
...                ...               ...       ...       ...       ...
$P22            oblate            oblate         2         0         1
$P23           prolate           prolate         4         0         3
$P24 perimeter_to_area perimeter_to_area        20         0        19
$P25      aspect_ratio      aspect_ratio         3         0         2
$P26              fill              fill         2         0         1
198 keywords are stored in the 'description' slot
> # No need to convert FCS
> fr_list = list(fr)
> sample_names = "im"
> names(fr_list) <- sample_names
> fs = as(fr_list, "flowSet")# convert to flowset
> # Construct gating set directly from flowset
> gs = GatingSet(fs)
> # Data is automatically converted to cytoframe (h5 backend) under the hood
> cs = gs_cyto_data(gs)
> cs
A cytoset with 1 samples.

  column names:
    mean_intensity_0, mean_intensity_1, mean_intensity_2, mean_intensity_3, mean_intensity_4, bbox_min_y, bbox_min_x, bbox_max_y, bbox_max_x, bbox_area, eccentricity, orientation, perimeter, area, convex_area, equivalent_diameter, extent, feret_diameter_max, major_axis_length, minor_axis_length, solidity, oblate, prolate, perimeter_to_area, aspect_ratio, fill

> list.files(cs_get_uri(cs))
[1] "im.h5"
schienstockd commented 2 years ago

Ok makes sense - thanks!