RGLab / openCyto

A package that provides a data analysis pipeline for flow cytometry.
GNU Affero General Public License v3.0

Passing quadrantGate objects through openCyto gatingTemplate #199

Closed DillonHammill closed 5 years ago

DillonHammill commented 5 years ago

Hi @mikejiang and @jacobpwagner,

I was hoping that you may be able to help me implement passing quadrantGate objects through the openCyto gatingTemplate.

Previously, I have used a combination of 4 rectangleGate objects to construct quadrant gates. The issue is that the calculated statistics are off because events lying on the boundaries between the rectangleGates are counted in more than one gate. I see that the quadrantGate class does not have this issue.

I would therefore like to switch to returning a single quadrantGate object from my gating functions and passing it through the openCyto gatingTemplate. The problem is that a filters object is expected for each of the gated populations (i.e. one gate per population), but in this case there is a single gate for 4 populations.

I would like to have a gatingTemplate entry as follows:

add_pop(gs = gs,
        alias = c("A", "B", "C", "D"),
        parent = "T Cells",
        pop = "*",
        dims = c("CD4", "CD8"),
        gating_method = "gate_draw",
        gating_args = NA,  # placeholder: the quadrantGate will be passed through here
        groupBy = group_by,
        collapseDataForGating = TRUE,
        preprocessing_method = NA)

In this case the alias is supplied in the order of the quadrantGate object (i.e. top left, top right, bottom right, bottom left). Any chance that this could work?

Thanks for your help!

Dillon

mikejiang commented 5 years ago

flowCore::quadGate is already supported by flowWorkspace and can therefore be added directly to a GatingSet through the flowWorkspace::gs_pop_add API (i.e. the old add method). To use it in your gate_draw tool within the openCyto context, you just need to return a valid quadGate object:

library(flowCore)
library(flowWorkspace)
library(openCyto)
library(ggcyto)
data("GvHD")
gs <- GatingSet(GvHD[1])
dummy_draw_gate <- function(fr, channels, pp_res){
  quadGate(list(`FSC-H` = 500, `SSC-H` = 600))
}
register_plugins(dummy_draw_gate, "dummy")
gs_add_gating_method(gs, alias = "A,B,C,D", pop =  "*", parent = "root", dims = "FSC-H,SSC-H", gating_method = "dummy")
gs_get_pop_paths(gs)
#> [1] "root" "/A"   "/B"   "/C"   "/D"
autoplot(gs[[1]])


That said, the GatingSet still stores them as 4 rectangleGates, but they are open-ended. (The plot may be misleading because ggcyto imputes the Inf vertices from the plot range for the sake of visualization, but all the edge cells are guaranteed to be included.)
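To make the "4 open-ended rectangleGates" idea concrete, here is a base-R sketch (not the flowWorkspace implementation) that expands a quadrant split point (x0, y0) into four rectangles with infinite outer bounds, listed clockwise from the top left. `quad_to_rects` is a hypothetical helper, and it does not model which side boundary values fall on.

```r
# Hypothetical helper (not a flowWorkspace API): expand a quadrant split point
# (x0, y0) into four open-ended rectangles, ordered clockwise from the top left.
quad_to_rects <- function(x_name, y_name, x0, y0) {
  bounds <- list(
    c(-Inf, x0, y0, Inf),   # top left:     low x, high y
    c(x0, Inf, y0, Inf),    # top right:    high x, high y
    c(x0, Inf, -Inf, y0),   # bottom right: high x, low y
    c(-Inf, x0, -Inf, y0)   # bottom left:  low x, low y
  )
  lapply(bounds, function(b) {
    rect <- list(c(b[1], b[2]), c(b[3], b[4]))
    names(rect) <- c(x_name, y_name)
    rect
  })
}

rects <- quad_to_rects("FSC-H", "SSC-H", x0 = 500, y0 = 600)
names(rects) <- c("A", "B", "C", "D")
str(rects$A)
```

Here `rects$A` has the same bounds as the gate `A` that gh_pop_get_gate() reports later in this thread: FSC-H (-Inf, 500) and SSC-H (600, Inf).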

DillonHammill commented 5 years ago

I must have made a mistake somewhere; I will try again and get back to you. When these populations are extracted from the GatingSet, are they extracted using rectangleGates? Does this alter the stats?

mikejiang commented 5 years ago

As I said, these are still stored as rectangleGates, but open-ended (which is also what flowJo does in its exported xml/wsp), so there is no need to worry about cells being left out.

> gh_pop_get_gate(gs[[1]], "A")
Rectangular gate 'A' with dimensions:
  FSC-H: (-Inf,500)
  SSC-H: (600,Inf)

DillonHammill commented 5 years ago

What I meant is that if you add up the frequencies, they don't exactly equal 100%. When the populations are well separated this is not an issue, but when many events are shared between rectangles the discrepancy can be quite large. I guess I wanted to know how we deal with events on the edges of the gates (i.e. on the red borders). I am assuming that a Subset method is used for gating, but would a split method be more appropriate?

The total in the above plot is 100.11%.
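A toy base-R illustration (made-up event values, not data from this thread) of how independent rectangle gates can push the total above 100%: an event lying exactly on a quadrant line satisfies the bounds of two adjacent rectangles and is counted in both. The inclusive `>=`/`<=` comparisons are an assumption chosen to show the effect, not a statement about how flowWorkspace evaluates boundaries.

```r
# Toy events (made-up values); the third one sits exactly on the x split line
x <- c(100, 700, 500, 900, 200, 300, 800, 400)
y <- c(800, 900, 300, 100, 700, 200, 650, 50)
x0 <- 500
y0 <- 600

# Closed interval test: boundary values count as inside the gate
in_closed <- function(v, lo, hi) v >= lo & v <= hi

# Four independent rectangle gates, clockwise from the top left
counts_closed <- c(
  top_left     = sum(in_closed(x, -Inf, x0) & in_closed(y, y0, Inf)),
  top_right    = sum(in_closed(x, x0, Inf)  & in_closed(y, y0, Inf)),
  bottom_right = sum(in_closed(x, x0, Inf)  & in_closed(y, -Inf, y0)),
  bottom_left  = sum(in_closed(x, -Inf, x0) & in_closed(y, -Inf, y0))
)

pct <- 100 * counts_closed / length(x)
counts_closed
sum(pct)  # 112.5: the boundary event is counted in two quadrants
```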

DillonHammill commented 5 years ago

I am starting to track down the gatingTemplate issue. This is the error message that I keep getting:

number of population names (given by 'name' argument) does not agree with the number of filter objects in 'filters'!

mikejiang commented 5 years ago

Regarding the cells sitting on the intersection lines: yes, as long as each gate is treated independently in the GatingSet, there is currently no way to include them in one gate but exclude them from another.

To fix this issue, we would have to store it as a quadGate in the GatingSet, which requires some significant changes in cytolib.

The question is: are these edge-cell numbers significant enough to call for such a change?
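For contrast, a base-R sketch of quadGate-like semantics, where the four quadrants form a partition and every event lands in exactly one of them, so the counts always add up to the total. `assign_quadrant` is a hypothetical helper, and sending values equal to the split point to the high side is an assumed tie-breaking rule, not necessarily what flowCore or cytolib does.

```r
# Toy events (made-up values); the third one sits exactly on the x split line
x <- c(100, 700, 500, 900, 200, 300, 800, 400)
y <- c(800, 900, 300, 100, 700, 200, 650, 50)

# Hypothetical helper: give each event exactly one quadrant label.
# Assumed tie-break: a value equal to the split point goes to the high side.
assign_quadrant <- function(x, y, x0, y0) {
  hi_x <- x >= x0
  hi_y <- y >= y0
  quad <- ifelse(hi_y,
                 ifelse(hi_x, "top_right", "top_left"),
                 ifelse(hi_x, "bottom_right", "bottom_left"))
  factor(quad, levels = c("top_left", "top_right", "bottom_right", "bottom_left"))
}

quadrant <- assign_quadrant(x, y, x0 = 500, y0 = 600)
counts <- table(quadrant)
pct <- 100 * counts / length(x)
counts
sum(pct)  # 100: every event is counted exactly once
```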

DillonHammill commented 5 years ago

I have certainly seen enough cases in the last week alone to be convinced the change is worthwhile. I noticed that the stats can vary quite a bit between samples: totals are off by ±1% in minor cases and by ±2% in more severe cases. You are right that flowJo has the same problem because of the way the populations are gated. It is just frustrating because these values have to be normalised to 100% for each sample. It would be much better if a quadrantGate were used; then the stats would always be correct.

DillonHammill commented 5 years ago

It looks like the gatingTemplate issue is caused by wrapping the quadrantGate in a filters list.

mikejiang commented 5 years ago

Right. Typically a gating method generates one filter object per gate/population, and a filters object is required to return multiple gates. quadGate, however, is a special filter object that ends up producing 4 gates/populations in the GatingSet, so you don't need to resort to the filters class.

DillonHammill commented 5 years ago

Thanks. Do you want me to open a new flowWorkspace issue for the quadGates?

mikejiang commented 5 years ago

It belongs in cytolib.

DillonHammill commented 5 years ago

Sure no problem

DillonHammill commented 5 years ago

I will keep everything as quadGates on my end so that it all works when this is updated in cytolib. Thanks again.

mikejiang commented 5 years ago

"flowJo has the same problem because of the way the populations are gated"

I've checked: quadGate in flowJo gives correct counts, which means it handles the cross-quadrant cells properly.

(But the % is also off by a small amount, probably due to rounding of the decimal digits, which has nothing to do with gating.)
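A small base-R illustration of the rounding effect described here: the (made-up) counts add up exactly to the number of events, yet percentages rounded to one decimal place sum to slightly more than 100.

```r
# Made-up quadrant counts that sum exactly to the number of events
counts <- c(top_left = 1, top_right = 1, bottom_right = 1, bottom_left = 3)
pct <- 100 * counts / sum(counts)

# Percentages as they would be displayed with one decimal place
shown <- round(pct, 1)
shown
sum(shown)  # about 100.1, although no event is counted twice
```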

DillonHammill commented 5 years ago

Makes sense. flowJo stats are certainly closer to 100% but are not exact; the largest difference I have seen is ~0.2%. I obtained similar results using quadGate and the split method.

DillonHammill commented 5 years ago

Just noting that the populations are assigned clockwise (i.e. top left, top right, bottom right, bottom left).

mikejiang commented 5 years ago

Right, that is currently how flowWorkspace:::pop_add.quadGate interprets it.

DillonHammill commented 5 years ago

Just pointing out that the split method for quadGates returns populations in a different order:

q <- quadGate("FSC-A" = 50000, "SSC-A" = 50000)
split(fs[[1]], q)

$`FSC-A+SSC-A+`
flowFrame object 'Activation_1.fcs (FSC-A+SSC-A+)'
with 649 cells and 18 observables:

$`FSC-A-SSC-A+`
flowFrame object 'Activation_1.fcs (FSC-A-SSC-A+)'
with 146 cells and 18 observables:

$`FSC-A+SSC-A-`
flowFrame object 'Activation_1.fcs (FSC-A+SSC-A-)'
with 6012 cells and 18 observables:

$`FSC-A-SSC-A-`
flowFrame object 'Activation_1.fcs (FSC-A-SSC-A-)'
with 43193 cells and 18 observables:

The splitting order is top right, top left, bottom right, then bottom left. So the first 2 quadrants are returned in a different order, while quadrants 3 and 4 are in the correct order.
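If split() output ever has to be lined up with the clockwise alias order, one option is to index the returned list by name. `clockwise_quadrant_names` is a hypothetical helper that assumes the `<x channel><sign><y channel><sign>` naming pattern seen in the split() output above. The named list is a stand-in for real split() output, holding the cell counts shown above instead of flowFrames.

```r
# Hypothetical helper: quadrant names in clockwise order
# (top left, top right, bottom right, bottom left), assuming the
# "<x channel><sign><y channel><sign>" pattern printed by split() above
clockwise_quadrant_names <- function(x_chan, y_chan) {
  paste0(x_chan, c("-", "+", "+", "-"), y_chan, c("+", "+", "-", "-"))
}

# Stand-in for split(fs[[1]], q): a named list in split()'s order
split_out <- list(`FSC-A+SSC-A+` = 649, `FSC-A-SSC-A+` = 146,
                  `FSC-A+SSC-A-` = 6012, `FSC-A-SSC-A-` = 43193)

# Reorder by name into the clockwise order
clockwise <- split_out[clockwise_quadrant_names("FSC-A", "SSC-A")]
unlist(clockwise)
```

With flowCore loaded, the same name-based indexing should apply to the real result, e.g. `split(fs[[1]], q)[clockwise_quadrant_names("FSC-A", "SSC-A")]`, provided the names follow the pattern shown in the thread.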

mikejiang commented 5 years ago

The split method is a legacy API in flowCore, which isn't used in openCyto or flowWorkspace.