Closed mikejiang closed 2 years ago
Thanks so much for taking the time to put these changes together @mikejiang. As always, I really appreciate your help.
I have been testing using empty dims for the sampling gating method, but it only seems to work when the GatingSet contains no other gates - see example code below:
library(openCyto)
library(flowWorkspace)
gs <- GatingSet(GvHD[1])
#create a dummy gating function that mimic sampling method
dummy_func <- function(fr, pp_res, channels, ...)
{
n = nrow(fr)
res = rep(TRUE,n)
return (res)
}
#register it as the plugin gating method
register_plugins(dummy_func, "sample_method")
# Add a gate
gs_pop_add(
gs,
parent = "root",
gate = rectangleGate(
"FSC-H"=c(225,500),
"SSC-H"=c(15, 150),
filterId="test"
)
)
#apply it to the gs
gs_add_gating_method(
gs,
alias = "A",
parent = "root",
pop = "+",
dims = "",
gating_method = "sample_method"
)
If I run all the code below I get an error at the check we have spoken about previously
Error in .preprocess_row(this_row, strict = strict) :
+ has invalid number of dimensions: 0
However, if I don't add the test
gate to the GatingSet prior to gating everything works just fine. I will debug within openCyto
and see if I can help track down the problem.
I suspect I will encounter the same issue when I try to testing clustering gating methods as well.
OK, so the issue arises due to single line data.table
(i.e. no previous gates) converting the empty dims to "NA"
in .preprocess_csv()
:
for(col in colnames(dt))
{
if(col != "collapseDataForGating")
{
newCol <- dt[[col]]
dt[, (col):= as.character(newCol)]
}
}
This is passed to preprocess_row()
where the dim_count
check is performed:
dim_count <- length(strsplit(split = ",", dims)[[1]])
The above returns 1
when dims = "NA"
thus bypassing this check and allowing the code to proceed.
On the other hand, if there are other gates defined in the template, dims
is converted to ""
as we would expect and dim_count = 0
but this causes the dim_count
check to fail due to the restriction on allowed gating methods here:
if (!dim_count %in% c(1, 2)) {
if (!(dim_count == 0 && gm %in% c("polyFunctions", "boolGate", "refGate"))) {
stop(popName, " has invalid number of dimensions: ", dim_count)
}
}
I have pushed the fix try again
Thanks @mikejiang, looks like it the sampling gating method is working now. I will test the clustering gating method today. I will let you know if I encounter any issues.
Once these changes have been made, can I submit a new PR with the other changes that I would like to make (I will close the old PR to save me some time)?
Please review the proposed additional changes below and let me know if there are any that you don't want to make:
dims
in gh_generate_template()
.gatingTemplate()
constructor.stop.at
node in gt_gating()
.I'm not sure about the first one, the rest two looks ok
@mikejiang, could we also update your changes to .preprocess_row()
to cover the NA
dims case as well, since this is the default in gs_add_gating_method()
? At least then if you don't supply any dims
to gs_add_gating_method()
gating can still proceed. The below change to .preprocess_row()
works well in testing:
if(popName == "*" ){
if(alias == "*"|| dims %in% c("", NA))
return(this_row)
else
return(.gen_dummy_ref_gate(this_row))
}
@mikejiang, the first change is more for consistency with the other columns in the gatingTemplate that return NA
for empty entries - dims
seems to be unique in that it returns ""
instead. I can easily work around this on my end, the other changes are far more important.
Hi @mikejiang, one additional change I would like to make is to get the gating to work properly when groupBy=NA
and collapseDataForGating=TRUE
. At the moment, each sample is placed in its own group, when instead all the samples should be collapsed into the same group. I have add to add a dummy variable to pData()
to get around this up until now but I have taken a look at the code and it is just missing an else if
statement (gating-methods.R line 276 onwards):
if (groupBy != "" && isCollapse) {
#when x@collapse == FALSE,then ignore groupBy argument since grouping is only used for collapsed gating
split_by <- as.character(groupBy)
split_by_num <- as.numeric(split_by)
#split by every N samples
if(!is.na(split_by_num)){
nSamples <- length(parent_data)
if(nSamples==1){
split_by <- 1
}else{
split_by <- sample(rep(1:nSamples, each = split_by_num, length.out= nSamples))
}
}else{
#split by study variables
split_by <- strsplit(split_by, ":")[[1]]
split_by <- apply(pData(parent_data)[, split_by, drop = FALSE], 1, paste, collapse = ":")
split_by <- as.character(split_by)
}
} else if(groupBy == "" && isCollapse) { # ADDED THIS PART
split_by <- rep("all", length(parent_data))
} else {
split_by <- sampleNames(parent_data)
}
fslist <- split(parent_data, split_by)
Since the names of fslist
is required to order pp_res
we will need to have a standard name for the group in this case, I have just called it "all"
here to match the behaviour in CytoExploreR - but any name would do.
In testing, this fixes the issue I was having. Please let me know if this change is OK (also what name to give the group containing all samples) and I will include it in my new PR once your changes here have been pushed to master
.
I have also been testing the clustering gating method using factors and it seems to work well. I was just wondering if there is a way to bypass the default naming of the factor levels? I take care of this on my end so I don't need to prefix the factor levels with the name of the gating method. Perhaps I can avoid this by using a multipleFilterResult
instead? I will give this a try today.
Can you confirm this PR works as expected so that I can merge this first?
Sure no problem, I am just testing the FilterResult
method to see if that gets around the default factor level naming. I will report back soon.
Thanks @mikjiang, looks like this PR is now working as expected - please close my original PR and I will prepare a new PR with additional changes I would like to make once you have pushed these changes to master
.
@DillonHammill I included three test cases https://github.com/RGLab/openCyto/blob/support-empty-dims/tests/testthat/test-clustering-gating.R that covers the use case you stated here https://github.com/RGLab/openCyto/pull/234#issuecomment-1023533316 Let me know if this works for you