Closed DillonHammill closed 2 years ago
I have also added an informative error message to the gatingTemplate constructor when the supplied gatingTemplate (CSV or data.table) is empty.
I have also moved the stop.at
break in the gt_gating()
for loop to make sure that the gating process is terminated after the stop.at
node is gated.
See example below. At the moment it is not possible to use stop.at
gate up to (and including) Live Cells
because the stop.at
node itself is not gated. If you try and descend down the tree and set stop.at = "Dendritic Cells"
then the T Cells
are still gated.
However, if we delay the break in the gt_gating()
for loop until after the stop.at
node has been gated we are able to successfully gate up until (and including) the Live Cells
.
Your change is a little intrusive, especially the removal the validity check of dim_count makes me somewhat uncomfortable. I am trying to understand your use case in order to come up a better a solution for you, is this following API call what we are aiming for?
gs_add_gating_method(gs, alias = "*", parent = "root", pop = "*", dims = "", gating_method = "clustering_method_A")
If so, I can simply conditionally do sub-setting on data, that is
parent_data <- parent_data[, channels]
Thanks for looking into this @mikejiang. Yeah unfortunately I couldn't think of a better way of getting around that check, even if you bypass the subsetting for empty dims (mentioned in #229) the gating terminates at that check since the gating_method
is not either polyFunctions
, boolGate
or refGate
. I thought of potentially flipping this on its head and instead check for methods that require 1 or 2 dims but I think this list would be too extensive.
Yeah I aiming for API call similar to the one you mentioned where I pass the dims
through gating_args
instead and pass that to the gating function directly for subsetting prior to clustering. I think my reasoning for this was due to there being a 1 or 2 dim length restriction somewhere.
After thinking about this a bit more, maybe it would be a better idea to instead allow more than 2 dims
then I could just pass them through there directly and the subsetting step no longer becomes an issue. It also means that we can keep that check (that I agree) we should keep. What do you think about this approach? I can fix up my PR to instead do things this way if you think it is a better way. I will take another look tomorrow and have a go at implementing things this way.
No, much of the code base was designed for 1D or 2D gating, thus extending to more than 2D requires significant non-trivial changes, however currently there is special treatment for the use case of alias= and pop = , which is reserved for clustering algorithm that returns a factor or logical vectors representing multiple cell populations. What does your gating function returns exactly?How do you name these populations? Through alias column? Can you give me the exact example gating function specs in terms input and output? so I can work on the solution toward that direction
OK, I have a couple of different gating methods that don't adhere to the 1D/2D gating paradigm.
The first is simply a gating method that takes on a node and returns a logical vector indicating the events to keep in the sampled population (simply a samplibg gating method). In this case, dims
are irrelevant as the gating method just calls nrow
on the flowFrame and computes a sample size. This is where I would like to keep dims empty, otherwise it looks like the node has been sampled based on whatever channel(s) I put in dims (which is not the case).
The second case fits with what you have mentioned in your previous comment. In this case, the gating method takes on a multidimensional flowFrame (i.e. minimum 1 dim and maximum many dims) performs multidimensional clustering and returns a factor with default names for new populations stored in levels
. For supervised clustering where the number of clusters is known, the names can be passed to alias
directly, but for unsupervised clustering where any number of clusters is possible I instead pass default names through the factor levels and set alias=*
. At the moment I pass the dims
to the gating function through gating_args
and would like to leave dims
empty so that the data isn't subsetted to 1 or 2 dims before getting passed to the gating function.
Thanks again for your help.
Hi @mikejiang,
I have taken the time to look through the openCyto code base to better add support for my empty dims use case as described in my previous PR #229. I am busy implementing some new CytoExploreR plugins that require this functionality.
The code is well safeguarded against the empty dims case as it is, but I have strengthened this a bit to address your concerns regarding parsing of complex
pop
(-/+
etc.). As I mentioned previously, for the empty dims case we would never want to use complicatedpop
entries, it simply doesn't make any sense to use anything other thanpop=+
for this case. Below is summary of the changes that I have made:I converted the empty dims error that I added in my last PR (code chunk below) into a warning instead to allow the gating process to proceed whilst at the same time warning users. I doubt that anyone other than me will attempt to do this, but at least now there is an appropriate warning when
dims
have been omitted by accident. https://github.com/RGLab/openCyto/blob/8c438714e9ded55058786e3f0f6a1b0015304941/R/gating-methods.R#L264-L272I have updated
gh_generate_template()
to record empty dims asNA
in the gatingTemplate, the current implementation (below) returns an empty string. I noticed this a while back in CytoExploreR, I think it safer to have empty dims set to NA (consistent with other gatingTemplate parameters), otherwise I have to perform an separate check for empty strings fordims
. I haven't checked it this affects other gating functions yet I will take a closer look tomorrow. Happy to remove this change if you think it is going to cause issues somewhere else. https://github.com/RGLab/openCyto/blob/8c438714e9ded55058786e3f0f6a1b0015304941/R/csvTemplate-parser.R#L26I have updated
.preprocess_row()
to handle the new NA dims case above and I have removed thedim_count
check in the CSV parsing function. Perhaps you will have a better way to keep this check in place. https://github.com/RGLab/openCyto/blob/8c438714e9ded55058786e3f0f6a1b0015304941/R/csvTemplate-parser.R#L290-L295In terms of parsing
pop
, I have put a safeguard in place for thepop=-/+
case to throw an error when attempting to parse complexpop
when dims are empty. I haven't added this to the otherpop
cases as they are covered by.splitTerms
which performs it owndims
length check. https://github.com/DillonHammill/openCyto/blob/476560e2651d57ff8b3a00d82affcdf316017e8b/R/csvTemplate-parser.R#L339-L342Happy to discuss this further and amend as necessary - this is something that I definitely want supported for CytoExploreR. I doubt any
openCyto
users will actually attempt to this themselves and CytoExploreR users don't interact directly with thegatingTemplate
anyway (I take care of that for them). I would just like to have an implementation that I can use even if no-one else uses it.Thanks for your help.
Dillon