Closed DillonHammill closed 6 years ago
Random grouping of samples was the intended behavior, as far as I can see from the code (which I didn't write, btw). If you want consistency, just set a seed. If you want the grouping to be meaningful in the context of your data, add the group column to pData as a study variable and group by that.
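A minimal base-R sketch of the two workarounds mentioned above. It does not call openCyto: the `sample()` line only mimics a random group assignment, and the data frame `pd` with its `group` column is a hypothetical stand-in for the pData of a real gating set.

```r
# Workaround 1: fix the RNG seed so a random assignment is reproducible.
set.seed(42)
first_run <- sample(rep(1:3, each = 2))   # stand-in for the random grouping
set.seed(42)
second_run <- sample(rep(1:3, each = 2))  # same seed, same shuffle
identical(first_run, second_run)

# Workaround 2: store an explicit grouping variable alongside the samples
# and group by that column name instead of by a number.
# `pd` is a plain data frame standing in for pData (hypothetical example data).
pd <- data.frame(name = paste0("file", 1:6), stringsAsFactors = FALSE)
pd$group <- rep(c("A", "B", "C"), each = 2)
split(pd$name, pd$group)
```

With a real gating set, the column would be assigned to its pData and the column name passed as groupBy, in the same way the "Treatment" column is used elsewhere in this thread.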
I can't see why that random splitting behavior would be desired. We should at least document it: @jacobpwagner
I'll definitely flesh out the doc, but should we change the default behavior to what most people would expect? The inline comment even says "split by every N samples" rather than "split into random samples of size N".
Okay, so I'm not sure where that behavior came from, but I'll go with the intent of the inline comment. If it says to split by every N samples, then I agree we should change it to do that.
Yeah, that is actually also what the documentation says indirectly, through the doc for gtMethod, which ppMethod extends. Updated in d79f31408b06d9f2c99e171d8ce2c2df7de7180e.
Thanks.
Hi @mikejiang,
I have been trying to use the groupBy argument to split samples prior to gating. My understanding is that a numeric groupBy (n) splits the data every n samples (i.e. if there are 6 samples and groupBy is set to 2, there should be 3 groups of 2 samples each: group1 = files 1 & 2, group2 = files 3 & 4, group3 = files 5 & 6). This group assignment should be the same with every run of the gating pipeline, but this does not seem to be the case: the groups change each time. See troubleshooting below:
Grouping using pData column names e.g. "Treatment" performs as expected with every run.
I think removing sample() from this line will fix the problem: https://github.com/RGLab/openCyto/blob/26b062b09db7a68544cd29bd1799a06e8773435f/R/preprocessing-method.R#L67
Dillon
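To illustrate the difference under discussion, here is a small base-R sketch contrasting "split by every N samples" with a shuffled assignment. The function names `group_sequential` and `group_random` are hypothetical and this is not the code in preprocessing-method.R; it only shows the effect of wrapping a group vector in `sample()`.

```r
# Deterministic grouping: samples 1..n go to group 1, n+1..2n to group 2, etc.
group_sequential <- function(n_samples, n) {
  ceiling(seq_len(n_samples) / n)
}

# Shuffled grouping: same group sizes, but membership depends on the RNG state,
# so it changes from run to run unless a seed is set.
group_random <- function(n_samples, n) {
  sample(group_sequential(n_samples, n))
}

files <- paste0("file", 1:6)

# Always yields files 1 & 2, files 3 & 4, files 5 & 6.
split(files, group_sequential(length(files), 2))

# Group sizes match, but which files end up together varies between calls.
split(files, group_random(length(files), 2))
```

Dropping the shuffle keeps the group sizes unchanged and makes the assignment depend only on sample order, which matches the inline comment quoted in this thread.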