Closed DillonHammill closed 4 years ago
GUID
isn't a standard keyword from FCS standard and considered as the legacy information from flowCore/flowFrame. It has never been formally used in cytoverse tool chains and not related to sampleNames
of cytoset
in any way.
(In fact, I lean toward dropping/deprecating it entirely If it is causing confusions)
In what context do you need to use it in your workflow?
I often have cases where the sampleNames of the cytoset require changing, it would be great if this information was passed down to the flowFrame/cytoframe in some way (i.e. sampleNames(cs)[1] == identifier(cs[[1]]). A lot of the looping operations are performed at the flowFrame level (since cs[[1]] is currently a flowFrame), this means that I use the cytoset constructor a fair bit to wrap the output. The only thing is that the flowFrame identifiers of the constructed cytoset seem to be changed to the temp file name and I cannot find a way to retain the original identifiers. I was hoping a simple sampleNames replacement would work but I guess not. Please let me know if there is a way to do this.
Here is an example @mikejiang:
library(CytoExploreRData)
library(flowCore)
library(flowWorkspace)
# FLOWSET
fs <- Activation
sampleNames(fs)
fr_list <- lapply(seq_along(fs), function(x){
# DO SOMETHING HERE
return(fs[[x]])
})
names(fr_list) <- sampleNames(fs)
fs_new <- flowSet(fr_list)
sampleNames(fs_new)
identifier(fs_new[[1]])
# CYTOSET
cs <- flowSet_to_cytoset(fs)
sampleNames(cs)
fr_list <- lapply(seq_along(cs), function(x){
# DO SOMETHING HERE
return(cs[[x]])
})
names(fr_list) <- sampleNames(cs)
# cytoset constructor requires cytoframes
cf_list <- lapply(fr_list, flowFrame_to_cytoframe)
cs_new <- cytoset(cf_list)
sampleNames(cs_new)
identifier(cs_new[[1]])
# fix the identifiers
lapply(seq_along(cs_new), function(z){
identifier(cs_new[[z]]) <- sampleNames(cs)[z]
})
identifier(cs_new[[1]])
I would like to retain the behaviour of the old flowSet method in which the original identifiers are retained. It seems like a lot more work to achieve the same result with cytosets. Perhaps I can add a custom apply function which can handle the loops and fix the identifiers - otherwise this will be a lot work to track them all down in CytoExploreR. That way if this behaviour does change in the future I only need to change it in one place.
A combination of fsApply()
and flowSet_to_cytoset()
seems to work:
# FUN is a function to apply to each flowFrame that returns a flowFrame
cs_new <- flowSet_to_cytoset(fsApply(cs, FUN))
sampleNames(cs_new)
identifier(cs_new[[1]])
This could be wrapped into a new csApply()
function:
csApply <- function(x,
FUN,
...){
res <- fsApply(x, FUN = FUN, ...)
if(is(res, "flowSet")){
if(is(x, "cytoset")){
return(flowSet_to_cytoset(res))
}else{
return(res)
}
}else{
return(res)
}
}
I admittedly don't use fsApply()
much in CytoExploreR, I generally prefer to use lapply()
but I think it is time that I made the change. It does make the code more concise and readable. Thanks for steering me in the right direction @mikejiang. I will implement this under cyto_apply()
name for now in case you do decide to add a csApply()
function.
sampleNames
should be only stored at the container level(cytoset
), I think to keep the duplicated information at the individual element level (cytoframe
) is a bad design that can easily lead to potential data discrepancy errors.
The way that flowCore
maintains the sample name through GUID
keyword of flowFrame
a legacy behavior and should be deprecated from my opinion.
Sorry I'm coming in late to this, but @DillonHammill , is there a reason you can't just explicitly ask for a cytoframe
back in the body of your lapply
call?:
Using your example:
cf_list <- lapply(seq_along(cs), function(x){
# DO SOMETHING HERE
return(cs[[x, returnType = "cytoframe"]])
})
names(cf_list) <- sampleNames(cs)
cs_new <- cytoset(cf_list)
sampleNames(cs_new)
[1] "Activation_1.fcs" "Activation_2.fcs" "Activation_3.fcs" "Activation_4.fcs" "Activation_5.fcs" "Activation_6.fcs" "Activation_7.fcs" "Activation_8.fcs"
[9] "Activation_9.fcs" "Activation_10.fcs" "Activation_11.fcs" "Activation_12.fcs" "Activation_13.fcs" "Activation_14.fcs" "Activation_15.fcs" "Activation_16.fcs"
[17] "Activation_17.fcs" "Activation_18.fcs" "Activation_19.fcs" "Activation_20.fcs" "Activation_21.fcs" "Activation_22.fcs" "Activation_23.fcs" "Activation_24.fcs"
[25] "Activation_25.fcs" "Activation_26.fcs" "Activation_27.fcs" "Activation_28.fcs" "Activation_29.fcs" "Activation_30.fcs" "Activation_31.fcs" "Activation_32.fcs"
[33] "Activation_33.fcs"
> identifier(cs_new[[1]])
[1] "Activation_1.fcs"
Basically, to respond to your point "since cs[[1]] is currently a flowFrame", it doesn't have to be. That's just default.
Thanks @jacobpwagner I assumed that there was a reason why a cytoframe is not returned by default?
I frequently use the flowFrame identifiers in CytoExploreR, particularly within cyto_plot()
which does almost everything at the flowFrame level. For example, identifiers are used in titles and legends. If this information does not align with the original names users can't tell which sample is which.
The only way that users know which sample is which is based on the inital file name, so think it is important that this retained in some way.
Yeah, I know it's sort of confusing. I tried to call it out in the cytoset
doc methods section, but there's a lot there. The only reason a cytoframe
is not returned from cytoset[[]]
is for backwards compatibility with user scripts that expect to get a flowFrame
back. In particular, it's a problem if they assume they can modify it without changes cascading to the original. If the reference nature of cytoframe
is a problem, you can always force a copy of the cytoframe
returned (which is effectively what was happening with flowSet[[]]
but for a flowFrame
).
OK thanks that makes sense.
Also, $GUID
isn't standard, but $FIL
is standard (albeit technically optional). That's probably a better way to tie the samples to their initial file names if you want to grab a keyword.
Yeah I thought I might do that instead as identifier preferentially checks the GUID slot but at the moment both slots contain temp file names.
@jacobpwagner do you think it is better to loop as you did instead of using the fsApply approach? I want to settle on a standard way of doing this so I can wrap it up nicely and only have to modify it in a single location.
Hey @DillonHammill . Sorry for the delay. Yeah, the lapply
approach is better. Just make sure you are keeping in mind the fact that the cytoframe
s are references and you should be fine. Also, after some offline discussion, we will probably be switching the [[
subsetting behavior for cytoset
to returning a cytoframe
by default in the future.
Thanks @jacobpwagner, I thought this approach would be better in the long run. I will add this to CytoExploreR and add a copy option so that I can control when the underlying data is modified. Please let me know if you do decide to drop the current GUID slot behaviour so that I can make the necessary changes.
Closing this issue as it is no longer required as described in #344.
Hi @mikejiang & @jacobpwagner,
Looks like replacement of sampleNames at the cytoset level is not inherited correctly at the flowFrame/cytoframe level. The
sampleNames
replacement is being set as the rownames in theGUID
slot, it doesn't actually replace theGUID
slot entries. See example below:Apologies if you have already addressed this, I am still having problems with Rtools 4.0 so I am testing against the development versions of BioC packages.