HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

prepData on GatingSet #327

Closed. FMartina87 closed this issue 1 year ago.

FMartina87 commented 1 year ago

Hello! First of all, thank you for this package: it is a complete and very nice tool for CyTOF analysis. I have a question regarding the function prepData. I have a GatingSet object (created from a .wsp file). Would it be possible to apply the function to different, already gated subpopulations, doing something like the following?

```r
CD4 <- gs_pop_get_data(gs_object, "CD4+")
sce_CD4 <- prepData(x = CD4, md = md, panel = panel, transform = TRUE)

CD8 <- gs_pop_get_data(gs_object, "CD8+")
sce_CD8 <- prepData(x = CD8, md = md, panel = panel, transform = TRUE)
```

HelenaLC commented 1 year ago

In principle, I think yes, this should be doable. prepData expects a flowSet or a list of flowFrames as input, so if you can convert the GatingSet to that (which flowCore or flowWorkspace probably supports), that might do the trick. It might be simpler (and possibly also work better) to put the different populations (here, CD4, CD8, etc.) into a single object (i.e., a flowSet) with a corresponding metadata (md) table to annotate them. Also, I think you'd want transform = FALSE, because gating is already performed on expression-like values (although I am not 100% sure what gs_pop_get_data will pull out, I assume it is already transformed data).
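A minimal sketch of the single-flowSet approach, under stated assumptions: it uses flowWorkspace's gs_pop_get_data() and cytoset_to_flowSet() (check that the conversion function exists under that name in your flowWorkspace version), and it assumes your per-sample md table has CATALYST's default file_name and sample_id columns, with file_name equal to the GatingSet sample names. The helpers pool_populations() and expand_md(), the "<sample>_<population>" naming scheme and the extra population column are all hypothetical choices made for illustration.

```r
# Minimal sketch; pool_populations() and expand_md() are hypothetical helpers.
# Functions are namespace-qualified, so flowCore/flowWorkspace are only needed
# when pool_populations() is actually called.

# Pool several gated populations into ONE flowSet, with one flowFrame per
# (sample, population) pair, e.g. "sample1.fcs_CD4+".
pool_populations <- function(gs, pops) {
  frames <- list()
  for (pop in pops) {
    # cells inside gate `pop` as a cytoset, copied into an in-memory flowSet
    fs <- flowWorkspace::cytoset_to_flowSet(flowWorkspace::gs_pop_get_data(gs, pop))
    for (s in flowCore::sampleNames(fs)) {
      ff <- fs[[s]]
      id <- paste(s, pop, sep = "_")
      flowCore::identifier(ff) <- id  # unique name per (sample, population) frame
      frames[[id]] <- ff
    }
  }
  flowCore::flowSet(frames)
}

# Expand a per-sample md table into one row per (sample, population), using
# the same "<name>_<population>" scheme as above, plus a "population" column.
expand_md <- function(md, pops) {
  rows <- lapply(pops, function(pop) {
    d <- md
    d$file_name <- paste(md$file_name, pop, sep = "_")
    d$sample_id <- paste(md$sample_id, pop, sep = "_")
    d$population <- pop
    d
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Usage (gs_object, md and panel as in the question; not run here):
# pops <- c("CD4+", "CD8+")
# sce <- CATALYST::prepData(
#   pool_populations(gs_object, pops), panel = panel, md = expand_md(md, pops),
#   transform = FALSE,
#   md_cols = list(file = "file_name", id = "sample_id",
#                  factors = c("condition", "patient_id", "population")))
```

Unique frame names matter because, as far as we know, prepData matches md$file_name against the flowFrame identifiers, so CD4 and CD8 frames from the same FCS file must not share a name. Listing population under md_cols$factors (here alongside the default condition and patient_id columns, which the sketch assumes your md already has) carries it into the cell metadata of the resulting object.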

HelenaLC commented 1 year ago

Closing as this seems to be resolved. Feel free to reopen/continue the discussion if not!

FMartina87 commented 1 year ago

Hi! Sorry for not replying sooner, and thank you for your comment. I did try the solution you proposed (I created a cytoset/flowSet from the data of the populations of interest), and it worked (meaning I can run the prepData function).

I have a couple of issues left:

- I was trying to understand what you meant by "it might be simpler (and possibly also work better) to put the different populations (here, CD4 and CD8 etc.) in a single object (i.e., flowSet) with a corresponding metadata (md) table to annotate them". I thought the md data frame was meant to annotate the samples, not the populations. Or is there something I did not understand correctly? Thank you again for your help!

HelenaLC commented 1 year ago
  1. The input data are assumed to be counts, so if you call prepData with transform = FALSE, the only available assay will be called "counts". If the data are already transformed, simply rename the assay via assayNames(sce) <- "exprs", since most visualizations expect an assay named "exprs" by default.
  2. The md table is generally used to annotate cells. In the original workflow, where each FCS file corresponds to one sample, metadata are replicated across all cells of the corresponding sample. In your case, you can annotate whatever you like by specifying colData entries in the SCE after its construction. Alternatively, you can provide an md table that annotates the subpopulations. In short, it doesn't matter whether the flowFrames are samples, patients, batches, subpopulations, etc.; prepData simply treats the cells in each flowFrame as sharing the same metadata.
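Both points can be illustrated on a toy object. This is a minimal sketch that builds a SingleCellExperiment by hand as a stand-in for what prepData(..., transform = FALSE) would return (a single "counts" assay); the marker names, values and population labels are made up for illustration and are not real CATALYST output.

```r
library(SingleCellExperiment)  # also attaches SummarizedExperiment (assayNames, colData)

# Hypothetical stand-in for prepData(..., transform = FALSE) output:
# 3 markers x 4 cells, values assumed to be already transformed.
m <- matrix(c(0.1, 1.2, 2.3, 0.4, 1.5, 2.6, 0.7, 1.8, 2.9, 1.0, 2.1, 3.2),
            nrow = 3, dimnames = list(c("CD3", "CD4", "CD8"), NULL))
sce <- SingleCellExperiment(assays = list(counts = m))

# Point 1: the data are already transformed, so rename the only assay
# to the name most visualizations look for.
assayNames(sce) <- "exprs"

# Point 2: annotate cells after construction via a colData column
# (hypothetical labels, e.g. the gate each cell was exported from).
sce$population <- c("CD4+", "CD4+", "CD8+", "CD8+")
```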