HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

Recluster with new panel #335

Closed shanh1 closed 1 year ago

shanh1 commented 1 year ago

Hi there,

I have created an SCE object using a 6 marker panel to identify the major cell populations.

I would now like to use a panel of around 30 markers to subset each of the cell populations and view them based on these new markers.

I am not sure how to do this.

How can I add the new panel to the SCE object?

I already know which clusters represent which cell populations (e.g. clusters 2, 6, 7, 10 etc. are CD4+) and I subset them using filterSCE(). But then how can I look at the expression of these new 30 markers in this subset if they are not in the original panel table, but in a separate table?

Would I have to restart my SCE object by adding every single marker to one panel table (36 total, for example), clustering based on the 6 for general populations and then clustering based on the 30 for each subset? Not sure how to do this either.

I am also using this in combination with CytofRUV.

Kind regards,

Shan

HelenaLC commented 1 year ago

There are two ways. One you pointed out: create the SCE with all markers and specify which to use for clustering (there’s an argument in cluster() that lets you specify this, as well as in runDR() for dimension reduction). This is the easiest, most robust way, and how the workflow is intended to be carried out. If for whatever reason this is not an option, you could construct a new SCE of the additional markers and row bind the assays and rowData, but you need to be absolutely sure that cell order and any filtering are in sync when doing that, or you could accidentally mix up the cells. Hence, again, the analysis is meant to run on all data from the start to avoid mistakes like that. If you use the same markers for clustering anyway, though, reconstructing and rerunning the analysis with the full panel would not change your current assignments.
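A minimal sketch of the first route, assuming the PBMC example data bundled with CATALYST stand in for a full panel. The six marker names in `broad` are an illustrative choice and not the panel from this thread; swap in your own.

```r
library(CATALYST)
library(SingleCellExperiment)

# Bundled example data standing in for a full panel (assumption)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)

# Illustrative "broad" markers; check they exist before using them
broad <- c("CD3", "CD4", "CD20", "CD33", "CD14", "CD7")
stopifnot(all(broad %in% rownames(sce)))

# Cluster and embed using the broad markers only
sce <- cluster(sce, features = broad, seed = 1, verbose = FALSE)
sce <- runDR(sce, dr = "UMAP", cells = 200, features = broad)

# All markers are still in the object and can be inspected afterwards
# (features = NULL includes every marker in the heatmap)
plotExprHeatmap(sce, features = NULL, by = "cluster_id", k = "meta12")
```

Because every marker sits in the same object, cells and expression values cannot get out of order between the clustering markers and the remaining ones.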

shanh1 commented 1 year ago

Thank you for responding so quickly! I greatly appreciate this.

I might rerun the clustering with all the markers in one table. How would I select only a subset of markers to identify the general population? How would I adjust the code?

And then once I do this, I can filter the population and recluster based on the other markers of interest, is that correct?

HelenaLC commented 1 year ago

There are two ways. Either pass the markers you want to the function (argument ‘features’), or set their marker_class to “type”, which is used by default and is perhaps the safest way if you’ve already decided. And yes, correct, you can always inspect all markers, subset, and recluster however you like. (Sure thing, happy to help ✌️)
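As a rough illustration of the marker_class route, a single panel table could be built along these lines. All marker and channel names below are hypothetical placeholders; the column names follow what prepData() expects by default (`fcs_colname`, `antigen`, `marker_class`).

```r
# Hypothetical names: 6 broad markers plus 30 subset markers
broad_markers <- c("CD3", "CD4", "CD8", "CD19", "CD14", "CD56")
sub_markers <- paste0("Marker", 1:30)
antigen <- c(broad_markers, sub_markers)

# One panel table for all 36 markers; broad markers are "type",
# everything else is "state"
panel <- data.frame(
  fcs_colname = paste0("Channel", seq_along(antigen)),  # placeholder channels
  antigen = antigen,
  marker_class = ifelse(antigen %in% broad_markers, "type", "state"),
  stringsAsFactors = FALSE
)

# Tabulate how many markers fall into each class
table(panel$marker_class)
```

Passing a table like this to prepData() together with your own flowSet and metadata means cluster() and runDR() would pick up the six "type" markers by default, without `features` having to be repeated in each call.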

shanh1 commented 1 year ago

Okay I see.

1) And if I set the general markers to "type", but then I want to carry one of these markers over into the next round of clustering with my other markers (which will be "state"), how can I change this in the SCE?

2) Sorry to be annoying but how would I do it the alternative way where I bind the new markers to the SCE instead?

I'm not too great at R so I apologise!

HelenaLC commented 1 year ago

  1. The framework is quite flexible, so there are different options. Here's some example pseudocode for option i) specifying marker classes and ii) specifying markers on the fly. While option i) seems "longer", it ensures the markers for clustering/dimension reduction are fixed within the object; this also affects how some visualizations are generated and how differential testing is performed. So it might be the safer approach overall.
    
    # initial panel defines type/state markers
    # for broad subpopulation clustering
    sce <- prepData(...) 
    # these both default to using "type"
    sce <- cluster(sce, ...) 
    sce <- runDR(sce, ...)
    # subset subpopulation(s) of interest
    sub <- filterSCE(sce, k = "?", cluster_id %in% c("B cells", ...))
    # redefine marker classes & rerun
    marker_classes(sub) <- c("type", "state", ...)
    sub <- cluster(sub, ...) 
    sub <- runDR(sub, ...)

    # OR just pass the markers to use
    # to each function call...
    set1 <- c("CD1", "CD2", "CD3") # general markers
    set2 <- c("CD4", "CD5", "CD6") # specific markers
    sce <- cluster(sce, features = set1)
    ...
    sub <- cluster(sub, features = set2)
    # ...also need to pass 'features' to every
    # single visualization and differential testing
    # in order to control what's shown/tested

2. Since you're "not too great at R", I would kindly recommend against this. Row binding SCEs isn't something one would usually do, and the code to do it is quite messy/hacky, which also makes your analysis hard to follow. And, as mentioned above, if there's a mistake there you could mess up your data completely. The beauty of the SCE class is that everything is kept in sync and all functions run on it, making it robust and easy to handle (especially for R newbies). Overall, let's try getting option 1. going...
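For question 1 (carrying a "type" marker over into the next round), here is a rough sketch of how option i) might look on a subset. It assumes the marker classes are stored in `rowData(sce)$marker_class`, and it uses the bundled PBMC example data with hypothetical cluster ids and CD4 as the carried-over marker.

```r
library(CATALYST)
library(SingleCellExperiment)

# Bundled example data (assumption); first round uses "type" markers
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, seed = 1, verbose = FALSE)

# Keep the cells from a few clusters (ids are hypothetical)
sub <- filterSCE(sce, k = "meta12", cluster_id %in% c(2, 6, 7, 10))

# Second round: former "state" markers plus CD4 become "type";
# everything else becomes "state"
new_type <- c(state_markers(sub), "CD4")
rowData(sub)$marker_class <- factor(
  ifelse(rownames(sub) %in% new_type, "type", "state"),
  levels = c("type", "state", "none")
)

# These now default to the redefined "type" markers
sub <- cluster(sub, seed = 1, verbose = FALSE)
sub <- runDR(sub, dr = "UMAP", cells = 200)
```

Changing the classes only on `sub` leaves the original `sce` and its broad clustering untouched.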
shanh1 commented 1 year ago

Thank you so much! I can't tell you how much I appreciate your help. I will try and run this today :)