How to ensure non-marker parameters are not used for clustering analyses (prepData(FACS=F))?

denvercal1234GitHub commented 1 year ago

Hi there,

I exported channel values from FlowJo as input into R and did the following. I want to cluster using the "backboneMarkers_clustering" vector of markers, but visualise a different vector of markers, "visualising_markers".

Would you mind advising me whether the code is correct? I want to ensure that only the "backboneMarkers_clustering" vector is used for clustering, and that the "visualising_markers" vector is used for visualising in plotExprHeatmap. I do not want "FCS_file_name", "matchingFCStarget", etc. (which are in my rownames(F37_channelCD3CD8_fcs_data_sce)) to be used in clustering.

Thank you for your help!

In my prepData, I set FACS = T as below.

F37_channelCD3CD8_fcs_data_sce <- prepData(x = F37_channelCD3CD8_fcs_data,
    transformation = FALSE, truncate_max_range = FALSE, FACS = TRUE,
    panel = F37_channelCD3CD8_fcs_data_panel,
    md = filtered_meta_F37_CD4CD8DP_Untransformed_TRIAL_fcs_data_RemovedMargin_sample.info,
    panel_cols = list(channel = "F37_channelCD3CD8_fcs_data_fcs_colname", antigen = "F37_channelCD3CD8_fcs_data_antigen", factors = c("F37_channelCD3CD8_fcs_data_marker_class")),
    md_cols = list(file = "channelFCS_file_name", id = "file_name",
        factors = c("cell_count_prePeacoQC", "cell_count_postPeacoQC", "manualInspect_strangeFCS", "preFilter_liveCD3CD8_count", "flowJo_prePeacoQC_gMFI_PE", "flowCore_postPeacoQC_medMFI_PE")))

So my row names contain markers I want to use for clustering (e.g., "TIM4.XGBoost", "CD134"), but also others that I do not want used for anything, especially clustering or differential discovery (e.g., "FlowSOM_cluster_backbone12"). I only want to keep those there for visualisation later.

> F37_channelCD3CD8_fcs_data_sce
class: SingleCellExperiment 
dim: 270 4955453 
metadata(2): experiment_info chs_by_fcs
assays(2): counts exprs
rownames(270): FSC.A FSC.H ... FlowSOM_cluster_backbone24
  FlowSOM_metacluster_backbone24
rowData names(3): channel_name marker_name marker_class
colnames: NULL
colData names(7): sample_id cell_count_prePeacoQC ... flowJo_prePeacoQC_gMFI_PE
  flowCore_postPeacoQC_medMFI_PE
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

rownames(F37_channelCD3CD8_fcs_data_sce)
[237] "TIM4.XGBoost"                    "CD134"                         
[239] "TOX1thr4.XGBoost"                "UMAP1"                          
[241] "UMAP2"                           "XCR1.XGBoost"                   
[243] "Time"                            "FileName"                       
[245] "FileNo"                          "target"                         
[247] "model"                           "target_model"                   
[249] "cell_count_prePeacoQC"           "cell_count_postPeacoQC"         
[251] "preFilter_liveCD3CD8_count"      "flowJo_prePeacoQC_gMFI_PE"      
[253] "flowCore_postPeacoQC_medMFI_PE"  "manualInspect_strangeFCS"       
[255] "FCS_file_name"                   "matchingFCStarget"              
[257] "FlowSOM_cluster_backbone12"      "FlowSOM_metacluster_backbone12" 
[259] "FlowSOM_cluster_backbone16"      "FlowSOM_metacluster_backbone16" 
[261] "FlowSOM_cluster_backbone20"      "FlowSOM_metacluster_backbone20" 
[263] "FlowSOM_cluster_backbone18"      "FlowSOM_metacluster_backbone18"

Then, because I still have issues with marker_class, as mentioned in #347, I decided to pass a vector of the markers I want to use for clustering to the features argument of cluster.

F37_channelCD3CD8_fcs_data_sce <- CATALYST::cluster(F37_channelCD3CD8_fcs_data_sce, features = backboneMarkers_clustering, 
    xdim = 10, ydim = 10, maxK = 30, 
    verbose = TRUE, seed = 12345)

#### After clustering 
> show(F37_channelCD3CD8_fcs_data_sce)
class: SingleCellExperiment 
dim: 270 4955453 
metadata(5): experiment_info chs_by_fcs cluster_codes SOM_codes delta_area
assays(2): counts exprs
rownames(270): FSC.A FSC.H ... FlowSOM_cluster_backbone24
  FlowSOM_metacluster_backbone24
rowData names(4): channel_name marker_name marker_class used_for_clustering
colnames: NULL
colData names(8): sample_id cell_count_prePeacoQC ...
  flowCore_postPeacoQC_medMFI_PE cluster_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

Then I want to plot a heatmap of the markers in the "visualising_markers" vector.

CATALYST::plotExprHeatmap(F37_channelCD3CD8_fcs_data_sce, features = visualising_markers,
    scale = "last", q = 0, bars = TRUE, k = "meta25", by = "cluster_id", perc = FALSE, bin_anno = FALSE)

HelenaLC commented 1 year ago

Yes, looks all good to me. Completely ignoring marker_classes always works if you just pass the features to use (e.g., for dimensionality reduction or clustering) or to visualize to the respective function. You can also double check this by looking at rowData(sce); there should be a logical column indicating which features have been used for clustering. And, of course, in visualizations you should see that only the specified features appear. As long as that's all good, I don't see any issues and you're safe.
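
The rowData check described above could be sketched roughly as follows. This is a minimal illustration, not part of CATALYST: the helper name check_clustering_features and the toy table are assumptions made here, and the toy data.frame only stands in for rowData(sce). The one thing taken from the thread is the used_for_clustering column, which appears in the rowData names after clustering.

```r
# Hypothetical helper (not a CATALYST function): compares the features flagged
# in a rowData-like table against the vector that was passed to cluster().
check_clustering_features <- function(row_data, expected) {
    flagged <- rownames(row_data)[row_data$used_for_clustering]
    list(
        ok = setequal(flagged, expected),
        unexpected = setdiff(flagged, expected),  # flagged but never requested
        missing = setdiff(expected, flagged)      # requested but not flagged
    )
}

# Toy stand-in for rowData(sce); the TRUE/FALSE values are made up for illustration.
toy_row_data <- data.frame(
    used_for_clustering = c(TRUE, TRUE, FALSE, FALSE),
    row.names = c("TIM4.XGBoost", "CD134", "FCS_file_name",
                  "FlowSOM_cluster_backbone12")
)
backboneMarkers_clustering <- c("TIM4.XGBoost", "CD134")

res <- check_clustering_features(toy_row_data, backboneMarkers_clustering)
print(res$ok)
```

With a real object, one would presumably pass rowData(sce) in place of toy_row_data. If ok is FALSE, the unexpected and missing entries show which features differ from the intended clustering vector.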

denvercal1234GitHub commented 1 year ago

Thank you very much @HelenaLC for your prompt response. I did as described above (clustering using "backboneMarkers_clustering"), but this time in CATALYST::plotExprHeatmap I set features = NULL to visualise all "markers", which include some non-marker parameters, e.g., FCS_file_name (which is a string). The colouring in the heatmap still shows gradation for these. Do you know why this is?

Should I not include these non-marker parameters in the visualisation? I thought I could cluster with one set of markers, then visualise any markers across the resulting clusters (they do not have to be just the markers used for clustering).
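
One way to keep non-marker parameters out of a plot is to build the features vector explicitly rather than relying on features = NULL. The sketch below is base R only and makes several assumptions: select_plot_features is a hypothetical helper, the non_markers vector is a hand-written list that the user would have to maintain, and the row names are a small subset copied from the output earlier in this thread. It does not address why string-valued rows show gradation, which this thread leaves open.

```r
# Hypothetical helper: keep the requested features, drop anything listed as a
# non-marker parameter, and fail early if a requested name is not a row name.
select_plot_features <- function(all_features, wanted, non_markers) {
    unknown <- setdiff(wanted, all_features)
    if (length(unknown) > 0) {
        stop("Not found in rownames: ", paste(unknown, collapse = ", "))
    }
    setdiff(wanted, non_markers)
}

# Small subset of the row names shown in the thread, for illustration only.
all_features <- c("TIM4.XGBoost", "CD134", "XCR1.XGBoost", "FCS_file_name",
                  "matchingFCStarget", "FlowSOM_cluster_backbone12")
# Assumed, hand-maintained list of rows that are not markers.
non_markers <- c("FCS_file_name", "matchingFCStarget",
                 "FlowSOM_cluster_backbone12")

visualising_markers <- select_plot_features(all_features,
    wanted = all_features, non_markers = non_markers)
print(visualising_markers)
```

With a real object, all_features would be rownames(sce), and the resulting vector could be passed as features to plotExprHeatmap in the same way as in the call shown earlier.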

HelenaLC / CATALYST, issue #349