HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

`CATALYST::plotPbExprs` produces none of the cells at 0 expression and reducedDims.UMAP have cells squeezed at 0? #358

Closed denvercal1234GitHub closed 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thank you again for the package with very useful plotting functions.

I was a bit puzzled by the plotPbExprs output: across all markers, no cells are at 0 expression, and in the reducedDims.UMAP panels, all cells are squeezed down at 0 (see below).

Would you mind giving me some pointers as to why it might be the case?

I checked ?CATALYST::plotPbExprs, and the function does not seem to apply any additional transformation to already-transformed data.

Thank you for your help!

Screenshot 2023-07-22 at 14 05 50

This is how I process my data before clustering (#356):

##I transformed my FACS data in FlowJo with bi-exponential transformation and exported the data as channel values (.csv) so the transformation was preserved.
##In order to get the FCS files but with transformed data, I then read in the channel values (.csv) into R using Spectre::read.files, then exported the files out as FCS using Spectre::write.files. These exported FCS files are now transformed.
F37_channel_data.list <- Spectre::read.files(file.loc = "/Users/stillhere/Documents/.../F37_FlowJoBiExED_ChannelValues", file.type = ".csv", do.embed.file.names = TRUE)

Spectre::write.files(F37_channel_data.list, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileName") 

##I then used flowCore::read.flowSet to read in the transformed FCS files as a flowSet setting transformation = FALSE, before performing prepData.
# 'pattern' is a regular expression, not a glob, so match the ".fcs" extension explicitly
F37_channel_fcs_data <- flowCore::read.flowSet(path = fcs.dir1, pattern = "\\.fcs$", transformation = FALSE, truncate_max_range = FALSE)

##Create sce object with transform = FALSE
F37_channel_fcs_data_sce <- prepData(x = F37_channel_fcs_data, transform = FALSE, truncate_max_range = FALSE, FACS = TRUE ... )

Some diagnostic plots after clustering and UMAP:

CATALYST::plotScatter(F37_channel_fcs_data_sce, c("CD185", "CD183"), assay = "exprs", zeros = TRUE)

Screenshot 2023-07-22 at 14 18 31
#### My sce object 
> F37_channel_fcs_data_sce
class: SingleCellExperiment 
dim: 270 4955453 
metadata(5): experiment_info chs_by_fcs cluster_codes SOM_codes delta_area
assays(1): exprs
rownames(270): FSC.A FSC.H ... FlowSOM_cluster_backbone24 FlowSOM_metacluster_backbone24
rowData names(4): channel_name marker_name marker_class used_for_clustering
colnames: NULL
colData names(8): sample_id cell_count_prePeacoQC ... flowCore_postPeacoQC_medMFI_PE cluster_id
reducedDimNames(1): UMAP
mainExpName: NULL
altExpNames(0):
HelenaLC commented 1 year ago

plotScatter() facets the plot and fixes the x and y scales to be the same across all panels. My guess is that the cells only appear to be squeezed at 0: UMAP coordinates usually lie within about (-50, 50) or a narrower range, whereas your expression values reach into the 100s, so on a shared axis the UMAP values collapse into a thin band near 0.

Try using plotDR() instead for UMAPs. You are plotting CD185 expression vs. UMAP dimensions, which I have never seen. Typically one would plot UMAP dim. 1 vs. 2, and color cells by expression level.

Also note that UMAP coordinates are arbitrary and do not translate to expression levels; e.g., UMAP coordinates of 0 have nothing to do with cells having an expression of 0.

Unrelated: I see this is FACS data, but am a little confused by the scale. Are you expecting the transformed data to lie in the 100s (instead of, say, 0-15)?
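
To illustrate the point about shared scales, here is a minimal sketch in base R using simulated values (not the poster's data). The ranges 250 to 750 for the marker and -10 to 10 for the UMAP coordinate are assumptions chosen to mimic the situation described in this thread. It computes how much of a shared axis each variable would occupy.

```r
# Minimal sketch with simulated data: a shared (fixed) axis across facets
# compresses any variable whose range is much smaller than the others.
set.seed(1)
n <- 1000
marker <- runif(n, min = 250, max = 750)  # assumed channel-value-like scale
umap1  <- runif(n, min = -10, max = 10)   # assumed UMAP-like scale

# Limits that a shared scale needs in order to cover both variables
shared_limits <- range(c(marker, umap1))

# Fraction of the shared axis occupied by each variable
span <- function(x) diff(range(x))
frac_marker <- span(marker) / diff(shared_limits)
frac_umap   <- span(umap1) / diff(shared_limits)
print(round(c(marker = frac_marker, umap = frac_umap), 3))

# With the object from this thread, the suggested alternative would look
# something like the following (assumes "UMAP" is in reducedDimNames and
# "CD185" is a marker name in the object):
# CATALYST::plotDR(F37_channel_fcs_data_sce, dr = "UMAP", color_by = "CD185")
```

The UMAP-like variable ends up using only a few percent of the shared axis, which is consistent with the "squeezed at 0" appearance.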

denvercal1234GitHub commented 1 year ago

Thank you so much @HelenaLC for your response.

The plot above was just the output of CATALYST::plotScatter(F37_channel_fcs_data_sce, c("CD185", "CD183"), assay = "exprs", zeros = TRUE). The "reducedDims.UMAP" plots were automatically generated along with the density plot.

Below is an example of a FCS file after I transformed it in FlowJo with bi-exponential transformation. I then exported it as channel values (.csv) to then import into R (according to https://wiki.centenary.org.au/display/SPECTRE/Data+transformation in the section "Alternative approach to data transformation: CSV channel values")

Screenshot 2023-07-22 at 17 27 00

Once the channel values are imported into R, the values are indeed in the 100s and not 0-15. So, the bi-exponential transformation did scale the data into this range (even though from visually looking at the FlowJo plots, it looks like some cells should be at 0?), and the baseline therefore is not 0...
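
One way to check the scale of imported values is to summarize each marker's minimum, median, and maximum. The sketch below uses a simulated features-by-cells matrix as a stand-in for the "exprs" assay; the matrix `exprs_mat` and its 250 to 750 range are hypothetical, and only the marker names come from this thread.

```r
# Minimal sketch: per-marker range check on a simulated matrix that stands
# in for the "exprs" assay (rows = markers, columns = cells).
set.seed(42)
exprs_mat <- matrix(
  runif(2 * 500, min = 250, max = 750),   # hypothetical channel values
  nrow = 2,
  dimnames = list(c("CD185", "CD183"), NULL)
)

# quantile() at probabilities 0, 0.5, 1 gives min, median, max per marker
marker_summary <- t(apply(exprs_mat, 1, quantile, probs = c(0, 0.5, 1)))
colnames(marker_summary) <- c("min", "median", "max")
print(round(marker_summary, 1))

# On a real SingleCellExperiment, the matrix would presumably come from:
# exprs_mat <- SummarizedExperiment::assay(F37_channel_fcs_data_sce, "exprs")
```

If the minimum of every marker sits well above 0, the baseline of the exported channel values is offset, as described above.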

Do you think these exported channel values are compatible with FlowSOM clustering and dimension reduction without any additional steps that I am not aware of?

Screenshot 2023-07-22 at 17 37 56

Thank you again.

HelenaLC commented 1 year ago

I can't really give a good answer here; I don't know. This does not seem to be software-related, so I'd suggest posting your question on alternative platforms, or seeking advice from a colleague or bioinformatics consultant.

SamGG commented 1 year ago

I can confirm the points mentioned by Helena. Channels are usually not plotted against reduced dimensions. I don't understand how or why you get a transformed range of 250 to 750, but this difference in ranges explains the compressed UMAP dimensions. The ranges of transformed channels and of reduced dimensions usually differ, but transformed channels typically lie within 0-10. I don't rely on transformed data exported from FlowJo: the transformed data look similar, which makes sense, but the scaling is unknown to me. By contrast, when I do the transformation by hand, I understand what I get. For example, using asinh as the transformation function (taking 500 as a starting point for the cofactor), asinh(1e5/500) = 5.991471, which matches our experience with mass and flow data. Best.
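
The hand-computed transformation mentioned above can be sketched in base R as follows. The cofactor of 500 is the starting point quoted in the comment; the raw intensities are illustrative values, not data from this thread.

```r
# Minimal sketch: arcsinh transformation by hand with a fixed cofactor.
cofactor <- 500                       # starting point quoted above; tune per channel
raw <- c(0, 100, 1e3, 1e4, 1e5)       # illustrative raw intensities

transformed <- asinh(raw / cofactor)  # values land on a single-digit scale
print(round(transformed, 3))

# The transformation is invertible, so raw intensities can be recovered
back <- sinh(transformed) * cofactor
print(back)
```

Because the scaling is explicit, a raw value of 0 maps exactly to 0, and the largest intensity here maps to about 6.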