ramziabb commented 7 months ago

Hello,

I would like to compute the number of cells of each subtype in a blood sample. This excellent workflow computes differential cell population abundance. Prior to performing the differential analysis (section 11.1), I would like to convert the proportions to raw numbers by multiplying them by the total number of circulating cells. I have the total number of circulating cells for each sample.

Is it possible to multiply the proportions by the total number of cells to convert them to raw numbers, and then perform the differential analysis on those?

I envision this as copying the sce object after cluster merging, then for each cluster in each sample, multiplying the proportion by that patient's total cell count and then continuing the differential analysis unchanged.

Would this work? If so, would someone please provide sample code?

Thank you, Ramzi

SamGG commented 7 months ago

Hi, the main part of the CATALYST pipeline aims at importing, clustering and reducing dimensions. The differential analysis is handled by diffcyt, which requires counts, as stated in section. diffcyt is itself based on edgeR, a package for analyzing RNA-seq data.

CATALYST/diffcyt/edgeR analyze the cell count per cluster relative to the total cell count per sample. However, the proportion itself is never computed for the statistical analysis: the main value provided to diffcyt/edgeR is the count per cluster for each sample.
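To make the distinction between counts and proportions concrete, here is a minimal base R sketch. The data frame and its column names are made up for illustration and are not taken from a real sce object.

```r
# Toy data: one row per cell, with hypothetical sample and cluster labels.
cells <- data.frame(
  sample_id  = c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2", "s2", "s2"),
  cluster_id = c("B", "T", "T", "T", "B", "B", "B", "T", "T", "T")
)

# Cluster-by-sample count table: this is the kind of value the
# count-based test works on.
counts <- table(cells$cluster_id, cells$sample_id)

# Total number of acquired cells per sample (column sums of the table).
totals <- colSums(counts)

# Proportions, derived here for display only; they are not the test input.
props <- sweep(counts, 2, totals, "/")

print(counts)
print(round(props, 2))
```

The point is only that the proportions are a by-product of the counts and the per-sample totals, so any rescaling has to act on one of those two.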

If I am correct, edgeR compares counts using a normalization coefficient for each sample (aka library). These coefficients can be computed with the TMM method, but it is also possible to provide a vector of normalization coefficients, one per sample. diffcyt allows passing arguments to the edgeR computation to perform this normalization. The normalization adjusts the total cell count per sample in a relative fashion, i.e. between samples. This is why the diffcyt function help (excerpt below) says that the product of these coefficients should be 1. Maybe the product could be higher than 1, which would allow relating the observed counts to the total number of circulating cells per sample instead of the total number of cells acquired by the cytometer.
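If one prefers to respect the "multiply to 1" convention, a vector of coefficients can be rescaled by its geometric mean. This is a small sketch with invented values; it keeps the ratios between samples and only changes the overall scale.

```r
# Hypothetical per-sample coefficients (values invented for illustration).
raw_factors <- c(s1 = 2, s2 = 8)

# Geometric mean of the coefficients.
geo_mean <- exp(mean(log(raw_factors)))

# Dividing by the geometric mean makes the product equal to 1
# while leaving the ratio between samples unchanged.
norm_factors <- raw_factors / geo_mean

print(norm_factors)
print(prod(norm_factors))
```

Whether the unscaled or the rescaled vector is the right one to pass depends on how the absolute scale is meant to be used, which I have not verified.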

If I am still correct, in the diffcyt() call of section 7, try adding the arguments normalize=TRUE and norm_factors=norm_vector. The un-normalized number of cells of a sample considered by diffcyt/edgeR is the sum of the cells across all clusters of that sample. So norm_vector might be equal to total_number_of_circulating_cells divided by total_number_of_cells_in_fcs.
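A rough sketch of how norm_vector could be built, using a toy count matrix and invented circulating cell numbers. The object names (counts, n_circulating) and the names in the commented diffcyt() call (sce, design, contrast, "merging1") are assumptions to adapt to your own session, and the call itself is untested.

```r
# Toy cluster-by-sample counts of acquired cells (hypothetical values).
counts <- matrix(
  c(100, 300, 600,
    250, 250, 500),
  nrow = 3,
  dimnames = list(c("c1", "c2", "c3"), c("s1", "s2"))
)

# Cells per sample as seen by diffcyt/edgeR: the sum over all clusters.
n_fcs <- colSums(counts)

# Hypothetical total number of circulating cells per sample.
n_circulating <- c(s1 = 5e6, s2 = 8e6)

# The samples must be in the same order in both vectors.
stopifnot(identical(names(n_circulating), colnames(counts)))

# One coefficient per sample.
norm_vector <- n_circulating / n_fcs

# For inspection only: counts scaled up to the circulating totals.
abs_counts <- sweep(counts, 2, norm_vector, "*")
print(norm_vector)
print(abs_counts)

# Untested sketch of the call, with placeholder objects:
# res <- diffcyt(sce, design = design, contrast = contrast,
#                analysis_type = "DA", method_DA = "diffcyt-DA-edgeR",
#                clustering_to_use = "merging1",
#                normalize = TRUE, norm_factors = norm_vector)
```

As far as I understand, edgeR multiplies the library size by the normalization factor to obtain an effective library size, so the direction of the ratio is worth checking before relying on the result.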

If it works, great. The alternative implies more programming, as it requires adding a fake cluster accounting for the cells of the sample that are not in the FCS file, i.e. total_number_of_circulating_cells - total_number_of_cells_in_fcs.
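The fake cluster idea, sketched on a plain count matrix with invented numbers. How to get such an extended table back into the objects diffcyt expects is not shown here and would need more work.

```r
# Toy cluster-by-sample counts of acquired cells (hypothetical values).
counts <- matrix(
  c(100, 300, 600,
    250, 250, 500),
  nrow = 3,
  dimnames = list(c("c1", "c2", "c3"), c("s1", "s2"))
)

# Hypothetical total number of circulating cells per sample.
n_circulating <- c(s1 = 5e6, s2 = 8e6)

# Cells of each sample that are not in the FCS file.
not_acquired <- n_circulating - colSums(counts)
stopifnot(all(not_acquired >= 0))

# Append the fake cluster as an extra row, so that each column now
# sums to the total number of circulating cells of that sample.
counts_ext <- rbind(counts, not_acquired = not_acquired)

print(counts_ext)
print(colSums(counts_ext))
```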

Maybe there are smarter solutions that will be proposed by Helena or Mark.

Best.

Below is the relevant part of the diffcyt documentation.

#' @param normalize Whether to include optional normalization factors to adjust for
#'   composition effects. Default = FALSE.
#'
#' @param norm_factors Normalization factors to use, if normalize = TRUE. Default =
#'   "TMM", in which case normalization factors are calculated automatically using
#'   the 'trimmed mean of M-values' (TMM) method from the \code{edgeR} package.
#'   Alternatively, a vector of values can be provided (the values should multiply to 1).

ramziabb commented 7 months ago

Thanks Sam. I will try what you suggest. Does anyone else have any ideas?

HelenaLC commented 1 month ago

closing due to inactivity - feel free to reopen / continue discussion here! - though I believe @SamGG's response was quite complete; thanks!

HelenaLC / CATALYST

Convert Cell Proportions To Values #389