Teichlab / cellphonedb

Precision on input format #145

kaizen89 closed this issue 4 years ago

kaizen89 commented 4 years ago

Hi, I am wondering whether the count data should be in log or linear scale. Thank you

ciawojp commented 4 years ago

Hi, I'm gonna put my question here because it's related.

I think non-log-transformed count data is recommended. In my case, I want to use count data from which I have regressed out the cell-cycle effect, but cell-cycle regression requires log-transformation.

Are there any critical reasons why you recommend non-log-transformed data? If not, can I use log-transformed data?

Thank you.

mief commented 4 years ago

Hi, we recommend non-log-transformed data. However, we have also tested the method with log-transformed data and the results are similar. Regarding regressing out covariates, I wouldn't recommend using corrected data. In Seurat, for example, when you regress out a variable, the scaled residuals are saved in the scale data, which is used only for PCA, UMAP and clustering; the DE analysis is done on the log-transformed data. Best, Mirjana
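For readers who only have log-transformed values, here is a minimal sketch of getting back to linear scale. It assumes the data were transformed with a natural-log `log1p` after per-cell scaling, and that genes are rows and cells are columns. The helper names `normalize_linear` and `undo_log1p` and the toy matrix are illustrative only, not part of CellPhoneDB.

```python
import numpy as np


def normalize_linear(raw_counts, scale_factor=10_000):
    """Scale each cell (column) to the same total, without taking logs."""
    raw_counts = np.asarray(raw_counts, dtype=float)
    totals = raw_counts.sum(axis=0, keepdims=True)  # one total per cell
    return raw_counts / totals * scale_factor


def undo_log1p(log_values):
    """Invert a natural-log log1p transform: exp(x) - 1."""
    return np.expm1(np.asarray(log_values, dtype=float))


# Toy matrix: 3 genes (rows) x 3 cells (columns). Assumed layout.
raw = np.array([
    [10, 0, 5],
    [30, 20, 5],
    [60, 80, 90],
])

linear = normalize_linear(raw)   # non-log-transformed, normalized values
logged = np.log1p(linear)        # what a log-normalized matrix would hold
recovered = undo_log1p(logged)   # back on the linear scale
```

If the log transform used a different base or pseudocount, the inverse has to be adjusted to match. This does not undo covariate regression.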

Manikgarg commented 3 years ago

Hi @mief,

How do you recommend correcting for covariates then? I am currently running CellPhoneDB separately on 3 non-overlapping subsets of data corresponding to 3 different conditions, and then comparing the results to infer differences in the underlying immune cell responses. I later observed that at least 2 clinical covariates are significantly associated (p-value < 0.005) with these 3 conditions. How can I tell whether the differences in immune cell interactions between these 3 conditions are due to the conditions themselves rather than to differences in these 2 clinical covariates? Any suggestions appreciated! (Happy to open this as a separate question :))

Many thanks in advance,

Best, Manik