arnesmits / DEP

DEP package

Batch correction? #11

Open patrickturko opened 5 years ago

patrickturko commented 5 years ago

I have proteomics data from three cell lines, each of which was exposed to two conditions (control and a drug). I've followed your vignette through normalization and differential enrichment analysis, and have plotted a PCA, which shows major differences between cell lines and much smaller differences between conditions. I'm concerned that any condition-level differences are swamped by the cell-line differences; in fact, I have no significant proteins.

How do you suggest I deal with this? In a differential expression analysis with DESeq2 or limma, I would simply add a model term indicating the cell line of each sample. Can I do something similar in DEP? Or should I instead do an explicit batch correction using, e.g., ComBat and then run the DEA on the residuals?
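The "add a model term" idea can be illustrated in base R for a single protein. This is not DEP or limma code: it is a sketch using lm() on a synthetic, deterministic data set (hypothetical cell lines A, B and C, two replicates per cell line and condition) with large cell-line offsets and a small drug effect of 0.5. Without the cell-line term, the offsets end up in the residual variance and hide the drug effect; with it, the same effect becomes clearly detectable. limma's lmFit fits this kind of linear model to every protein at once, and eBayes then moderates the per-protein variances.

```r
# Hypothetical balanced design: 3 cell lines x 2 conditions x 2 replicates.
cell_line <- factor(rep(c("A", "B", "C"), each = 4))
condition <- factor(rep(c("Control", "Control", "Drug", "Drug"), times = 3),
                    levels = c("Control", "Drug"))

# Synthetic log2 abundance for ONE protein: large cell-line offsets (0, 4, 8),
# a small drug effect of 0.5, and noise that sums to zero within each group.
offset <- unname(c(A = 0, B = 4, C = 8)[as.character(cell_line)])
noise  <- c(0.1, -0.1, -0.1, 0.1,
            0.05, -0.05, 0.05, -0.05,
            -0.1, 0.1, 0.1, -0.1)
abundance <- offset + 0.5 * (condition == "Drug") + noise

# Naive model ignores the cell line; blocked model adds it as an extra term.
fit_naive   <- lm(abundance ~ condition)
fit_blocked <- lm(abundance ~ condition + cell_line)

# Two-sided p-value for the Drug vs Control coefficient in each model.
p_naive   <- summary(fit_naive)$coefficients["conditionDrug", "Pr(>|t|)"]
p_blocked <- summary(fit_blocked)$coefficients["conditionDrug", "Pr(>|t|)"]
print(c(naive = p_naive, blocked = p_blocked))
```

Both models estimate the same drug effect (0.5); only the blocked model separates it from the cell-line variation, which is why its p-value is far smaller.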

Thanks, Patrick Turko

adomingues commented 4 years ago

Hi @patrickturko,

I am also running into a similar issue. Did you ever solve this? Could you please share what your strategy was in the end?

Cheers, António

twesleyb commented 4 years ago

I'm having a similar issue. From going through the source code, I don't think DEP can handle this sort of experimental design. You might try passing a formula such as ~ batch + condition to DEP::test_diff via its design_formula argument, but you will then receive an error telling you that 'condition' is not the first factor in the design formula (batch is):

Error in DEP::test_diff(norm_prot, type = "control", control = "Control",  : 
  first factor of 'design_formula' should be 'condition'
Execution halted

@arnesmits, would it be possible to explore adding the ability to pass more complex models to DEP::test_diff and the underlying calls to limma::lmFit --> limma::makeContrasts --> limma::contrasts.fit --> limma::eBayes?
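For readers who want to see what that limma chain computes, here is a base-R sketch of the first three steps for a small proteins x samples matrix, with condition first and the cell line as an additive term. It is an illustration under stated assumptions (synthetic data, hypothetical protein names prot1 and prot2), not DEP's or limma's implementation: it stops at ordinary t-statistics and omits the empirical Bayes variance moderation that limma::eBayes adds.

```r
# Same hypothetical layout: 3 cell lines x 2 conditions x 2 replicates.
cell_line <- factor(rep(c("A", "B", "C"), each = 4))
condition <- factor(rep(c("Control", "Control", "Drug", "Drug"), times = 3),
                    levels = c("Control", "Drug"))

# Condition first, no intercept: columns conditionControl, conditionDrug,
# then k - 1 treatment-coded cell-line columns (cell_lineB, cell_lineC).
design <- model.matrix(~ 0 + condition + cell_line)

# Synthetic proteins x samples matrix (limma's orientation).
offset <- unname(c(A = 0, B = 4, C = 8)[as.character(cell_line)])
noise  <- c(0.1, -0.1, -0.1, 0.1, 0.05, -0.05, 0.05, -0.05, -0.1, 0.1, 0.1, -0.1)
Y <- rbind(
  prot1 = offset + 0.5 * (condition == "Drug") + noise,  # drug effect of 0.5
  prot2 = offset + noise                                 # no drug effect
)

# Step 1 (what lmFit does): least-squares fit of every protein to the design.
qr_x     <- qr(design)
coefs    <- t(qr.coef(qr_x, t(Y)))                 # proteins x coefficients
resid    <- t(qr.resid(qr_x, t(Y)))                # proteins x samples
df_resid <- nrow(design) - qr_x$rank
sigma    <- sqrt(rowSums(resid^2) / df_resid)      # per-protein residual SD

# Step 2 (what makeContrasts does): Drug - Control over the design columns.
contrast <- c(conditionControl = -1, conditionDrug = 1,
              cell_lineB = 0, cell_lineC = 0)[colnames(design)]

# Step 3 (what contrasts.fit does): contrast estimate and its standard error.
unscaled <- chol2inv(qr.R(qr_x))                   # (X'X)^-1; full rank, no pivoting
logFC    <- drop(coefs %*% contrast)
se       <- sigma * sqrt(drop(t(contrast) %*% unscaled %*% contrast))
t_stat   <- logFC / se
p_value  <- 2 * pt(-abs(t_stat), df_resid)         # ordinary, unmoderated
print(cbind(logFC, t_stat, p_value))
```

In limma itself, the corresponding calls would be lmFit(Y, design), makeContrasts(conditionDrug - conditionControl, levels = design), contrasts.fit(fit, contrasts) and eBayes(); eBayes replaces each protein's residual variance with a moderated estimate, so its t-statistics will differ from the ordinary ones computed here.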

adomingues commented 3 years ago

This might come a little late, but the key seems to be to always put condition (the variable of interest) as the first term of the design formula:

diff <- test_diff(data_imp, type = "all", design_formula = formula(~ 0 + condition + stress))
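Why condition has to come first can be seen from how R's model.matrix() codes a no-intercept formula: the first factor gets one column per level, while later factors get k - 1 treatment-coded columns. The base-R sketch below uses a hypothetical sample table in which stress stands in for the batch variable. My reading (an assumption, not something confirmed in this thread) is that test_diff builds its contrasts between condition levels, which needs a column for every condition level, and that is only guaranteed when condition is the first term. For the call above to work, stress presumably also has to be a column of the sample annotation (colData) of data_imp.

```r
# Hypothetical sample annotation: 2 conditions x 2 stress batches x 2 replicates.
samples <- data.frame(
  condition = factor(rep(c("Control", "Drug"), each = 4)),
  stress    = factor(rep(c("S1", "S1", "S2", "S2"), times = 2))
)

# Condition first: one column per condition level, stress gets k - 1 columns.
good <- model.matrix(~ 0 + condition + stress, data = samples)
print(colnames(good))  # "conditionControl" "conditionDrug" "stressS2"

# Stress first: stress gets all its levels and condition loses "Control",
# so a Drug - Control contrast can no longer be written as column differences.
bad <- model.matrix(~ 0 + stress + condition, data = samples)
print(colnames(bad))   # "stressS1" "stressS2" "conditionDrug"
```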