ftwkoopmans / msdap

MS-DAP: downstream analysis pipeline for quantitative proteomics
GNU General Public License v3.0
31 stars 7 forks source link

mixed effect model #28

Closed brookegenovese closed 5 months ago

brookegenovese commented 5 months ago

Hi there - thanks for your work on this package! I've found it incredibly useful on a few proteomics projects we're working on.

I have a dataset that has both an interaction effect and a random variable that I'd like to perform DEA on (via deqms) within the ms-dap workflow. I found your response to a similar question from a couple years ago which was helpful for my initial hard coding efforts. Seeing as limma isn't suited to handle mixed effect models, do you have a recommended bypass that allows users to integrate both fixed and random effects? I've tried using the dream extension from VariancePartition to handle the mixed effects, but can't ultimately get results that cooperate with deqms/ms-dap in a downstream workflow.

Here's the formula I'm using, if helpful for context:

form <- ~group*day + experiment + (1|animal)

Thanks again!

ftwkoopmans commented 5 months ago

The typical Limma + DEqMS workflow that is integrated in MS-DAP cannot accommodate this.

If modeling the mixed effect is essential for your study/dataset, perhaps you can just run MS-DAP without performing DEA and then apply any statistical tool to the output protein data matrix. So use msdap for the data processing/filtering/normalization/QC, then do your own thing regarding statistical analyses.

I would suggest to still define your sample groups according to experimental conditions (in MS-DAP sample metadata input table) and set 'filter_by_contrast=FALSE' (i.e. use global filtering). If your downstream statistical analyses do not downweigh/punish proteins that were quantified with only 1 peptide, I suggest to also add an additional filter to remove proteins with less than 2 peptides ('filter_min_peptide_per_prot = 2'). You can then find the filtered and normalized protein matrix in a file in the MS-DAP output folder.

brookegenovese commented 4 months ago

Thanks for those insights! Following up here with what I ended up trying, for those that are in similar situations. I'm just starting to explore this data but I'm not seeing any issues with this workflow so far...data is from Spectronaut and is a DIA LCMS proteomics set. I have 4 treatment groups and my samples come from animals that were longitudinally sampled.

As you suggested, I first ran MS-DAP without performing DEA. dataset = filter_dataset(dataset, filter_min_detect = 3, filter_min_quant = 3, filter_min_peptide_per_prot = 1, norm_algorithm = c("vwmb","modebetween_protein"), by_group = F, by_contrast = F, all_group = T)

I used the output file from ms-dap called protein_abundance__global data filter.tsv as my proteomics data input into limma / dream (an extension of limma, accessed through variancePartition, for repeated measures and mixed effect models). I also used the samples.xlsx output from this as my metadata, since it is the same structure as the ms-dap metadata file.

file.use <- "protein_abundance__global data filter.tsv pdata.file.use <- "samples.xlsx

Some adjustments: rownames(pdata) <- pdata$sample_id pdata$day <- factor(pdata$day, levels = c("day0", "day3", "day7", "day21", "day28"))

To use DEqMS, need PSM counts. So I used another ms-dap output file here to obtain this: pep.file.use <- "peptide_abundance__global data filter.tsv dat1 <- read.delim(pep.file.use, check.names = FALSE)

I followed your code in the previous issue to obtain peptides_for_contrast and prot_pep_count.

Preparation for dream: param <- SnowParam(4, "SOCK", progressbar = TRUE) form <- ~group*day + experiment + (1|animal)

I then had to do a bunch of tedious contrast matrix building. I used emmeans to obtain contrast vectors and had to build a dummy mod to do so (contrast.matrix), then added in main effect contrasts (contrast.matrix2). Important for my study design: need to only use comparisons with controls.

contrasts <- c(contrasts, contrasts2) # combine descriptive contrast labels contrast.matrix <- cbind(contrast.matrix, contrast.matrix2) # combine contrast matrices

Fit the data with dream/limma fit1 <- dream(dat, form, pdata, L = contrast.matrix) fitmm <- eBayes(fit1)

Added the counts needed for DEqMS fitmm$count = prot_pep_count$peptides_used_for_dea[match(rownames(dat), prot_pep_count$protein_id)] min(fitmm$count) # should return [1] 1

Apply DEqMS on results from dream: fit = suppressWarnings(DEqMS::spectraCounteBayes(fitmm))

Extract stats of interest using whatever extraction method you need (topTable, outputResult, etc)