antortjim closed this issue 6 years ago.
Hi Antonio,
At the moment we do not implement any Occam's razor heuristic to assign a unique protein ID to shared peptides. I am aware that there are several ways to do that, and of course this must be a user choice. The idea behind moFF is to be just one puzzle piece that can fit into a custom proteomics pipeline; it is not a full proteomics software package covering the whole proteomics data analysis workflow (MS2 search, quantification at peptide level, quantification at protein level, label-free/labeling data, etc.).
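As an illustration of one such user choice, here is a minimal sketch of a greedy Occam's razor: repeatedly pick the protein that explains the most still-unassigned peptides and assign those peptides to it. The `parsimony` function is hypothetical (not part of moFF) and assumes a tab-separated peptide summary with a header line, the peptide in column 1, the protein group in column 2, and protein IDs within a group separated by `;`. Greedy set cover approximates, but does not guarantee, the smallest set of proteins; ties are broken by the alphabetically smallest ID.

```shell
# Greedy Occam's razor sketch (hypothetical helper, not part of moFF).
# Assumes: tab-separated file, header on line 1, peptide in column 1,
# protein group in column 2, IDs inside a group separated by ';'.
# Prints "peptide<TAB>assigned_protein", sorted by peptide.
parsimony() {
  awk -F '\t' '
    NR > 1 {
      n = split($2, ids, ";")
      for (i = 1; i <= n; i++) {
        if (ids[i] == "") continue
        key = ids[i] SUBSEP $1
        if (key in member) continue       # count each protein/peptide pair once
        member[key] = 1
        count[ids[i]]++                   # unassigned peptides this protein explains
        prots[$1] = prots[$1] ";" ids[i]  # proteins that contain this peptide
      }
    }
    END {
      for (p in prots) left++
      while (left > 0) {
        best = ""; bestn = 0
        # pick the protein explaining the most unassigned peptides (ties: smallest ID)
        for (prot in count)
          if (count[prot] > bestn || (bestn > 0 && count[prot] == bestn && prot < best)) {
            best = prot; bestn = count[prot]
          }
        if (bestn == 0) break
        # give all of its unassigned peptides to it, then update the other counts
        for (key in member) {
          split(key, kp, SUBSEP)
          if (kp[1] != best || (kp[2] in assigned)) continue
          assigned[kp[2]] = best; left--
          m = split(prots[kp[2]], others, ";")
          for (j = 1; j <= m; j++) if (others[j] != "") count[others[j]]--
        }
      }
      for (p in assigned) print p "\t" assigned[p]
    }' "$1" | sort
}
```

For example, `parsimony peptide_summary_intensity.tab` prints one `peptide<TAB>protein` line for each peptide that has a protein group. Other parsimony rules (e.g. keeping indistinguishable proteins together as one group) would need a different selection step.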
To go from moFF peptide quantification to protein quantification, I can suggest two solutions:
Use the MSqRob R package: it handles data from the moFF peptide summary and performs robust statistical analysis at protein level. Moreover, it has the function you are looking for.
Load the moFF peptide intensities into MSnbase objects and perform the summarization at protein level using their methods. Here is some R code you can use to load moFF results into MSnbase:
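library(MSnbase)
# the first two columns of the peptide summary are annotations; the rest are intensities
set = readMSnSet2(path, ecol = -c(1, 2), sep = '\t')
pd = data.frame(condition = ..., lab = ...)
rownames(pd) = sampleNames(set)
fd = data.frame(contaminant = ...)  # fd must also contain the protein column used in groupBy below
rownames(fd) = featureNames(set)
set = MSnSet(exprs(set), fData = AnnotatedDataFrame(fd), pData = AnnotatedDataFrame(pd))
# replace `protein` with the name of the protein column in fData
protset <- combineFeatures(set, fun = "robust", groupBy = fData(set)$protein, cv = FALSE)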
Cheers
Andrea
Hi Andrea,
Thanks for your answer! I am indeed planning to combine moFF with MSqRob and perform peptide-based quantification :+1:
The function is very handy, thanks! It still returns groups with shared IDs, though; they are just always the same length.
I tried running the MSnbase code snippet and only replaced:
protset <- combineFeaturesset,fun="robust", groupBy = fData(set)$protein
(name of protein collumn in fData),cv = FALSE)
with
protset <- combineFeatures(set, fun="median", groupBy = fData(set)$Protein.IDs, cv = FALSE)
because some parentheses were missing.
Thanks for your time.
Cheers Antonio
Hi there!
Thanks for developing moFF; I am finding it extremely useful in my project. However, I was wondering whether you have any recommendation on how to post-process the moFF peptide summary output so that no protein ID appears in more than one protein group.
In other words, if we list the unique protein groups in the peptide summary file:
cut -f 2 peptide_summary_intensity.tab | tail -n +2 | sort | uniq -c
each protein ID should appear in only one group. The output produced by MaxQuant (proteinGroups.txt) has this property, so I think it would make sense for moFF to offer the option to further process the protein groups using user-specified criteria.
I understand there are several ways to do this, each based on a different interpretation of Occam's razor. For example, the following situation:
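To check whether a peptide summary already has this property, a minimal sketch is to split each distinct protein group into its IDs and count how many groups each ID belongs to. `shared_ids` is a hypothetical helper; like the `cut` command above, it assumes the protein group is in column 2 after a header line, and it additionally assumes `;` separates the IDs within a group.

```shell
# Hypothetical helper: list protein IDs that occur in more than one protein group.
# Assumes a tab-separated file, header on line 1, protein group in column 2,
# and IDs inside a group separated by ';'.
# Prints "protein_id<TAB>number_of_groups", sorted.
shared_ids() {
  awk -F '\t' 'NR > 1 {
      if (seen[$2]++) next                # visit each distinct group only once
      n = split($2, ids, ";")
      for (i = 1; i <= n; i++) {
        if (ids[i] == "" || ((ids[i] SUBSEP $2) in pair)) continue
        pair[ids[i] SUBSEP $2] = 1        # ignore an ID repeated inside one group
        groups[ids[i]]++
      }
    }
    END {
      for (id in groups) if (groups[id] > 1) print id "\t" groups[id]
    }' "$1" | sort
}
```

Running `shared_ids peptide_summary_intensity.tab` prints every ID that is shared between groups, with the number of groups it appears in; empty output means each protein ID belongs to a single group.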
could be solved by:
How would you approach this? Is there any implementation available you could point us to? Thanks in advance!
Best regards Antonio