rkimoakbioinformatics closed this issue 3 years ago.
Thank you for clearly explaining where you obtained your data files and how you analyzed them. Please clarify what you mean by the .mzid files from the CPTAC website having "sample-level peptide identification data" that is not in the .mzid files created by MS-GF+.
Do you mean the TMT reporter ion abundances? For example, these entries in the .mzid file:
```xml
<userParam name="CPTAC-CDAP:TMT10-126" value="36232.4/-0.20"/>
<userParam name="CPTAC-CDAP:TMT10-127N" value="30440/-0.42"/>
<userParam name="CPTAC-CDAP:TMT10-127C" value="29381.2/-0.42"/>
<userParam name="CPTAC-CDAP:TMT10-128N" value="7019.51/-0.40"/>
<userParam name="CPTAC-CDAP:TMT10-128C" value="24031.9/-0.22"/>
<userParam name="CPTAC-CDAP:TMT10-129N" value="13974/-0.17"/>
<userParam name="CPTAC-CDAP:TMT10-129C" value="8465.62/-0.44"/>
<userParam name="CPTAC-CDAP:TMT10-130N" value="37748.8/-0.42"/>
<userParam name="CPTAC-CDAP:TMT10-130C" value="33107.2/-0.45"/>
<userParam name="CPTAC-CDAP:TMT10-131" value="43514.9/-0.22"/>
<userParam name="CPTAC-CDAP:TMT10-FractionOfTotalAb" value="0.0403006"/>
<userParam name="CPTAC-CDAP:TMT10-TotalAb" value="263916"/>
```
and the equivalent information in the TSV file (which has the extension .cap.psm):

| TMT10-126 | TMT10-127N | ... | TMT10-131 | TMTFlags | TMT10-TotalAb | TMT10-FractionOfTotalAb |
|---|---|---|---|---|---|---|
| 11727.1/-0.20 | 6852.88/-0.16 | ... | 5462.02/-0.32 | I | 51338.6 | 0.0311579 |
| 1156.89/-0.30 | 1504.69/-0.32 | ... | 2789.99/-0.33 | MI | 13834.8 | 0.0955993 |
| 17484.2/-0.22 | 12126.1/-0.32 | ... | 11460.1/-0.25 | I | 92237.6 | 0.139702 |
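A minimal Python sketch of reading the .mzid entries quoted above, assuming (as the question describes) that the userParam elements sit inside a SpectrumIdentificationItem. Each channel value is split at the slash into an abundance and a second number; this thread does not say what the second number means, so the sketch keeps it uninterpreted. In this particular example the ten channel abundances add up to about 263916, the same as TMT10-TotalAb, and the final check only confirms that for this one spectrum rather than asserting a general CDAP rule. The wrapper element and the helper names `local_name` and `read_tmt_params` are illustrative.

```python
import xml.etree.ElementTree as ET

# The userParam lines quoted above, wrapped in a SpectrumIdentificationItem
# element (illustrative) so the fragment parses on its own. A real .mzid
# declares the mzIdentML namespace, which local_name() strips.
SNIPPET = """<SpectrumIdentificationItem>
  <userParam name="CPTAC-CDAP:TMT10-126" value="36232.4/-0.20"/>
  <userParam name="CPTAC-CDAP:TMT10-127N" value="30440/-0.42"/>
  <userParam name="CPTAC-CDAP:TMT10-127C" value="29381.2/-0.42"/>
  <userParam name="CPTAC-CDAP:TMT10-128N" value="7019.51/-0.40"/>
  <userParam name="CPTAC-CDAP:TMT10-128C" value="24031.9/-0.22"/>
  <userParam name="CPTAC-CDAP:TMT10-129N" value="13974/-0.17"/>
  <userParam name="CPTAC-CDAP:TMT10-129C" value="8465.62/-0.44"/>
  <userParam name="CPTAC-CDAP:TMT10-130N" value="37748.8/-0.42"/>
  <userParam name="CPTAC-CDAP:TMT10-130C" value="33107.2/-0.45"/>
  <userParam name="CPTAC-CDAP:TMT10-131" value="43514.9/-0.22"/>
  <userParam name="CPTAC-CDAP:TMT10-FractionOfTotalAb" value="0.0403006"/>
  <userParam name="CPTAC-CDAP:TMT10-TotalAb" value="263916"/>
</SpectrumIdentificationItem>"""

PREFIX = "CPTAC-CDAP:TMT10-"


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_tmt_params(item):
    """Split CDAP TMT userParams into per-channel values and summary values."""
    channels, summary = {}, {}
    for param in item.iter():
        if local_name(param.tag) != "userParam":
            continue
        name = param.get("name", "")
        if not name.startswith(PREFIX):
            continue
        key, value = name[len(PREFIX):], param.get("value", "")
        if "/" in value:
            # Channel entries look like "abundance/second_field"; the second
            # field is kept as a plain number without interpreting it.
            abundance, second = value.split("/", 1)
            channels[key] = (float(abundance), float(second))
        else:
            # TotalAb and FractionOfTotalAb are single numbers.
            summary[key] = float(value)
    return channels, summary


channels, summary = read_tmt_params(ET.fromstring(SNIPPET))
channel_sum = sum(abundance for abundance, _ in channels.values())
print(len(channels), round(channel_sum), summary["TotalAb"])
```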
If this is what you're referring to, that information is not something MS-GF+ extracts. MS-GF+ only analyzes the MS/MS spectra to identify peptides. The Common Data Analysis Pipeline (CDAP) uses MS-GF+ along with other tools to extract all of this information, then packages it into .mzid and .cap.psm files.
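The .cap.psm copy of the same values can be read as an ordinary tab-separated table. The sketch below uses a hypothetical miniature file containing only the columns shown in the table above (the elided channels are left out), so it does not reproduce the real file's full column set. The helper `channel_abundances` is illustrative, and the TMTFlags values are passed through without interpretation.

```python
import csv
import io

# Hypothetical miniature .cap.psm: only the columns shown in the table above,
# tab-separated. The real file has more columns (including the elided channels).
CAP_PSM = (
    "TMT10-126\tTMT10-127N\tTMT10-131\tTMTFlags\tTMT10-TotalAb\tTMT10-FractionOfTotalAb\n"
    "11727.1/-0.20\t6852.88/-0.16\t5462.02/-0.32\tI\t51338.6\t0.0311579\n"
    "1156.89/-0.30\t1504.69/-0.32\t2789.99/-0.33\tMI\t13834.8\t0.0955993\n"
    "17484.2/-0.22\t12126.1/-0.32\t11460.1/-0.25\tI\t92237.6\t0.139702\n"
)


def channel_abundances(row, prefix="TMT10-"):
    """Return {channel: abundance} from the 'abundance/second_field' columns.

    TotalAb and FractionOfTotalAb contain no slash, so they are skipped.
    """
    result = {}
    for column, value in row.items():
        if column.startswith(prefix) and "/" in value:
            result[column[len(prefix):]] = float(value.split("/", 1)[0])
    return result


rows = list(csv.DictReader(io.StringIO(CAP_PSM), delimiter="\t"))
for row in rows:
    print(row["TMTFlags"], channel_abundances(row))
```

For a real file, `io.StringIO(CAP_PSM)` would be replaced by `open(path, newline="")`; it is worth checking first whether the actual .cap.psm has extra header lines before the column names.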
The equivalent open-source tool that we have for extracting reporter ion information is MASIC:
The program we have for merging the MS-GF+ results with MASIC results is the MASIC Results Merger:
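Conceptually, merging identifications with reporter ion data is a join on scan number between the MS-GF+ PSM table and the reporter ion table. The sketch below is a simplified stand-in, not the MASIC Results Merger's actual logic. It assumes the MzidToTsvConverter output has a `ScanNum` column and the MASIC reporter ion file has a `ScanNumber` column; the `Ion_*` column names and all row values are hypothetical placeholders.

```python
# Hypothetical placeholder rows. "ScanNum" (MzidToTsvConverter TSV) and
# "ScanNumber" (MASIC reporter ion file) are assumed column names; the
# peptides, scores, and Ion_* values are made up for illustration.
psms = [
    {"ScanNum": "101", "Peptide": "PEPTIDEK", "QValue": "0.001"},
    {"ScanNum": "102", "Peptide": "SAMPLER", "QValue": "0.004"},
    {"ScanNum": "105", "Peptide": "EXAMPLEK", "QValue": "0.009"},
]
reporter_ions = [
    {"ScanNumber": "101", "Ion_126": "100.0", "Ion_127N": "200.0"},
    {"ScanNumber": "102", "Ion_126": "300.0", "Ion_127N": "400.0"},
]


def merge_by_scan(psm_rows, ion_rows, psm_key="ScanNum", ion_key="ScanNumber"):
    """Attach reporter ion columns to each PSM that shares a scan number."""
    ions_by_scan = {row[ion_key]: row for row in ion_rows}
    merged = []
    for psm in psm_rows:
        ions = ions_by_scan.get(psm[psm_key])
        if ions is None:
            continue  # no reporter ion row for this scan; the PSM is dropped
        combined = dict(psm)
        # Copy the reporter ion columns, leaving out the duplicate scan column.
        combined.update({k: v for k, v in ions.items() if k != ion_key})
        merged.append(combined)
    return merged


for row in merge_by_scan(psms, reporter_ions):
    print(row)
```

Scan numbers are only unique within one mzML file, so merging several fractions would also need the file or dataset name in the join key.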
Thanks for your reply. Yes, I meant the TMT reporter ion abundances. I'll check out MASIC.
@alchemistmatt Oh, and thank you so much for the detailed explanation! It helped a lot.
Describe the question or problem
How to get sample-level peptide identification data?
Details
I am following the Common Data Analysis Pipeline (CDAP) described at https://cptac-data-portal.georgetown.edu/cptac/documents/CDAP_description_20140225.pdf to reproduce CPTAC datasets, using https://cptac-data-portal.georgetown.edu/study-summary/S038 as a test dataset. I downloaded the mzML file 01CPTAC_OVprospective_W_JHUZ_20161209_QE_f01.mzML and ran
```shell
java -Xmx3500M -jar ./MSGFPlus.jar -d ./proteins.fasta -t 20ppm -e 1 -m 3 -inst 3 -protocol 4 -ntt 1 -tda 1 -ti 0,1 -n 1 -maxLength 50 -mod $modpath -s 01CPTAC_OVprospective_W_JHUZ_20161209_QE_f01.mzML -o 01CPTAC_OVprospective_W_JHUZ_20161209_QE_f01.mzid
```
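For readers new to MS-GF+, the same search can be written out one flag at a time. This Python sketch only builds the argument list. The comments describe each option value as documented for MS-GF+ (for example `-m 3` as HCD, `-inst 3` as Q Exactive, `-protocol 4` as TMT) and should be checked against the documentation of the version in use. `mods.txt` is a hypothetical stand-in for `$modpath`, and `msgf_command` is an illustrative helper. As the reply above notes, none of these options makes MS-GF+ report reporter ion abundances.

```python
import shlex


def msgf_command(mzml, fasta="./proteins.fasta", mods="mods.txt", jar="./MSGFPlus.jar"):
    """Build the MS-GF+ search from the question as an argument list."""
    output = mzml.rsplit(".", 1)[0] + ".mzid"
    return [
        "java", "-Xmx3500M", "-jar", jar,  # JVM with a 3500 MB heap
        "-d", fasta,         # protein FASTA database
        "-s", mzml,          # spectrum file
        "-o", output,        # output .mzid
        "-t", "20ppm",       # precursor mass tolerance
        "-e", "1",           # enzyme: trypsin
        "-m", "3",           # fragmentation method: HCD
        "-inst", "3",        # instrument type: Q Exactive
        "-protocol", "4",    # protocol: TMT
        "-ntt", "1",         # number of tolerable termini (1 allows semi-tryptic)
        "-tda", "1",         # also search decoys (target-decoy approach)
        "-ti", "0,1",        # isotope error range
        "-n", "1",           # matches reported per spectrum
        "-maxLength", "50",  # maximum peptide length
        "-mod", mods,        # modification file ($modpath in the question)
    ]


cmd = msgf_command("01CPTAC_OVprospective_W_JHUZ_20161209_QE_f01.mzML")
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```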
It produced the mzid file. I then ran

```shell
mono ./MzidToTsvConverter/MzidToTsvConverter.exe 01CPTAC_OVprospective_W_JHUZ_20161209_QE_f01.mzid
```

which produced the file 01CPTAC_OVprospective_W_JHUZ_20161209_QE_f01.tsv.
Neither the mzid file nor the tsv file contains sample-level peptide identification data. However, the mzid and psm files downloaded from https://cptac-data-portal.georgetown.edu/study-summary/S038 do contain sample-level data under SpectrumIdentificationItem.
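To check which .mzid files carry these sample-level values, one option is to count the SpectrumIdentificationItem elements that contain CPTAC-CDAP TMT userParams. This sketch streams the file with `xml.etree.ElementTree.iterparse` so large .mzid files need not fit in memory. The tiny demo document, its mzIdentML 1.1 namespace, and the helper names are illustrative. For an .mzid written by MS-GF+ one would expect the second count to be zero, consistent with what the question describes; for a CDAP .mzid it should be nonzero.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Illustrative demo document: one item with a CDAP TMT userParam, one without.
DEMO = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SpectrumIdentificationResult id="r1">
    <SpectrumIdentificationItem id="a">
      <userParam name="CPTAC-CDAP:TMT10-126" value="36232.4/-0.20"/>
    </SpectrumIdentificationItem>
    <SpectrumIdentificationItem id="b"/>
  </SpectrumIdentificationResult>
</MzIdentML>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_items_with_cdap_tmt(mzid_path, prefix="CPTAC-CDAP:TMT10-"):
    """Return (all SpectrumIdentificationItems, items with CDAP TMT userParams)."""
    total = with_tmt = 0
    for _, elem in ET.iterparse(mzid_path, events=("end",)):
        if local_name(elem.tag) != "SpectrumIdentificationItem":
            continue
        total += 1
        if any(local_name(child.tag) == "userParam"
               and child.get("name", "").startswith(prefix)
               for child in elem.iter()):
            with_tmt += 1
        elem.clear()  # release the item's children once it has been checked
    return total, with_tmt


with tempfile.NamedTemporaryFile("w", suffix=".mzid", delete=False) as handle:
    handle.write(DEMO)
    demo_path = handle.name
try:
    total, with_tmt = count_items_with_cdap_tmt(demo_path)
finally:
    os.remove(demo_path)
print(total, with_tmt)
```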
I'm new to MS-GF+. Can anyone help with this?