MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Combining identification results of fractions #130

Closed rkimoakbioinformatics closed 2 years ago

rkimoakbioinformatics commented 3 years ago

Describe the question or problem

Each dataset of a CPTAC study has multiple mzML files, one per "fraction". MS-GF+ is run separately on each of those mzML files. How should I combine the MS-GF+ results to get the peptide identification result for the whole dataset?

Details

For example, in https://cptac-data-portal.georgetown.edu/study-summary/S038, the 01CPTAC_OVprospective_Proteome_JHU_20161209 dataset has 24 mzML files from 24 fractions. Running MS-GF+ once per mzML file produces 24 MS-GF+ result files. I don't know how to go from there to a single peptide identification result file at the dataset level.

Useful extras

Thanks for all the help you have been giving.

alchemistmatt commented 3 years ago

Note: the MS-GF+ results are in .mzid files. In contrast, the .mzML files are XML forms of the original mass spectra.

I can think of two options for combining the MS-GF+ results:

Option 1: Combine the .mzid files using the MzIdMerger. Download it from the Releases Page.

Option 2: Convert each .mzid file to a .tsv file, optionally filtering while you convert each one.

Use Fill Down to easily populate a series of rows with that formula.

If you go with Option 1, you could still use the MzidToTsvConverter to convert to tab-delimited text.
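For Option 2, the step of stacking the per-fraction .tsv files into one dataset-level table can also be scripted instead of done in a spreadsheet. The following is only a minimal sketch, not part of MS-GF+ or MzidToTsvConverter: the function name `combine_tsv`, the added `SourceFile` column, and the default filter column name `QValue` are assumptions for illustration, so check the header of your own .tsv files before relying on the filter. It assumes every input file has the same header row.

```python
import csv
from pathlib import Path


def combine_tsv(tsv_paths, out_path, qvalue_column="QValue", max_qvalue=None):
    """Concatenate tab-delimited result files into one file.

    A "SourceFile" column (hypothetical name) is prepended so each row can be
    traced back to its fraction. If max_qvalue is given, rows whose value in
    qvalue_column (assumed column name) exceeds it are dropped.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out_handle:
        writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
        for tsv_path in sorted(str(p) for p in tsv_paths):
            with open(tsv_path, newline="") as in_handle:
                reader = csv.reader(in_handle, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    # The first file defines the header of the combined table
                    header = file_header
                    writer.writerow(["SourceFile"] + header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {tsv_path}")
                q_index = None
                if max_qvalue is not None:
                    q_index = header.index(qvalue_column)
                for row in reader:
                    if not row:
                        continue  # skip blank lines
                    if q_index is not None and float(row[q_index]) > max_qvalue:
                        continue  # optional filter on the score column
                    writer.writerow([Path(tsv_path).name] + row)
                    rows_written += 1
    return rows_written
```

A call such as `combine_tsv(Path("fraction_results").glob("*.tsv"), "combined.tsv", max_qvalue=0.01)` would then write one combined table; the directory name, output name, and threshold here are placeholders. Write the output outside the input directory (or under a different extension) so it is not picked up as an input on a later run.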

rkimoakbioinformatics commented 2 years ago

@alchemistmatt Thanks! Regarding the other question I posted, I figured it out.