The Informed Proteomics project includes algorithms for proteomic mass spectrometry data analysis. Although the back-end data access and some of the scoring routines are general purpose, this repository is currently maintained for top-down MS/MS datasets.
Implementation details are described in the manuscript "Informed-Proteomics: open-source software package for top-down proteomics", published in Nature Methods.
Download releases from https://github.com/PNNL-Comp-Mass-Spec/Informed-Proteomics/releases
The latest versions of the Informed Proteomics tools may be available on the AppVeyor CI server, though they get auto-deleted after 6 months.
MSPathFinder finds peptides in top-down LC-MS/MS datasets. Similar to database search engines for bottom-up proteomics, it takes a FASTA file, a spectrum file, and a list of modifications as input and reports proteoform spectrum matches (PsSMs) and their scores. These results are written to a tab-separated file and an mzIdentML file.
Processing steps:
1. Run PbfGen.exe to convert the instrument file or .mzML file to an optimized binary (.pbf) file.
2. Run ProMex on the .pbf file to deisotope the data, including determining charge states.
3. Run MSPathFinderT with the .pbf (or .raw) file and a FASTA file to search for proteins.
4. Optionally use ProMexAlign to align MS1 features between datasets.
Example command lines:
PbfGen.exe -s MyDataset.raw
ProMex.exe -i MyDataset.pbf -minCharge 2 -maxCharge 60 -minMass 3000 -maxMass 50000 -score n -csv n -maxThreads 0
MSPathFinderT.exe -s MyDataset.pbf -feature MyDataset.ms1ft -d C:\FASTA\ProteinList.fasta -o C:\WorkDir -t 10 -f 10 -m 1 -tda 1 -minLength 21 -maxLength 300 -minCharge 2 -maxCharge 30 -minFragCharge 1 -maxFragCharge 15 -minMass 3000 -maxMass 50000 -mod MSPathFinder_Mods.txt
For viewing search results, you might want to consider LCMS-Spectator. It can also function as a GUI for running ProMex and MSPathFinder.
PbfGen, ProMex, and MSPathFinderT can be run on Linux using Mono.
Example command lines:
mono PbfGen.exe -s *.raw
mono PbfGen.exe -s *.mzML
mono PbfGen.exe -s *.mzML.gz
mono ProMex.exe -i *.pbf -minCharge 2 -maxCharge 60 -minMass 2000 -maxMass 50000 -score n -csv n -maxThreads 0
mono MSPathFinder/MSPathFinderT.exe -s *.pbf -d ID_006407_8F27399B.fasta -o . -ParamFile MSPF_MetOx_CysDehydro_NTermAcet_SingleInternalCleavage.txt
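The Linux commands above can be chained for a single dataset. The following is a minimal sketch, not an official script. It assumes the .exe files sit in the current directory and that PbfGen and ProMex write the .pbf and .ms1ft files next to the input file under the same base name (as the example file names suggest). The helper names `run` and `top_down_pipeline` and the `DRY_RUN` variable are hypothetical; all switches are copied from the example command lines.

```shell
#!/bin/sh
# Sketch only: chain PbfGen -> ProMex -> MSPathFinderT for one dataset.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: top_down_pipeline RAW_FILE FASTA_FILE PARAM_FILE OUTPUT_DIR
top_down_pipeline() {
    raw=$1
    fasta=$2
    params=$3
    outdir=$4
    # Strip the last extension: MyDataset.raw -> MyDataset.
    # (This sketch does not handle double extensions such as .mzML.gz.)
    base=${raw%.*}

    # Step 1: convert the instrument file to a .pbf file.
    run mono PbfGen.exe -s "$raw" || return 1

    # Step 2: deisotope the data and find MS1 features (.ms1ft).
    run mono ProMex.exe -i "$base.pbf" -minCharge 2 -maxCharge 60 \
        -minMass 2000 -maxMass 50000 -score n -csv n -maxThreads 0 || return 1

    # Step 3: search the FASTA file, using the ProMex features.
    run mono MSPathFinderT.exe -s "$base.pbf" -feature "$base.ms1ft" \
        -d "$fasta" -o "$outdir" -ParamFile "$params" || return 1
}

# Dry-run demonstration with placeholder file names.
top_down_pipeline MyDataset.raw ProteinList.fasta MSPF_Params.txt .
```

Setting `DRY_RUN=0` before calling `top_down_pipeline` would execute the commands; stopping at the first failing step avoids searching a dataset whose conversion or feature finding did not complete.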
PbfGen.exe
-s:RawFilePath (*.raw or directory)
[-o:OutputDir]
[-start:StartScan] [-end:EndScan]
-i or -s or -InputFile
-o or -OutputDirectory
-start
-end
ProMex.exe
-i:InputFile [-o:OutputDirectory]
[-MinCharge:Value] [-MaxCharge:Value]
[-MinMass:Value] [-MaxMass:Value]
[-FeatureMap:Flag] [-Score:Flag]
[-MaxThreads:Value] [-csv:Flag]
[-BinResPPM:Value] [-ScoreTh:Value]
-i or -s or -InputFile
-o or -OutputDirectory
-MinCharge
-MaxCharge
-MinMass
-MaxMass
-FeatureMap (to disable, use -FeatureMap:false or include 'FeatureMap=False' in a parameter file)
-Score
-MaxThreads
-csv
-BinResPPM or -BinningResolutionPPM
-ScoreTh or -ScoreThreshold
-ms1ft (use "-ms1ft." to infer the name from the pbf file)
-ParamFile
-CreateParamFile
Command to create a PNG of the features in an existing ms1ft file (requires both a .pbf file and a .ms1ft file):
ProMex.exe -i dataset.pbf -ms1ft dataset.ms1ft -featureMap
ProMexAlign.exe DatasetInfoFile
DatasetInfoFile columns: Label, RawFilePath, Ms1FtFilePath, MsPathfinderIdFilePath
Dataset_1, Dataset_2, etc.
Example input file:
Label | RawFilePath | Ms1FtFilePath | MsPathfinderIdFilePath |
---|---|---|---|
Intact_Run8 | Intact_100ng_Run8.pbf | Intact_100ng_Run8.ms1ft | |
Intact_Run9 | Intact_100ng_Run9.pbf | Intact_100ng_Run9.ms1ft | |
Intact_Run10 | Intact_100ng_Run10.pbf | Intact_100ng_Run10.ms1ft | |
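A DatasetInfoFile like the one above could be generated from a directory of ProMex results. The sketch below is illustrative only and rests on several assumptions that are not confirmed here: that the file is tab-delimited, that a header line is accepted, and that the last column may be left empty as in the example. The function name `make_dataset_info` is hypothetical, and using the file base name as the Label is just one possible convention.

```shell
#!/bin/sh
# Sketch only: print a DatasetInfoFile (assumed tab-delimited) with one
# row per .ms1ft file found in the given directory.
# Usage: make_dataset_info DIRECTORY
make_dataset_info() {
    dir=$1
    # Header row with the four column names from the example table.
    printf 'Label\tRawFilePath\tMs1FtFilePath\tMsPathfinderIdFilePath\n'
    for ms1ft in "$dir"/*.ms1ft; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$ms1ft" ] || continue
        name=$(basename "$ms1ft" .ms1ft)
        # Assumes the .pbf file sits next to the .ms1ft file.
        # The MsPathfinderIdFilePath column is left empty.
        printf '%s\t%s\t%s\t\n' "$name" "$dir/$name.pbf" "$ms1ft"
    done
}

# Demonstration with empty placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/Intact_100ng_Run8.pbf" "$demo_dir/Intact_100ng_Run8.ms1ft"
touch "$demo_dir/Intact_100ng_Run9.pbf" "$demo_dir/Intact_100ng_Run9.ms1ft"
make_dataset_info "$demo_dir"
rm -rf "$demo_dir"
```

Redirecting the output to a file (for example `make_dataset_info /data/run > DatasetInfo.txt`, with a placeholder path) would give a file that can be passed as the DatasetInfoFile argument, provided the assumptions above hold.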
MSPathFinderT.exe
-i:InputFile [-d:FastaFile]
[-o:OutputDirectory] [-overwrite:Flag]
[-mod:ModificationFilePath]
[-ic:InternalCleavageMode]
[-tda:DatabaseSearchMode]
[-tagSearch:Flag]
[-t:PrecursorTolerance]
[-f:FragmentIonTolerance]
[-MemMatches:Count]
[-n:NumMatchesPerSpec]
[-IncludeDecoys:Flag]
[-MinLength:Value] [-MaxLength:Value]
[-MinCharge:Value] [-MaxCharge:Value]
[-MinFragCharge:Value] [-MaxFragCharge:Value]
[-MinMass:Value] [-MaxMass:Value]
[-Feature:FeatureFile]
[-Threads:Value]
[-act:ActivationMethod]
[-ScansFile:MS2ScanFilterFile]
[-flip:Flag]
[-ParamFile]
[-CreateParamFile]
-i or -s or -specFile
-d or -database
-o or -outputDir
-ic
-TagSearch
-MemMatches
-n or -NumMatchesPerSpec
-IncludeDecoy or -IncludeDecoys
-mod (modifications can also be defined in a parameter file given with /ParamFile or -ParamFile; those defined with the -mod switch take precedence over modifications defined in a parameter file)
-tda
-Overwrite
-t or -PrecursorTol or -PMTolerance
-f or -FragmentTol or -FragTolerance
-MinLength
-MaxLength
-MinCharge
-MaxCharge
-MinFragCharge
-MaxFragCharge
-MinMass
-MaxMass
-Feature
-threads
-act or -ActivationMethod
-ScansFile or -ScansFilePath
-Flip
-ParamFile
-CreateParamFile
Enabling tag-based searching with -tagSearch 1 can give 5% to 10% more matches, but can increase the runtime by 30% to 50%.
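As a rough worked example of this trade-off, the sketch below applies the quoted percentage ranges to a baseline search without tag-based searching. The baseline numbers (60 minutes, 1000 matches) and the function name `tag_search_estimate` are hypothetical, and actual gains will depend on the dataset.

```shell
#!/bin/sh
# Sketch only: apply the quoted ranges (5-10% more matches, 30-50% more
# runtime) to a baseline run. Uses integer arithmetic, so results are
# rounded down.
# Usage: tag_search_estimate BASELINE_MINUTES BASELINE_MATCHES
tag_search_estimate() {
    minutes=$1
    matches=$2
    echo "runtime: $(( $minutes * 130 / 100 )) to $(( $minutes * 150 / 100 )) minutes"
    echo "matches: $(( $matches * 105 / 100 )) to $(( $matches * 110 / 100 ))"
}

# Hypothetical baseline: a 60-minute search that reports 1000 matches.
tag_search_estimate 60 1000
# prints:
#   runtime: 78 to 90 minutes
#   matches: 1050 to 1100
```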
The suggested input file format is centroided .mzML.
For versions released after February 1, 2019:
For versions released prior to February 1, 2019:
On Windows, several other formats are supported if an appropriate version of ProteoWizard is installed (make sure the downloaded version matches the system architecture).
On Linux, supported input file formats are .raw, .mzML, and .mzML.gz.
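Before starting a long run on Linux, a script can check that an input file has one of the listed extensions. This is a small sketch; the function name `is_linux_supported_input` is hypothetical, and the case-sensitive match on the extension is an assumption rather than documented behavior of the tools.

```shell
#!/bin/sh
# Sketch only: succeed if the file name ends in one of the extensions
# listed as supported on Linux (.raw, .mzML, .mzML.gz).
# The comparison is case-sensitive, which is an assumption.
is_linux_supported_input() {
    case $1 in
        *.raw|*.mzML|*.mzML.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration with placeholder file names.
for f in MyDataset.raw MyDataset.mzML MyDataset.mzML.gz MyDataset.wiff; do
    if is_linux_supported_input "$f"; then
        echo "$f: listed as supported on Linux"
    else
        echo "$f: not listed as supported on Linux"
    fi
done
```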
See the Example_Files directory on GitHub for sample parameter files.
MSPathFinderT.exe -s C:\WorkDir\Dataset.pbf -feature C:\WorkDir\Dataset.ms1ft -d C:\WorkDir\Proteins.fasta -o C:\WorkDir /ParamFile:C:\WorkDir\MSPF_MetOx_CysDehydro_NTermAcet_SingleInternalCleavage_ReportTop2.txt
See the Example_Files directory for sample modification definition files.
Minimum required:
Minimum recommended:
Written by Sangtae Kim, Junkap Park, and Chris Wilkins for the Department of Energy (PNNL, Richland, WA)
Copyright 2015, Battelle Memorial Institute. All Rights Reserved.
E-mail: matthew.monroe@pnnl.gov or proteomics@pnnl.gov
Website: https://github.com/PNNL-Comp-Mass-Spec/ or https://panomics.pnnl.gov/ or https://www.pnnl.gov/integrative-omics/
Licensed under the Apache License, Version 2.0; you may not use this program except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
RawFileReader reading tool. Copyright © 2016 by Thermo Fisher Scientific, Inc. All rights reserved.