PNNL-Comp-Mass-Spec / Informed-Proteomics

Top down / bottom up, MS/MS analysis tool for DDA and DIA mass spectrometry data

Informed Proteomics

The Informed Proteomics project includes algorithms for proteomic mass spectrometry data analysis. Although the back-end data access and some of the scoring routines are general purpose, this repository is currently maintained for top down MS/MS datasets.

Manuscript

Implementation details are described in the manuscript "Informed-Proteomics: open-source software package for top-down proteomics", published in Nature Methods.

Install/Tutorials

See the Informed Proteomics GitHub wiki for usage, tutorials, and more!

Downloads

https://github.com/PNNL-Comp-Mass-Spec/Informed-Proteomics/releases

Continuous Integration

The latest versions of the Informed Proteomics tools may be available on the AppVeyor CI server, though they get auto-deleted after 6 months.

MSPathFinderT

MSPathFinder finds peptides in top-down LC-MS/MS datasets. Similar to database search engines for bottom-up proteomics, it takes a FASTA file, a spectrum file, and a list of modifications as input and reports proteoform spectrum matches (PsSMs) and their scores. Results are written in a tab-separated format and as an mzIdentML file.

Processing steps:

  1. Run PbfGen.exe to convert the instrument file or .mzML file to an optimized binary file

    • Creates a .pbf file. This file contains spectra information and full chromatograms for MS1 and MSn data, allowing fast access to extracted ion chromatograms during the search.
    • This step usually only needs to be performed once for a dataset, and by default will do nothing if it has previously been run on the dataset and can find the file.
    • ProMex and MSPathFinderT will perform this step automatically if they are given a spectrum file that is not a .pbf file.
  2. Run ProMex on the .pbf file to deisotope the data, including determining charge states

    • Creates a .ms1ft file
    • This file can be reused for multiple searches, as long as none of its parameters change
    • MSPathFinderT will perform this step automatically if not provided with a path to a feature file, using the respective parameters from the MSPathFinderT parameters.
  3. Run MSPathFinderT with the .pbf (or .raw) file and a FASTA file to search for proteins

    • Creates _IcTda.tsv files and .mzid
    • Can also be run with the spectrum source file directly, with PbfGen and ProMex run as part of the process.
  4. Optionally use ProMexAlign to align MS1 features between datasets

    • Creates a .tsv file listing a consolidated list of MS1 features, plus the corresponding Feature IDs from the input files
    • A FeatureID of 0 means the feature was not present
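Because a FeatureID of 0 marks a feature that was not present in a dataset, the aligned output can be summarized by counting zeros in a dataset's Feature ID column. The following is a minimal sketch and not part of the Informed Proteomics tools: the function name `count_missing_features`, the demo column names, and the demo values are all hypothetical, and the column index has to be chosen by inspecting the header of the actual ProMexAlign output.

```shell
#!/usr/bin/env bash
# Count data rows whose value in a given 1-based column is 0 (feature absent).
# The header line is skipped. The caller supplies the column index because
# the exact column layout of the ProMexAlign output is not assumed here.
count_missing_features() {
    local tsv_file="$1" column="$2"
    awk -F'\t' -v col="$column" 'NR > 1 && $col == 0 { n++ } END { print n + 0 }' "$tsv_file"
}

# Demo with a made-up three-column table (hypothetical header names).
demo_file="$(mktemp)"
printf 'MonoMass\tRunA_FeatureID\tRunB_FeatureID\n' >  "$demo_file"
printf '10000.5\t12\t0\n'                            >> "$demo_file"
printf '15000.2\t0\t0\n'                             >> "$demo_file"
printf '20000.9\t7\t31\n'                            >> "$demo_file"

count_missing_features "$demo_file" 3   # prints 2
rm -f "$demo_file"
```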

Example command lines:

PbfGen.exe -s MyDataset.raw

ProMex.exe -i MyDataset.pbf -minCharge 2 -maxCharge 60 -minMass 3000 -maxMass 50000 -score n -csv n -maxThreads 0

MSPathFinderT.exe -s MyDataset.pbf -feature MyDataset.ms1ft -d C:\FASTA\ProteinList.fasta -o C:\WorkDir -t 10 -f 10 -m 1 -tda 1 -minLength 21 -maxLength 300 -minCharge 2 -maxCharge 30 -minFragCharge 1 -maxFragCharge 15 -minMass 3000 -maxMass 50000 -mod MSPathFinder_Mods.txt

Results viewer / GUI

For viewing search results, you might want to consider LCMS-Spectator. It can also function as a GUI for running ProMex and MSPathFinder.

Running on Linux

PbfGen, ProMex, and MSPathFinderT can be run on Linux using Mono.

Example command lines:

mono PbfGen.exe -s *.raw
mono PbfGen.exe -s *.mzML
mono PbfGen.exe -s *.mzML.gz

mono ProMex.exe -i *.pbf -minCharge 2 -maxCharge 60 -minMass 2000 -maxMass 50000 -score n -csv n -maxThreads 0

mono MSPathFinder/MSPathFinderT.exe -s *.pbf -d ID_006407_8F27399B.fasta -o . -ParamFile MSPF_MetOx_CysDehydro_NTermAcet_SingleInternalCleavage.txt
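The three commands above can be chained per dataset. The sketch below only prints the commands (a dry run) so they can be reviewed before anything is executed. It assumes that the .pbf and .ms1ft files are written next to the input with the same base name, as in the example command lines; the function name `print_pipeline` and the FASTA and parameter file names are placeholders. It handles .raw and .mzML inputs only.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the PbfGen, ProMex, and MSPathFinderT commands for
# one dataset, in pipeline order. Nothing is executed.
print_pipeline() {
    local spectrum_file="$1" fasta_file="$2" param_file="$3"
    local base="${spectrum_file%.mzML}"   # strip a trailing .mzML, if present
    base="${base%.raw}"                   # strip a trailing .raw, if present

    echo "mono PbfGen.exe -s ${spectrum_file}"
    echo "mono ProMex.exe -i ${base}.pbf -minCharge 2 -maxCharge 60 -minMass 2000 -maxMass 50000 -score n -csv n -maxThreads 0"
    echo "mono MSPathFinderT.exe -s ${base}.pbf -feature ${base}.ms1ft -d ${fasta_file} -o . -ParamFile ${param_file}"
}

# Placeholder file names, for illustration only.
print_pipeline MyDataset.mzML ProteinList.fasta MSPF_Params.txt
```

To run the steps instead of printing them, the output can be reviewed and then piped to `bash`, or each `echo` can be replaced by the command itself.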

PbfGen Syntax

PbfGen.exe
    -s:RawFilePath (*.raw or directory)
    [-o:OutputDir]
    [-start:StartScan] [-end:EndScan]

-i or -s or -InputFile

-o or -OutputDirectory

-start

-end

ProMex Syntax

ProMex.exe 
  -i:InputFile [-o:OutputDirectory]
  [-MinCharge:Value] [-MaxCharge:Value]
  [-MinMass:Value]   [-MaxMass:Value]
  [-FeatureMap:Flag] [-Score:Flag]
  [-MaxThreads:Value] [-csv:Flag]
  [-BinResPPM:Value]  [-ScoreTh:Value]

-i or -s or -InputFile

-o or -OutputDirectory

-MinCharge

-MaxCharge

-MinMass

-MaxMass

-FeatureMap

-Score

-MaxThreads

-csv

-BinResPPM or -BinningResolutionPPM

-ScoreTh or -ScoreThreshold

-ms1ft

-ParamFile

-CreateParamFile

Create PNG using existing ms1ft file

Command to create a PNG of the features in an existing ms1ft file (requires both a .pbf file and a .ms1ft file):

ProMex.exe -i dataset.pbf -ms1ft dataset.ms1ft -featureMap
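Since the command needs both files, a small guard can avoid launching ProMex when one of them is missing. This is a sketch only: `feature_map_command` is a hypothetical helper that prints the command from the example above when both files exist, and assumes the two files share a base name.

```shell
#!/usr/bin/env bash
# Print the feature-map command for a dataset base name, but only if both
# the .pbf and the .ms1ft file exist. Returns 1 (with a note on stderr)
# when either file is missing.
feature_map_command() {
    local base="$1"
    if [ -f "${base}.pbf" ] && [ -f "${base}.ms1ft" ]; then
        echo "ProMex.exe -i ${base}.pbf -ms1ft ${base}.ms1ft -featureMap"
    else
        echo "skipping ${base}: need both ${base}.pbf and ${base}.ms1ft" >&2
        return 1
    fi
}

# Example call; falls through to the message if the files are not present.
feature_map_command dataset || echo "nothing to do for dataset"
```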

ProMexAlign syntax

ProMexAlign.exe DatasetInfoFile

Example input file:

Label RawFilePath Ms1FtFilePath MsPathfinderIdFilePath
Intact_Run8 Intact_100ng_Run8.pbf Intact_100ng_Run8.ms1ft
Intact_Run9 Intact_100ng_Run9.pbf Intact_100ng_Run9.ms1ft
Intact_Run10 Intact_100ng_Run10.pbf Intact_100ng_Run10.ms1ft
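A dataset info file like the one above can be generated from a list of .pbf files. The sketch below assumes the file is tab-separated, that each .ms1ft file sits next to its .pbf file with the same base name, and that the file's base name is an acceptable label; the function name `write_dataset_info` is hypothetical. As in the example, the MsPathfinderIdFilePath column is left empty.

```shell
#!/usr/bin/env bash
# Write a tab-separated dataset info table to stdout, one row per .pbf file.
write_dataset_info() {
    printf 'Label\tRawFilePath\tMs1FtFilePath\tMsPathfinderIdFilePath\n'
    local pbf base
    for pbf in "$@"; do
        base="${pbf%.pbf}"   # path without the .pbf extension
        printf '%s\t%s\t%s\n' "$(basename "$base")" "$pbf" "${base}.ms1ft"
    done
}

# Redirect the output to a file of your choice to pass it to ProMexAlign.
write_dataset_info Intact_100ng_Run8.pbf Intact_100ng_Run9.pbf Intact_100ng_Run10.pbf
```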

MSPathFinder Syntax

MSPathFinderT.exe
  -i:InputFile [-d:FastaFile] 
  [-o:OutputDirectory] [-overwrite:Flag]
  [-mod:ModificationFilePath]
  [-ic:InternalCleavageMode]
  [-tda:DatabaseSearchMode]
  [-tagSearch:Flag]
  [-t:PrecursorTolerance] 
  [-f:FragmentIonTolerance]
  [-MemMatches:Count] 
  [-n:NumMatchesPerSpec]
  [-IncludeDecoys:Flag] 
  [-MinLength:Value] [-MaxLength:Value]
  [-MinCharge:Value] [-MaxCharge:Value]
  [-MinFragCharge:Value] [-MaxFragCharge:Value]
  [-MinMass:Value] [-MaxMass:Value]
  [-Feature:FeatureFile]
  [-Threads:Value]
  [-act:ActivationMethod]
  [-ScansFile:MS2ScanFilterFile]
  [-flip:Flag]
  [-ParamFile]
  [-CreateParamFile]

-i or -s or -specFile

-d or -database

-o or -outputDir

-ic

-TagSearch

-MemMatches

-n or -NumMatchesPerSpec

-IncludeDecoy or -IncludeDecoys

-mod

-tda

-Overwrite

-t or -PrecursorTol or -PMTolerance

-f or -FragmentTol or -FragTolerance

-MinLength

-MaxLength

-MinCharge

-MaxCharge

-MinFragCharge

-MaxFragCharge

-MinMass

-MaxMass

-Feature

-threads

-act or -ActivationMethod

-ScansFile or -ScansFilePath

-Flip

-ParamFile

-CreateParamFile

Enabling tag-based searching with -tagSearch 1 can give 5% to 10% more matches, but can increase the runtime by 30% to 50%.
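To make the trade-off concrete, the percentages above can be applied to a baseline search. This is plain arithmetic on the stated ranges, not a measurement; the function name `estimate_tag_search` and the baseline numbers (1000 matches, 60 minutes) are made up for illustration.

```shell
#!/usr/bin/env bash
# Apply the stated ranges (5% to 10% more matches, 30% to 50% longer runtime)
# to a baseline match count and runtime in minutes, using integer arithmetic.
estimate_tag_search() {
    local matches="$1" minutes="$2"
    echo "matches: $(( matches * 105 / 100 ))-$(( matches * 110 / 100 ))"
    echo "runtime: $(( minutes * 130 / 100 ))-$(( minutes * 150 / 100 )) min"
}

estimate_tag_search 1000 60
# matches: 1050-1100
# runtime: 78-90 min
```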

Supported file formats

The suggested input file format is centroided .mzML files.

For versions released after February 1, 2019:

For versions released prior to February 1, 2019:

On Windows, several other formats are supported if an appropriate version of ProteoWizard is installed (make sure the version downloaded matches the system architecture).

On Linux, supported input file formats are .raw, .mzML, and .mzML.gz.
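A batch script on Linux can filter out unsupported inputs before calling PbfGen. The sketch below checks only the file extension against the three formats listed above; the function name `is_supported_on_linux` is hypothetical, and the match is case-sensitive. It covers source spectrum files only, not .pbf files that have already been generated.

```shell
#!/usr/bin/env bash
# Return 0 if the file name ends in one of the input formats listed as
# supported on Linux (.raw, .mzML, .mzML.gz), otherwise return 1.
is_supported_on_linux() {
    case "$1" in
        *.raw|*.mzML|*.mzML.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Placeholder file names, for illustration only.
for f in Run1.raw Run2.mzML Run3.mzML.gz Run4.wiff; do
    if is_supported_on_linux "$f"; then
        echo "$f: supported"
    else
        echo "$f: not supported on Linux"
    fi
done
```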

MSPathFinder Parameter Files

See the Example_Files directory on GitHub for sample parameter files.

MSPathFinder Mods File

See the Example_Files directory for sample modification definition files.

System Requirements

Minimum required:

Minimum recommended:

Contacts

Written by Sangtae Kim, Junkap Park, and Chris Wilkins for the Department of Energy (PNNL, Richland, WA)
Copyright 2015, Battelle Memorial Institute. All Rights Reserved.
E-mail: matthew.monroe@pnnl.gov or proteomics@pnnl.gov
Website: https://github.com/PNNL-Comp-Mass-Spec/ or https://panomics.pnnl.gov/ or https://www.pnnl.gov/integrative-omics/

License

Licensed under the Apache License, Version 2.0; you may not use this program except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

RawFileReader reading tool. Copyright © 2016 by Thermo Fisher Scientific, Inc. All rights reserved.