Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

TMT Pipeline Error in ProteinProphet due to PeptideProphet 'no data' #140

Closed: ciarajudge closed this issue 4 years ago

ciarajudge commented 4 years ago

Hi, further to another issue I was discussing, which I now believe to be resolved: I am using the TMT pipeline to analyse datasets that I download from PRIDE and convert to mzML using MSConvert. After PeptideProphet runs (during which I get a number of warnings about failed mixture model quality tests), ProteinProphet fails with a message suggesting that PeptideProphet did not run correctly, or at all.

I am new to this type of analysis, so I assume I am doing something wrong; any direction would be appreciated.

This is the complete output from the Linux command line:

INFO[15:18:26] Executing Workspace  v3.2.7                  
INFO[15:18:26] Creating workspace                           
WARN[15:18:26] A meta data folder was found and will not be overwritten.  
INFO[15:18:26] Done                                         
INFO[15:18:26] Executing Pipeline  v3.2.7                   
INFO[15:18:26] Initiating the workspace on PXD019087_0      
INFO[15:18:26] Creating workspace                           
INFO[15:18:26] Processing database                          
INFO[15:18:29] Running the Database Search on all data      
MSFragger version MSFragger-3.0
Batmass-IO version 1.17.4
(c) University of Michigan
RawFileReader reading tool. Copyright (c) 2016 by Thermo Fisher Scientific, Inc. All rights reserved.
System OS: Linux, Architecture: amd64
Java Info: 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, Oracle Corporation
JVM started with 14 GB memory
Checking database...
Checking /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_2.mzML...
Checking /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_3.mzML...
Checking /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNase_1.mzML...

************************************MAIN SEARCH************************************
Checking database...
Parameters:
num_threads = 24
database_name = /home/DATA2/trips/scamp/proteomes/2020-06-24-decoys-contam-mus_musculus_proteome.fa
decoy_prefix = rev_
precursor_mass_lower = -20.0
precursor_mass_upper = 20.0
precursor_mass_units = 1
precursor_true_tolerance = 20.0
precursor_true_units = 1
fragment_mass_tolerance = 20.0
fragment_mass_units = 1
calibrate_mass = 0
write_calibrated_mgf = false
isotope_error = -1/0/1/2/3
mass_offsets = 0
labile_search_mode = OFF
precursor_mass_mode = SELECTED
localize_delta_mass = false
delta_mass_exclude_ranges = (-1.5, 3.5)
fragment_ion_series = b,y
diagnostic_intensity_filter = 0.0
Y_type_masses = 0/203.07937/406.15874/568.21156/730.26438/892.3172/349.137279
diagnostic_fragments = 204.086646/186.076086/168.065526/366.139466/144.0656/138.055/126.055/163.060096/512.197375/292.1026925/274.0921325/657.2349/243.026426/405.079246/485.045576/308.09761
search_enzyme_name = Trypsin
search_enzyme_cutafter = KR
search_enzyme_butnotafter = P
num_enzyme_termini = 2
allowed_missed_cleavage = 2
clip_nTerm_M = true
allow_multiple_variable_mods_on_residue = true
max_variable_mods_per_peptide = 3
max_variable_mods_combinations = 5000
output_file_extension = pepXML
output_format = pepXML
output_report_topN = 3
output_max_expect = 50.0
report_alternative_proteins = false
override_charge = false
precursor_charge_low = 1
precursor_charge_high = 6
digest_min_length = 7
digest_max_length = 50
digest_mass_range_low = 500.0
digest_mass_range_high = 5000.0
max_fragment_charge = 2
deisotope = 0
track_zero_topN = 0
zero_bin_accept_expect = 0.0
zero_bin_mult_expect = 1.0
add_topN_complementary = 0
minimum_peaks = 15
use_topN_peaks = 150
minIonsScoring = 3
min_matched_fragments = 4
minimum_ratio = 0.01
intensity_transform = 0
remove_precursor_peak = 0
remove_precursor_range = -1.5,1.5
clear_mz_range_low = 125.5
clear_mz_range_high = 131.5
excluded_scan_list_file = 
mass_diff_to_variable_mod = 0
variable_mod_01 = 15.99490 M 3
variable_mod_02 = 42.01060 [^ 1
variable_mod_03 = 229.162932 n^ 1
variable_mod_04 = 229.162932 S 1
add_A_alanine = 0.000000
add_C_cysteine = 57.021464
add_Cterm_peptide = 0.0
add_Cterm_protein = 0.0
add_D_aspartic_acid = 0.000000
add_E_glutamic_acid = 0.000000
add_F_phenylalanine = 0.000000
add_G_glycine = 0.000000
add_H_histidine = 0.000000
add_I_isoleucine = 0.000000
add_K_lysine = 229.162932
add_L_leucine = 0.000000
add_M_methionine = 0.000000
add_N_asparagine = 0.000000
add_Nterm_peptide = 0.0
add_Nterm_protein = 0.0
add_P_proline = 0.000000
add_Q_glutamine = 0.000000
add_R_arginine = 0.000000
add_S_serine = 0.000000
add_T_threonine = 0.000000
add_V_valine = 0.000000
add_W_tryptophan = 0.000000
add_Y_tyrosine = 0.000000
Selected fragment tolerance 0.10 Da.
4658294454 fragments to be searched in 8 slices (69.41 GB total)
Operating on slice 1 of 8: 
    Fragment index slice generated in 42.52 s
    001. TS_Miwi2-HA+RNAse_2.mzML 43.0 s
        [progress: 116032/116032 (100%) - 3357 spectra/s] 34.6s
    002. TS_Miwi2-HA+RNAse_3.mzML 27.0 s
        [progress: 64463/64463 (100%) - 4226 spectra/s] 15.3s
    003. TS_Miwi2-HA+RNase_1.mzML 27.0 s
        [progress: 58007/58007 (100%) - 5810 spectra/s] 10.0s
Operating on slice 2 of 8: 
    Fragment index slice generated in 12.36 s
    001. TS_Miwi2-HA+RNAse_2.mzML 38.6 s
        [progress: 116032/116032 (100%) - 4530 spectra/s] 25.6s
    002. TS_Miwi2-HA+RNAse_3.mzML 26.8 s
        [progress: 64463/64463 (100%) - 8171 spectra/s] 7.9s
    003. TS_Miwi2-HA+RNase_1.mzML 26.1 s
        [progress: 58007/58007 (100%) - 7921 spectra/s] 7.3s
Operating on slice 3 of 8: 
    Fragment index slice generated in 11.35 s
    001. TS_Miwi2-HA+RNAse_2.mzML 36.0 s
        [progress: 116032/116032 (100%) - 5307 spectra/s] 21.9s
    002. TS_Miwi2-HA+RNAse_3.mzML 26.7 s
        [progress: 64463/64463 (100%) - 8196 spectra/s] 7.9s
    003. TS_Miwi2-HA+RNase_1.mzML 26.9 s
        [progress: 58007/58007 (100%) - 8419 spectra/s] 6.9s
Operating on slice 4 of 8: 
    Fragment index slice generated in 13.53 s
    001. TS_Miwi2-HA+RNAse_2.mzML 37.4 s
        [progress: 116032/116032 (100%) - 5243 spectra/s] 22.1s
    002. TS_Miwi2-HA+RNAse_3.mzML 26.7 s
        [progress: 64463/64463 (100%) - 8522 spectra/s] 7.6s
    003. TS_Miwi2-HA+RNase_1.mzML 26.8 s
        [progress: 58007/58007 (100%) - 9244 spectra/s] 6.3s
Operating on slice 5 of 8: 
    Fragment index slice generated in 13.58 s
    001. TS_Miwi2-HA+RNAse_2.mzML 38.4 s
        [progress: 116032/116032 (100%) - 5067 spectra/s] 22.9s
    002. TS_Miwi2-HA+RNAse_3.mzML 26.3 s
        [progress: 64463/64463 (100%) - 9055 spectra/s] 7.1s
    003. TS_Miwi2-HA+RNase_1.mzML 26.8 s
        [progress: 58007/58007 (100%) - 10093 spectra/s] 5.7s
Operating on slice 6 of 8: 
    Fragment index slice generated in 12.96 s
    001. TS_Miwi2-HA+RNAse_2.mzML 38.9 s
        [progress: 116032/116032 (100%) - 5787 spectra/s] 20.1s
    002. TS_Miwi2-HA+RNAse_3.mzML 23.1 s
        [progress: 64463/64463 (100%) - 17479 spectra/s] 3.7s
    003. TS_Miwi2-HA+RNase_1.mzML 24.4 s
        [progress: 58007/58007 (100%) - 8882 spectra/s] 6.5s
Operating on slice 7 of 8: 
    Fragment index slice generated in 12.60 s
    001. TS_Miwi2-HA+RNAse_2.mzML 39.2 s
        [progress: 116032/116032 (100%) - 5469 spectra/s] 21.2s
    002. TS_Miwi2-HA+RNAse_3.mzML 25.9 s
        [progress: 64463/64463 (100%) - 9953 spectra/s] 6.5s
    003. TS_Miwi2-HA+RNase_1.mzML 26.1 s
        [progress: 58007/58007 (100%) - 8778 spectra/s] 6.6s
Operating on slice 8 of 8: 
    Fragment index slice generated in 13.02 s
    001. TS_Miwi2-HA+RNAse_2.mzML 38.5 s
        [progress: 116032/116032 (100%) - 5635 spectra/s] 20.6s | postprocessing 52.1 s
    002. TS_Miwi2-HA+RNAse_3.mzML 27.0 s
        [progress: 64463/64463 (100%) - 9607 spectra/s] 6.7s | postprocessing 26.3 s
    003. TS_Miwi2-HA+RNase_1.mzML 26.0 s
        [progress: 58007/58007 (100%) - 10636 spectra/s] 5.5s | postprocessing 27.1 s
***************************MAIN SEARCH DONE IN 21.525 MIN***************************

*******************************TOTAL TIME 21.688 MIN********************************
INFO[15:40:15] Running the validation and inference on PXD019087_0 
INFO[15:40:15] Executing PeptideProphet on PXD019087_0      
 file 1: /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_2.pepXML
 file 2: /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_3.pepXML
 file 3: /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNase_1.pepXML
 processed altogether 91871 results
INFO: Results written to file: /home/DATA2/trips/scamp/PXD019087_0/interact.pep.xml

  - /home/DATA2/trips/scamp/PXD019087_0/interact.pep.xml
  - Building Commentz-Walter keyword tree...
  - Searching the tree...
  - Linking duplicate entries...
  - Printing results...

using Accurate Mass Bins
using PPM mass difference
Using Decoy Label "rev_".
Decoy Probabilities will be reported.
Using non-parametric distributions
 (X! Tandem) (using Tandem's expectation score for modeling)
adding ACCMASS mixture distribution
using search_offsets in ACCMASS mixture distr: 0
init with X! Tandem trypsin 
MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ... 
 PeptideProphet  (TPP v5.2.1-dev Flammagenitus, Build 201906251008-exported (Linux-x86_64)) AKeller@ISB
 read in 0 1+, 32020 2+, 50838 3+, 7804 4+, 966 5+, 230 6+, and 13 7+ spectra.
Initialising statistical models ...
Found 42793 Decoys, and 49078 Non-Decoys
Iterations: .........10.........20.........30.
WARNING: Mixture model quality test failed for charge (1+).
WARNING: Mixture model quality test failed for charge (2+).
WARNING: Mixture model quality test failed for charge (3+).
WARNING: Mixture model quality test failed for charge (4+).
WARNING: Mixture model quality test failed for charge (5+).
WARNING: Mixture model quality test failed for charge (6+).
WARNING: Mixture model quality test failed for charge (7+).
model complete after 32 iterations
INFO[15:44:41] Creating combined protein inference          
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.2.1-dev Flammagenitus, Build 201906251008-exported (Linux-x86_64))
 (no FPKM) (using degen pep info)
Reading in /home/DATA2/trips/scamp/PXD019087_0/interact.pep.xml...
did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0.05

WARNING: no data - output file will be empty
FATA[15:44:41] Cannot execute program. There was an error with ProteinProphet, please check your parameters and input files 
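The ProteinProphet message above can be checked directly against `interact.pep.xml`. The following is a minimal sketch, assuming Python 3 and the standard pepXML element names (`spectrum_query`, `search_hit` with its `hit_rank` and `protein` attributes, and `peptideprophet_result`); the function name `summarize_pepxml` is hypothetical and not part of Philosopher or the TPP. It counts rank-1 hits, rank-1 hits on decoy proteins (prefix `rev_`, as in this run), and PeptideProphet results. A `peptideprophet_results` count of zero would be consistent with ProteinProphet reporting that it found no PeptideProphet results.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any XML namespace: '{ns}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def summarize_pepxml(path, decoy_prefix="rev_"):
    """Count rank-1 hits, decoy rank-1 hits and PeptideProphet results.

    Hypothetical diagnostic helper; it relies only on standard pepXML
    element and attribute names.
    """
    counts = {"top_hits": 0, "decoy_top_hits": 0, "peptideprophet_results": 0}
    # Stream the file so large pepXML outputs do not need to fit in memory.
    for _event, elem in ET.iterparse(path, events=("end",)):
        name = local_name(elem.tag)
        if name == "peptideprophet_result":
            counts["peptideprophet_results"] += 1
        elif name == "search_hit" and elem.get("hit_rank") == "1":
            counts["top_hits"] += 1
            if elem.get("protein", "").startswith(decoy_prefix):
                counts["decoy_top_hits"] += 1
        elif name == "spectrum_query":
            elem.clear()  # release the finished query's children
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    result = summarize_pepxml(sys.argv[1])
    print(result)
    if result["top_hits"]:
        fraction = result["decoy_top_hits"] / result["top_hits"]
        print(f"decoy fraction among rank-1 hits: {fraction:.3f}")
```

It would be run as `python summarize_pepxml.py interact.pep.xml` (script name hypothetical); its counts may differ slightly from PeptideProphet's own. As a worked check on the numbers PeptideProphet printed above, 42793 decoys out of 91871 results gives a decoy fraction of 42793 / 91871 ≈ 0.466; with a target-decoy database whose halves are roughly equal in size, purely random matches would be expected to split roughly 50/50.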

This is my parameter file:

# Philosopher pipeline configuration file.
#
# The pipeline mode automates the processing done by Philosopher. First, check
# the steps you want to execute in the commands section and change them to
# 'yes'. For each selected command, go to its section and adjust the parameters
# according to your analysis.
#
# If you want to include MSFragger and TMT-Integrator in your analysis, you will
# have to download them separately and then add their location to their configuration sections.
#
# Usage:
# philosopher pipeline --config <this_configuration_file> [list_of_data_set_folders]

analytics: true                                # reports when a workspace is created for usage estimation (default true)
slackToken:                                    # specify the Slack API token
slackChannel:                                  # specify the channel name

commands:
  workspace: yes                               # manage the experiment workspace for the analysis
  database: yes                                # target-decoy database formatting
  comet: no                                    # peptide spectrum matching with Comet
  msfragger: yes                               # peptide spectrum matching with MSFragger
  peptideprophet: yes                          # peptide assignment validation
  ptmprophet: no                               # PTM site localization
  proteinprophet: no                           # protein identification validation
  filter: yes                                  # statistical filtering, validation and False Discovery Rates assessment
  freequant: no                               # label-free Quantification
  labelquant: no                              # isobaric Labeling-Based Relative Quantification
  bioquant: no                                 # protein report based on Uniprot protein clusters
  report: yes                                  # multi-level reporting for both narrow-searches and open-searches
  abacus: yes                                  # combined analysis of LC-MS/MS results
  tmtintegrator: no                           # integrates channel abundances from multiple TMT samples

database:
  protein_database: /home/DATA2/trips/scamp/proteomes/2020-06-24-decoys-contam-mus_musculus_proteome.fa   # path to the target-decoy protein database
  decoy_tag: rev_                              # prefix tag added to decoy sequences

comet:
  noindex: true                                # skip raw file indexing
  param:                                       # comet parameter file (default "comet.params.txt")
  raw: mzML                                    # format of the spectra file

msfragger:                                     # v2.3
  path: /home/DATA2/trips/scamp/MSFragger/MSFragger.jar  # path to MSFragger jar
  memory: 16                                   # how much memory in GB to use
  param:                                       # MSFragger parameter file
  raw: mzML                                    # spectra format
  num_threads: 0                               # 0=poll CPU to set num threads; else specify num threads directly (max 64)
  precursor_mass_lower: -20                    # lower bound of the precursor mass window
  precursor_mass_upper: 20                     # upper bound of the precursor mass window
  precursor_mass_units: 1                      # 0=Daltons, 1=ppm
  precursor_true_tolerance: 20                 # true precursor mass tolerance (window is +/- this value)
  precursor_true_units: 1                      # 0=Daltons, 1=ppm
  fragment_mass_tolerance: 20                  # fragment mass tolerance (window is +/- this value)
  fragment_mass_units: 1                       # fragment mass tolerance units (0 for Da, 1 for ppm)
  calibrate_mass: 0                            # 0=Off, 1=On, 2=On and find optimal parameters
  deisotope: 0                                 # activates deisotoping.
  isotope_error: -1/0/1/2/3                    # 0=off, -1/0/1/2/3 (standard C13 error)
  mass_offsets: 0                              # allow for additional precursor mass window shifts. Multiplexed with isotope_error. mass_offsets = 0/79.966 can be use$
  precursor_mass_mode: selected                # selected or isolated
  localize_delta_mass: 0                       # this allows shifted fragment ions - fragment ions with mass increased by the calculated mass difference, to be includ$
  delta_mass_exclude_ranges: (-1.5,3.5)        # exclude mass range for shifted ions searching
  fragment_ion_series: b,y                     # ion series used in search
  search_enzyme_name: Trypsin                  # name of enzyme to be written to the pepXML file
  search_enzyme_cutafter: KR                   # residues after which the enzyme cuts
  search_enzyme_butnotafter: P                 # residues that the enzyme will not cut before
  num_enzyme_termini: 2                        # 2 for enzymatic, 1 for semi-enzymatic, 0 for nonspecific digestion
  allowed_missed_cleavage: 2                   # maximum value is 5
  clip_nTerm_M: 1                              # specifies the trimming of a protein N-terminal methionine as a variable modification (0 or 1)
  variable_mod_01: 15.99490 M 3                # variable modification
  variable_mod_02: 42.01060 [^ 1               # variable modification
  variable_mod_03: 229.162932 n^ 1             # variable modification
  variable_mod_04: 229.162932 S 1              # variable modification
  variable_mod_05:                             # variable modification
  variable_mod_06:                             # variable modification
  variable_mod_07:                             # variable modification
  allow_multiple_variable_mods_on_residue: 1   # static mods are not considered
  max_variable_mods_per_peptide: 3             # maximum of 5
  max_variable_mods_combinations: 5000         # maximum of 65534, limits number of modified peptides generated from sequence
  output_file_extension: pepXML                # file extension of output files
  output_format: pepXML                        # file format of output files (pepXML or tsv)
  output_report_topN: 3                        # reports top N PSMs per input spectrum
  output_max_expect: 50                        # suppresses reporting of PSM if top hit has expectation greater than this threshold
  report_alternative_proteins: 0               # 0=no, 1=yes
  precursor_charge: 1 6                        # assume range of potential precursor charge states. Only relevant when override_charge is set to 1
  override_charge: 0                           # 0=no, 1=yes to override existing precursor charge states with precursor_charge parameter
  digest_min_length: 7                         # minimum length of peptides to be generated during in-silico digestion
  digest_max_length: 50                        # maximum length of peptides to be generated during in-silico digestion
  digest_mass_range: 500.0 5000.0              # mass range of peptides to be generated during in-silico digestion in Daltons
  max_fragment_charge: 2                       # maximum charge state for theoretical fragments to match (1-4)
  track_zero_topN: 0                           # in addition to topN results, keep track of top results in zero bin
  zero_bin_accept_expect: 0                    # boost top zero bin entry to top if it has expect under 0.01 - set to 0 to disable
  zero_bin_mult_expect: 1                      # disabled if above passes - multiply expect of zero bin for ordering purposes (does not affect reported expect)
  add_topN_complementary: 0                    # inserts complementary ions corresponding to the top N most intense fragments in each experimental spectrum
  minimum_peaks: 15                            # required minimum number of peaks in spectrum to search (default 10)
  use_topN_peaks: 150                          # pre-process experimental spectrum to only use top N peaks
  min_fragments_modelling: 3                   # minimum number of matched peaks in PSM for inclusion in statistical modeling
  min_matched_fragments: 4                     # minimum number of matched peaks for PSM to be reported
  minimum_ratio: 0.01                          # filters out all peaks in experimental spectrum less intense than this multiple of the base peak intensity
  clear_mz_range: 125.5 131.5                  # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range
  remove_precursor_peak: 0                     # remove precursor peaks from tandem mass spectra. 0=not remove; 1=remove the peak with precursor charge; 2=remove the $
  remove_precursor_range: -1.5,1.5             # m/z range in removing precursor peaks. Unit: Da.
  intensity_transform: 0                       # transform peak intensities with square root. 0=no transform; 1=transform using square root.
  add_Cterm_peptide: 0.000000                  # c-term peptide fixed modifications
  add_Cterm_protein: 0.000000                  # c-term protein fixed modifications
  add_Nterm_peptide: 0.000000                  # n-term peptide fixed modifications
  add_Nterm_protein: 0.000000                  # n-term protein fixed modifications
  add_A_alanine: 0.000000                      # alanine fixed modifications
  add_C_cysteine: 57.021464                    # cysteine fixed modifications
  add_D_aspartic_acid: 0.000000                # aspartic acid fixed modifications
  add_E_glutamic_acid: 0.000000                # glutamic acid fixed modifications
  add_F_phenylalanine: 0.000000                # phenylalanine fixed modifications
  add_G_glycine: 0.000000                      # glycine fixed modifications
  add_H_histidine: 0.000000                    # histidine fixed modifications
  add_I_isoleucine: 0.000000                   # isoleucine fixed modifications
  add_K_lysine: 229.162932                     # lysine fixed modifications
  add_L_leucine: 0.000000                      # leucine fixed modifications
  add_M_methionine: 0.000000                   # methionine fixed modifications
  add_N_asparagine: 0.000000                   # asparagine fixed modifications
  add_P_proline: 0.000000                      # proline fixed modifications
  add_Q_glutamine: 0.000000                    # glutamine fixed modifications
  add_R_arginine: 0.000000                     # arginine fixed modifications
  add_S_serine: 0.000000                       # serine fixed modifications
  add_T_threonine: 0.000000                    # threonine fixed modifications
  add_V_valine: 0.000000                       # valine fixed modifications
  add_W_tryptophan: 0.000000                   # tryptophan fixed modifications
  add_Y_tyrosine: 0.000000                     # tyrosine fixed modifications

peptideprophet:                                # v5.2
  extension: pepXML                            # pepXML file extension
  clevel: 0                                    # set Conservative Level in neg_stdev from the neg_mean, low numbers are less conservative, high numbers are more conse$
  accmass: true                                # use Accurate Mass model binning
  decoyprobs: true                             # compute possible non-zero probabilities for Decoy entries on the last iteration
  enzyme: trypsin                              # enzyme used in sample (optional)
  exclude: false                               # exclude deltaCn*, Mascot*, and Comet* results from results (default Penalize * results)
  expectscore: true                            # use expectation value as the only contributor to the f-value for modeling
  forcedistr: false                            # bypass quality control checks, report model despite bad modeling
  glyc: false                                  # enable peptide Glyco motif model
  icat: false                                  # apply ICAT model (default Autodetect ICAT)
  instrwarn: false                             # warn and continue if combined data was generated by different instrument models
  leave: false                                 # leave alone deltaCn*, Mascot*, and Comet* results from results (default Penalize * results)
  maldi: false                                 # enable MALDI mode
  masswidth: 5                                 # model mass width (default 5)
  minpeplen: 7                                 # minimum peptide length not rejected (default 7)
  minpintt: 2                                  # minimum number of NTT in a peptide used for positive pI model (default 2)
  minpiprob: 0.9                               # minimum probability after first pass of a peptide used for positive pI model (default 0.9)
  minprob: 0.05                                # report results with minimum probability (default 0.05)
  minrtntt: 2                                  # minimum number of NTT in a peptide used for positive RT model (default 2)
  minrtprob: 0.9                               # minimum probability after first pass of a peptide used for positive RT model (default 0.9)
  neggamma: false                              # use Gamma distribution to model the negative hits
  noicat: false                                # do no apply ICAT model (default Autodetect ICAT)
  nomass: false                                # disable mass model
  nonmc: false                                 # disable NMC missed cleavage model
  nonparam: true                               # use semi-parametric modeling, must be used in conjunction with --decoy option
  nontt: false                                 # disable NTT enzymatic termini model
  optimizefval: false                          # (SpectraST only) optimize f-value function f(dot,delta) using PCA
  phospho: false                               # enable peptide Phospho motif model
  pi: false                                    # enable peptide pI model
  ppm: true                                    # use PPM mass error instead of Dalton for mass modeling
  zero: false                                  # report results with minimum probability 0

ptmprophet:                                    # v5.2
  autodirect: false                            # use direct evidence when the lability is high, use in combination with LABILITY
  cions:                                       # use specified C-term ions, separate multiple ions by commas (default: y for CID, z for ETD)
  direct: false                                # use only direct evidence for evaluating PTM site probabilities
  em: 2                                        # set EM models to 0 (no EM), 1 (Intensity EM Model Applied) or 2 (Intensity and Matched Peaks EM Models Applied)
  static: false                                # use static fragppmtol for all PSMs instead of dynamically estimated offsets and tolerances
  fragppmtol: 15                               # when computing PSM-specific mass_offset and mass_tolerance, use specified default +/- MS2 mz tolerance on fragment io$
  ifrags: false                                # use internal fragments for localization
  keepold: false                               # retain old PTMProphet results in the pepXML file
  lability: false                              # compute Lability of PTMs
  massdiffmode: false                          # use the Mass Difference and localize
  massoffset: 0                                # adjust the massdiff by offset (0 = use default)
  maxfragz: 0                                  # limit maximum fragment charge (default: 0=precursor charge, negative values subtract from precursor charge)
  maxthreads: 4                                # use specified number of threads for processing
  mino: 0                                      # use specified number of pseudo-counts when computing Oscore (0 = use default)
  minprob: 0                                   # use specified minimum probability to evaluate peptides
  mods:                                        # specify modifications
  nions:                                       # use specified N-term ions, separate multiple ions by commas (default: a,b for CID, c for ETD)
  nominofactor: false                          # disable MINO factor correction when MINO= is set greater than 0 (default: apply MINO factor correction)
  ppmtol: 1                                    # use specified +/- MS1 ppm tolerance on peptides which may have a slight offset depending on search parameters
  verbose: false                               # produce Warnings to help troubleshoot potential PTM shuffling or mass difference issues

proteinprophet:                                # v5.2
  accuracy: false                              # equivalent to --minprob 0
  allpeps: false                               # consider all possible peptides in the database in the confidence model
  confem: false                                # use the EM to compute probability given the confidence
  delude: false                                # do NOT use peptide degeneracy information when assessing proteins
  excludezeros: false                          # exclude zero prob entries
  fpkm: false                                  # model protein FPKM values
  glyc: false                                  # highlight peptide N-glycosylation motif
  icat: false                                  # highlight peptide cysteines
  instances: false                             # use Expected Number of Ion Instances to adjust the peptide probabilities prior to NSP adjustment
  iprophet: false                              # input is from iProphet
  logprobs: false                              # use the log of the probabilities in the Confidence calculations
  maxppmdiff: 20                               # maximum peptide mass difference in PPM (default 20)
  minprob: 0.05                                # PeptideProphet probability threshold (default 0.05)
  mufactor: 1                                  # fudge factor to scale MU calculation (default 1)
  nogroupwts: false                            # check peptide's Protein weight against the threshold (default: check peptide's Protein Group weight against threshold)
  nonsp: false                                 # do not use NSP model
  nooccam: false                               # non-conservative maximum protein list
  noprotlen: false                             # do not report protein length
  normprotlen: false                           # normalize NSP using Protein Length
  protmw: false                                # get protein mol weights
  softoccam: false                             # peptide weights are apportioned equally among proteins within each Protein Group (less conservative protein count est$
  unmapped: false                              # report results for UNMAPPED proteins

filter:
  psmFDR: 0.01                                 # psm FDR level (default 0.01)
  peptideFDR: 0.01                             # peptide FDR level (default 0.01)
  ionFDR: 0.01                                 # peptide ion FDR level (default 0.01)
  proteinFDR: 0.01                             # protein FDR level (default 0.01)
  peptideProbability: 0.7                      # top peptide probability threshold for the FDR filtering (default 0.7)
  proteinProbability: 0.5                      # protein probability threshold for the FDR filtering (not used with the razor algorithm) (default 0.5)
  peptideWeight: 0.9                           # threshold for defining peptide uniqueness (default 1)
  razor: true                                  # use razor peptides for protein FDR scoring
  picked: true                                 # apply the picked FDR algorithm before the protein scoring
  mapMods: true                                # map modifications acquired by an open search
  models: true                                 # print model distribution
  sequential: true                             # alternative algorithm that estimates FDR using both filtered PSM and Protein lists

freequant:
  peakTimeWindow: 0.4                          # specify the time windows for the peak (minute) (default 0.4)
  retentionTimeWindow: 3                       # specify the retention time window for xic (minute) (default 3)
  tolerance: 10                                # m/z tolerance in ppm (default 10)
  isolated: true                               # use the isolated ion instead of the selected ion for quantification

labelquant:
  annotation: annotation.txt                   # annotation file with custom names for the TMT channels
  bestPSM: true                                # select the best PSMs for protein quantification
  level: 2                                     # ms level for the quantification
  minProb: 0.7                                 # only use PSMs with a minimum probability score
  plex: 10                                     # number of channels
  purity: 0.5                                  # ion purity threshold (default 0.5)
  removeLow: 0.05                              # ignore the lowest 3% of PSMs based on their summed abundances
  tolerance: 20                                # m/z tolerance in ppm (default 20)
  uniqueOnly: false                            # report quantification based on only unique peptides

report:
  msstats: false                               # create an output compatible with MSstats
  withDecoys: false                            # add decoy observations to reports
  mzID: false                                  # create a mzID output

bioquant:
  organismUniProtID:                           # UniProt proteome ID
  level: 0.9                                   # cluster identity level (default 0.9)

abacus:
  protein: true                                # global level protein report
  peptide: false                               # global level peptide report
  proteinProbability: 0.05                     # minimum protein probability (default 0.9)
  peptideProbability: 0.5                      # minimum peptide probability (default 0.5)
  uniqueOnly: false                            # report TMT quantification based on only unique peptides
  reprint: false                               # create abacus reports using the Reprint format

tmtintegrator:                                 # v1.1.2
  path:                                        # path to TMT-Integrator jar
  memory: 100                                  # memory allocation, in GB
  output:                                      # the location of output files
  channel_num: 10                              # number of channels in the multiplex (e.g. 10, 11)
  ref_tag: pool                                # unique tag for identifying the reference channel (Bridge sample added to each multiplex)
  groupby: -1                                  # level of data summarization (0: PSM aggregation to the gene level; 1: protein; 2: peptide sequence; 3: PTM site; -1: g$
  psm_norm: false                              # perform additional retention time-based normalization at the PSM level
  outlier_removal: true                        # perform outlier removal
  prot_norm: -1                                # normalization (0: None; 1: MD (median centering); 2: GN (median centering + variance scaling); -1: generate reports w$
  min_pep_prob: 0.9                            # minimum PSM probability threshold (in addition to FDR-based filtering by Philosopher)
  min_purity: 0.5                              # ion purity score threshold
  min_percent: 0.05                            # remove low intensity PSMs (e.g. value of 0.05 indicates removal of PSMs with the summed TMT reporter ions intensity i$
  unique_pep: false                            # allow PSMs with unique peptides only (if true) or unique plus razor peptides (if false), as classified by Philosopher$
  unique_gene: 0                               # additional, gene-level uniqueness filter (0: allow all PSMs; 1: remove PSMs mapping to more than one GENE with eviden$
  best_psm: true                               # keep the best PSM only (highest summed TMT intensity) among all redundant PSMs within the same LC-MS run
  prot_exclude: sp|,tr|                        # exclude proteins with specified tags at the beginning of the accession number (e.g. none: no exclusion; sp|,tr| : exc$
  allow_overlabel: false                       # allow PSMs with TMT on S (when overlabeling on S was allowed in the database search)
  allow_unlabeled: false                       # allow PSMs without TMT tag or acetylation on the peptide n-terminus
  mod_tag: none                                # PTM info for generation of PTM-specific reports (none: for Global data; S(79.9663),T(79.9663),Y(79.9663): for Phospho$
  min_site_prob: -1                            # site localization confidence threshold (-1: for Global; 0: as determined by the search engine; above 0 (e.g. 0.75): P$
  ms1_int: true                                # use MS1 precursor ion intensity (if true) or MS2 summed TMT reporter ion intensity (if false) as part of the referenc$
  top3_pep: true                               # use top 3 most intense peptide ions as part of the reference sample abundance estimation
  print_RefInt: false                          # print individual reference sample abundance estimates for each multiplex in the final reports (in addition to the com$
anesvi commented 4 years ago

PeptideProphet modeling failed because you do not have enough high-confidence PSMs identified by MSFragger. Something is wrong with your MSFragger search parameters. Is it MS3 TMT data? If so (as I suspect), you need to change the fragment mass tolerance from ±20 ppm to 0.6 Da.
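A minimal sketch of what that change could look like in the `msfragger` section of the pipeline configuration above. It assumes SPS-MS3 TMT data acquired on an instrument that records the MS1 survey scans in the Orbitrap and the MS2 identification spectra in the ion trap (low resolution), so the 0.6 Da window goes on the fragment tolerance while the precursor window stays at ±20 ppm. Only the relevant keys are shown; everything else stays as in the original file. The `labelquant` line is an additional assumption: `labelquant` is disabled in this run, but if it is enabled later for MS3 data, the reporter ions would be read from MS3 scans rather than MS2.

```yaml
msfragger:
  precursor_mass_lower: -20                    # MS1 window unchanged (assumed Orbitrap, ppm)
  precursor_mass_upper: 20
  precursor_mass_units: 1                      # 0=Daltons, 1=ppm
  precursor_true_tolerance: 20
  precursor_true_units: 1                      # 0=Daltons, 1=ppm
  fragment_mass_tolerance: 0.6                 # assumed ion-trap MS2: +/- 0.6 Da
  fragment_mass_units: 0                       # 0 for Da, 1 for ppm

labelquant:
  level: 3                                     # assumption: TMT reporter ions quantified from MS3 scans
```

Whether 0.6 Da is the right window depends on the instrument method used for this PRIDE dataset, which the thread does not state.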

From: Ciara Judge notifications@github.com Sent: Thursday, June 25, 2020 11:14 AM To: Nesvilab/philosopher philosopher@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [Nesvilab/philosopher] TMT Pipeline Error in Protein Prophet due to Peptide Prophet 'no data' (#140)

External Email - Use Caution

Hi, further to another issuehttps://github.com/Nesvilab/philosopher/issues/139 I was discussing which I now believe to be resolved, I am using the TMT pipeline to analyse databases that I am downloading from PRIDE and converting to mzML using MSConvert. After the operation of PeptideProphet (during which I get a number of warnings about failed mixture model quality tests), ProteinProphet fails, citing a suggestion that PeptideProphet did not run correctly or at all.

I am new to this type of analysis so I assume it is something I am doing wrong, any direction would be appreciated.

This is the complete report from the linux command line:

INFO[15:18:26] Executing Workspace v3.2.7

INFO[15:18:26] Creating workspace

WARN[15:18:26] A meta data folder was found and will not be overwritten.

INFO[15:18:26] Done

INFO[15:18:26] Executing Pipeline v3.2.7

INFO[15:18:26] Initiating the workspace on PXD019087_0

INFO[15:18:26] Creating workspace

INFO[15:18:26] Processing database

INFO[15:18:29] Running the Database Search on all data

MSFragger version MSFragger-3.0

Batmass-IO version 1.17.4

(c) University of Michigan

RawFileReader reading tool. Copyright (c) 2016 by Thermo Fisher Scientific, Inc. All rights reserved.

System OS: Linux, Architecture: amd64

Java Info: 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, Oracle Corporation

JVM started with 14 GB memory

Checking database...

Checking /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_2.mzML...

Checking /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_3.mzML...

Checking /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNase_1.mzML...

****MAIN SEARCH****

Checking database...

Parameters:

num_threads = 24

database_name = /home/DATA2/trips/scamp/proteomes/2020-06-24-decoys-contam-mus_musculus_proteome.fa

decoyprefix = rev

precursor_mass_lower = -20.0

precursor_mass_upper = 20.0

precursor_mass_units = 1

precursor_true_tolerance = 20.0

precursor_true_units = 1

fragment_mass_tolerance = 20.0

fragment_mass_units = 1

calibrate_mass = 0

write_calibrated_mgf = false

isotope_error = -1/0/1/2/3

mass_offsets = 0

labile_search_mode = OFF

precursor_mass_mode = SELECTED

localize_delta_mass = false

delta_mass_exclude_ranges = (-1.5, 3.5)

fragment_ion_series = b,y

diagnostic_intensity_filter = 0.0

Y_type_masses = 0/203.07937/406.15874/568.21156/730.26438/892.3172/349.137279

diagnostic_fragments = 204.086646/186.076086/168.065526/366.139466/144.0656/138.055/126.055/163.060096/512.197375/292.1026925/274.0921325/657.2349/243.026426/405.079246/485.045576/308.09761

search_enzyme_name = Trypsin

search_enzyme_cutafter = KR

search_enzyme_butnotafter = P

num_enzyme_termini = 2

allowed_missed_cleavage = 2

clip_nTerm_M = true

allow_multiple_variable_mods_on_residue = true

max_variable_mods_per_peptide = 3

max_variable_mods_combinations = 5000

output_file_extension = pepXML

output_format = pepXML

output_report_topN = 3

output_max_expect = 50.0

report_alternative_proteins = false

override_charge = false

precursor_charge_low = 1

precursor_charge_high = 6

digest_min_length = 7

digest_max_length = 50

digest_mass_range_low = 500.0

digest_mass_range_high = 5000.0

max_fragment_charge = 2

deisotope = 0

track_zero_topN = 0

zero_bin_accept_expect = 0.0

zero_bin_mult_expect = 1.0

add_topN_complementary = 0

minimum_peaks = 15

use_topN_peaks = 150

minIonsScoring = 3

min_matched_fragments = 4

minimum_ratio = 0.01

intensity_transform = 0

remove_precursor_peak = 0

remove_precursor_range = -1.5,1.5

clear_mz_range_low = 125.5

clear_mz_range_high = 131.5

excluded_scan_list_file =

mass_diff_to_variable_mod = 0

variable_mod_01 = 15.99490 M 3

variable_mod_02 = 42.01060 [^ 1

variable_mod_03 = 229.162932 n^ 1

variable_mod_04 = 229.162932 S 1

add_A_alanine = 0.000000

add_C_cysteine = 57.021464

add_Cterm_peptide = 0.0

add_Cterm_protein = 0.0

add_D_aspartic_acid = 0.000000

add_E_glutamic_acid = 0.000000

add_F_phenylalanine = 0.000000

add_G_glycine = 0.000000

add_H_histidine = 0.000000

add_I_isoleucine = 0.000000

add_K_lysine = 229.162932

add_L_leucine = 0.000000

add_M_methionine = 0.000000

add_N_asparagine = 0.000000

add_Nterm_peptide = 0.0

add_Nterm_protein = 0.0

add_P_proline = 0.000000

add_Q_glutamine = 0.000000

add_R_arginine = 0.000000

add_S_serine = 0.000000

add_T_threonine = 0.000000

add_V_valine = 0.000000

add_W_tryptophan = 0.000000

add_Y_tyrosine = 0.000000

Selected fragment tolerance 0.10 Da.

4658294454 fragments to be searched in 8 slices (69.41 GB total)

Operating on slice 1 of 8:

    Fragment index slice generated in 42.52 s

    001. TS_Miwi2-HA+RNAse_2.mzML 43.0 s

           [progress: 116032/116032 (100%) - 3357 spectra/s] 34.6s

    002. TS_Miwi2-HA+RNAse_3.mzML 27.0 s

           [progress: 64463/64463 (100%) - 4226 spectra/s] 15.3s

    003. TS_Miwi2-HA+RNase_1.mzML 27.0 s

           [progress: 58007/58007 (100%) - 5810 spectra/s] 10.0s

Operating on slice 2 of 8:

    Fragment index slice generated in 12.36 s

    001. TS_Miwi2-HA+RNAse_2.mzML 38.6 s

           [progress: 116032/116032 (100%) - 4530 spectra/s] 25.6s

    002. TS_Miwi2-HA+RNAse_3.mzML 26.8 s

           [progress: 64463/64463 (100%) - 8171 spectra/s] 7.9s

    003. TS_Miwi2-HA+RNase_1.mzML 26.1 s

           [progress: 58007/58007 (100%) - 7921 spectra/s] 7.3s

Operating on slice 3 of 8:

    Fragment index slice generated in 11.35 s

    001. TS_Miwi2-HA+RNAse_2.mzML 36.0 s

           [progress: 116032/116032 (100%) - 5307 spectra/s] 21.9s

    002. TS_Miwi2-HA+RNAse_3.mzML 26.7 s

           [progress: 64463/64463 (100%) - 8196 spectra/s] 7.9s

    003. TS_Miwi2-HA+RNase_1.mzML 26.9 s

           [progress: 58007/58007 (100%) - 8419 spectra/s] 6.9s

Operating on slice 4 of 8:

    Fragment index slice generated in 13.53 s

    001. TS_Miwi2-HA+RNAse_2.mzML 37.4 s

           [progress: 116032/116032 (100%) - 5243 spectra/s] 22.1s

    002. TS_Miwi2-HA+RNAse_3.mzML 26.7 s

           [progress: 64463/64463 (100%) - 8522 spectra/s] 7.6s

    003. TS_Miwi2-HA+RNase_1.mzML 26.8 s

           [progress: 58007/58007 (100%) - 9244 spectra/s] 6.3s

Operating on slice 5 of 8:

    Fragment index slice generated in 13.58 s

    001. TS_Miwi2-HA+RNAse_2.mzML 38.4 s

           [progress: 116032/116032 (100%) - 5067 spectra/s] 22.9s

    002. TS_Miwi2-HA+RNAse_3.mzML 26.3 s

           [progress: 64463/64463 (100%) - 9055 spectra/s] 7.1s

    003. TS_Miwi2-HA+RNase_1.mzML 26.8 s

           [progress: 58007/58007 (100%) - 10093 spectra/s] 5.7s

Operating on slice 6 of 8:

    Fragment index slice generated in 12.96 s

    001. TS_Miwi2-HA+RNAse_2.mzML 38.9 s

           [progress: 116032/116032 (100%) - 5787 spectra/s] 20.1s

    002. TS_Miwi2-HA+RNAse_3.mzML 23.1 s

           [progress: 64463/64463 (100%) - 17479 spectra/s] 3.7s

    003. TS_Miwi2-HA+RNase_1.mzML 24.4 s

           [progress: 58007/58007 (100%) - 8882 spectra/s] 6.5s

Operating on slice 7 of 8:

    Fragment index slice generated in 12.60 s

    001. TS_Miwi2-HA+RNAse_2.mzML 39.2 s

           [progress: 116032/116032 (100%) - 5469 spectra/s] 21.2s

    002. TS_Miwi2-HA+RNAse_3.mzML 25.9 s

           [progress: 64463/64463 (100%) - 9953 spectra/s] 6.5s

    003. TS_Miwi2-HA+RNase_1.mzML 26.1 s

           [progress: 58007/58007 (100%) - 8778 spectra/s] 6.6s

Operating on slice 8 of 8:

    Fragment index slice generated in 13.02 s

    001. TS_Miwi2-HA+RNAse_2.mzML 38.5 s

           [progress: 116032/116032 (100%) - 5635 spectra/s] 20.6s | postprocessing 52.1 s

    002. TS_Miwi2-HA+RNAse_3.mzML 27.0 s

           [progress: 64463/64463 (100%) - 9607 spectra/s] 6.7s | postprocessing 26.3 s

    003. TS_Miwi2-HA+RNase_1.mzML 26.0 s

           [progress: 58007/58007 (100%) - 10636 spectra/s] 5.5s | postprocessing 27.1 s

MAIN SEARCH DONE IN 21.525 MIN

***TOTAL TIME 21.688 MIN****

INFO[15:40:15] Running the validation and inference on PXD019087_0

INFO[15:40:15] Executing PeptideProphet on PXD019087_0

file 1: /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_2.pepXML

file 2: /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNAse_3.pepXML

file 3: /home/DATA2/trips/scamp/PXD019087_0/TS_Miwi2-HA+RNase_1.pepXML

processed altogether 91871 results

INFO: Results written to file: /home/DATA2/trips/scamp/PXD019087_0/interact.pep.xml

using Accurate Mass Bins

using PPM mass difference

Using Decoy Label "rev_".

Decoy Probabilities will be reported.

Using non-parametric distributions

(X! Tandem) (using Tandem's expectation score for modeling)

adding ACCMASS mixture distribution

using search_offsets in ACCMASS mixture distr: 0

init with X! Tandem trypsin

MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ...

PeptideProphet (TPP v5.2.1-dev Flammagenitus, Build 201906251008-exported (Linux-x86_64)) AKeller@ISB

read in 0 1+, 32020 2+, 50838 3+, 7804 4+, 966 5+, 230 6+, and 13 7+ spectra.

Initialising statistical models ...

Found 42793 Decoys, and 49078 Non-Decoys

Iterations: .........10.........20.........30.

WARNING: Mixture model quality test failed for charge (1+).

WARNING: Mixture model quality test failed for charge (2+).

WARNING: Mixture model quality test failed for charge (3+).

WARNING: Mixture model quality test failed for charge (4+).

WARNING: Mixture model quality test failed for charge (5+).

WARNING: Mixture model quality test failed for charge (6+).

WARNING: Mixture model quality test failed for charge (7+).

model complete after 32 iterations

INFO[15:44:41] Creating combined protein inference

ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.2.1-dev Flammagenitus, Build 201906251008-exported (Linux-x86_64))

(no FPKM) (using degen pep info)

Reading in /home/DATA2/trips/scamp/PXD019087_0/interact.pep.xml...

did not find any PeptideProphet results in input data! Did you forget to run PeptideProphet?

...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0.05

WARNING: no data - output file will be empty

FATA[15:44:41] Cannot execute program. There was an error with ProteinProphet, please check your parameters and input files

This is my parameter file:

Philosopher pipeline configuration file.

#

The pipeline mode automates the processing done by Philosopher. First, check

the steps you want to execute in the commands section and change them to

'yes'. For each selected command, go to its section and adjust the parameters

accordingly to your analysis.

#

If you want to include MSFragger and TMT-Integrator into your analysis, you will

haver o download them separately and then add their location tot their configuration

#

Usage:

philosopher pipeline --config [list_of_data_set_folders]

analytics: true # reports when a workspace is created for usage estimation (default true)

slackToken: # specify the Slack API token

slackChannel: # specify the channel name

commands:

workspace: yes # manage the experiment workspace for the analysis

database: yes # target-decoy database formatting

comet: no # peptide spectrum matching with Comet

msfragger: yes # peptide spectrum matching with MSFragger

peptideprophet: yes # peptide assignment validation

ptmprophet: no # PTM site localization

proteinprophet: no # protein identification validation

filter: yes # statistical filtering, validation and False Discovery Rates assessment

freequant: no # label-free Quantification

labelquant: no # isobaric Labeling-Based Relative Quantification

bioquant: no # protein report based on Uniprot protein clusters

report: yes # multi-level reporting for both narrow-searches and open-searches

abacus: yes # combined analysis of LC-MS/MS results

tmtintegrator: no # integrates channel abundances from multiple TMT samples

database:

protein_database: /home/DATA2/trips/scamp/proteomes/2020-06-24-decoys-contam-mus_musculus_proteome.fa # path to the target-decoy protein database

decoytag: rev # prefix tag used added to decoy sequences

comet:

noindex: true # skip raw file indexing

param: # comet parameter file (default "comet.params.txt")

raw: mzML # format of the spectra file

msfragger: # v2.3

path: /home/DATA2/trips/scamp/MSFragger/MSFragger.jar # path to MSFragger jar

memory: 16 # how much memory in GB to use

param: # MSFragger parameter file

raw: mzML # spectra format

num_threads: 0 # 0=poll CPU to set num threads; else specify num threads directly (max 64)

precursor_mass_lower: -20 # lower bound of the precursor mass window

precursor_mass_upper: 20 # upper bound of the precursor mass window

precursor_mass_units: 1 # 0=Daltons, 1=ppm

precursor_true_tolerance: 20 # true precursor mass tolerance (window is +/- this value)

precursor_true_units: 1 # 0=Daltons, 1=ppm

fragment_mass_tolerance: 20 # fragment mass tolerance (window is +/- this value)

fragment_mass_units: 1 # fragment mass tolerance units (0 for Da, 1 for ppm)

calibrate_mass: 0 # 0=Off, 1=On, 2=On and find optimal parameters

deisotope: 0 # activates deisotoping.

isotope_error: -1/0/1/2/3 # 0=off, -1/0/1/2/3 (standard C13 error)

mass_offsets: 0 # allow for additional precursor mass window shifts. Multiplexed with isotope_error. mass_offsets = 0/79.966 can be use$

precursor_mass_mode: selected # selected or isolated

localize_delta_mass: 0 # this allows shifted fragment ions - fragment ions with mass increased by the calculated mass difference, to be includ$

delta_mass_exclude_ranges: (-1.5,3.5) # exclude mass range for shifted ions searching

fragment_ion_series: b,y # ion series used in search

search_enzyme_name: Trypsin # name of enzyme to be written to the pepXML file

search_enzyme_cutafter: KR # residues after which the enzyme cuts

search_enzyme_butnotafter: P # residues that the enzyme will not cut before

num_enzyme_termini: 2 # 2 for enzymatic, 1 for semi-enzymatic, 0 for nonspecific digestion

allowed_missed_cleavage: 2 # maximum value is 5

clip_nTerm_M: 1 # specifies the trimming of a protein N-terminal methionine as a variable modification (0 or 1)

variable_mod_01: 15.99490 M 3 # variable modification

variable_mod_02: 42.01060 [^ 1 # variable modification

variable_mod_03: 229.162932 n^ 1 # variable modification

variable_mod_04: 229.162932 S 1 # variable modification

variable_mod_05: # variable modification

variable_mod_06: # variable modification

variable_mod_07: # variable modification

allow_multiple_variable_mods_on_residue: 1 # static mods are not considered

max_variable_mods_per_peptide: 3 # maximum of 5

max_variable_mods_combinations: 5000 # maximum of 65534, limits number of modified peptides generated from sequence

output_file_extension: pepXML # file extension of output files

output_format: pepXML # file format of output files (pepXML or tsv)

output_report_topN: 3 # reports top N PSMs per input spectrum

output_max_expect: 50 # suppresses reporting of PSM if top hit has expectation greater than this threshold

report_alternative_proteins: 0 # 0=no, 1=yes

precursor_charge: 1 6 # assume range of potential precursor charge states. Only relevant when override_charge is set to 1

override_charge: 0 # 0=no, 1=yes to override existing precursor charge states with precursor_charge parameter

digest_min_length: 7 # minimum length of peptides to be generated during in-silico digestion

digest_max_length: 50 # maximum length of peptides to be generated during in-silico digestion

digest_mass_range: 500.0 5000.0 # mass range of peptides to be generated during in-silico digestion in Daltons

max_fragment_charge: 2 # maximum charge state for theoretical fragments to match (1-4)

track_zero_topN: 0 # in addition to topN results, keep track of top results in zero bin

zero_bin_accept_expect: 0 # boost top zero bin entry to top if it has expect under 0.01 - set to 0 to disable

zero_bin_mult_expect: 1 # disabled if above passes - multiply expect of zero bin for ordering purposes (does not affect reported expect)

add_topN_complementary: 0 # inserts complementary ions corresponding to the top N most intense fragments in each experimental spectra

minimum_peaks: 15 # required minimum number of peaks in spectrum to search (default 10)

use_topN_peaks: 150 # pre-process experimental spectrum to only use top N peaks

min_fragments_modelling: 3 # minimum number of matched peaks in PSM for inclusion in statistical modeling

min_matched_fragments: 4 # minimum number of matched peaks for PSM to be reported

minimum_ratio: 0.01 # filters out all peaks in experimental spectrum less intense than this multiple of the base peak intensity

clear_mz_range: 125.5 131.5 # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range

remove_precursor_peak: 0 # remove precursor peaks from tandem mass spectra. 0=not remove; 1=remove the peak with precursor charge; 2=remove the $

remove_precursor_range: -1.5,1.5 # m/z range in removing precursor peaks. Unit: Da.

intensity_transform: 0 # transform peaks intensities with sqrt root. 0=not transform; 1=transform using sqrt root.

add_Cterm_peptide: 0.000000 # c-term peptide fixed modifications

add_Cterm_protein: 0.000000 # c-term protein fixed modifications

add_Nterm_peptide: 0.000000 # n-term peptide fixed modifications

add_Nterm_protein: 0.000000 # n-term protein fixed modifications

add_A_alanine: 0.000000 # alanine fixed modifications

add_C_cysteine: 57.021464 # cysteine fixed modifications

add_D_aspartic_acid: 0.000000 # aspartic acid fixed modifications

add_E_glutamic_acid: 0.000000 # glutamic acid fixed modifications

add_F_phenylalanine: 0.000000 # phenylalanine fixed modifications

add_G_glycine: 0.000000 # glycine fixed modifications

add_H_histidine: 0.000000 # histidine fixed modifications

add_I_isoleucine: 0.000000 # isoleucine fixed modifications

add_K_lysine: 229.162932 # lysine fixed modifications

add_L_leucine: 0.000000 # leucine fixed modifications

add_M_methionine: 0.000000 # methionine fixed modifications

add_N_asparagine: 0.000000 # asparagine fixed modifications

add_P_proline: 0.000000 # proline fixed modifications

add_Q_glutamine: 0.000000 # glutamine fixed modifications

add_R_arginine: 0.000000 # arginine fixed modifications

add_S_serine: 0.000000 # serine fixed modifications

add_T_threonine: 0.000000 # threonine fixed modifications

add_V_valine: 0.000000 # valine fixed modifications

add_W_tryptophan: 0.000000 # tryptophan fixed modifications

add_Y_tyrosine: 0.000000 # tyrosine fixed modifications

peptideprophet: # v5.2

extension: pepXML # pepXML file extension

clevel: 0 # set Conservative Level in neg_stdev from the neg_mean, low numbers are less conservative, high numbers are more conse$

accmass: true # use Accurate Mass model binning

decoyprobs: true # compute possible non-zero probabilities for Decoy entries on the last iteration

enzyme: trypsin # enzyme used in sample (optional)

exclude: false # exclude deltaCn, Mascot, and Comet results from results (default Penalize results)

expectscore: true # use expectation value as the only contributor to the f-value for modeling

forcedistr: false # bypass quality control checks, report model despite bad modeling

glyc: false # enable peptide Glyco motif model

icat: false # apply ICAT model (default Autodetect ICAT)

instrwarn: false # warn and continue if combined data was generated by different instrument models

leave: false # leave alone deltaCn, Mascot, and Comet results from results (default Penalize results)

maldi: false # enable MALDI mode

masswidth: 5 # model mass width (default 5)

minpeplen: 7 # minimum peptide length not rejected (default 7)

minpintt: 2 # minimum number of NTT in a peptide used for positive pI model (default 2)

minpiprob: 0.9 # minimum probability after first pass of a peptide used for positive pI model (default 0.9)

minprob: 0.05 # report results with minimum probability (default 0.05)

minrtntt: 2 # minimum number of NTT in a peptide used for positive RT model (default 2)

minrtprob: 0.9 # minimum probability after first pass of a peptide used for positive RT model (default 0.9)

neggamma: false # use Gamma distribution to model the negative hits

noicat: false # do no apply ICAT model (default Autodetect ICAT)

nomass: false # disable mass model

nonmc: false # disable NMC missed cleavage model

nonparam: true # use semi-parametric modeling, must be used in conjunction with --decoy option

nontt: false # disable NTT enzymatic termini model

optimizefval: false # (SpectraST only) optimize f-value function f(dot,delta) using PCA

phospho: false # enable peptide Phospho motif model

pi: false # enable peptide pI model

ppm: true # use PPM mass error instead of Dalton for mass modeling

zero: false # report results with minimum probability 0

ptmprophet: # v5.2

autodirect: false # use direct evidence when the lability is high, use in combination with LABILITY

cions: # use specified C-term ions, separate multiple ions by commas (default: y for CID, z for ETD)

direct: false # use only direct evidence for evaluating PTM site probabilities

em: 2 # set EM models to 0 (no EM), 1 (Intensity EM Model Applied) or 2 (Intensity and Matched Peaks EM Models Applied)

static: false # use static fragppmtol for all PSMs instead of dynamically estimates offsets and tolerances

fragppmtol: 15 # when computing PSM-specific mass_offset and mass_tolerance, use specified default +/- MS2 mz tolerance on fragment io$

ifrags: false # use internal fragments for localization

keepold: false # retain old PTMProphet results in the pepXML file

lability: false # compute Lability of PTMs

massdiffmode: false # use the Mass Difference and localize

massoffset: 0 # adjust the massdiff by offset (0 = use default)

maxfragz: 0 # limit maximum fragment charge (default: 0=precursor charge, negative values subtract from precursor charge)

maxthreads: 4 # use specified number of threads for processing

mino: 0 # use specified number of pseudo-counts when computing Oscore (0 = use default)

minprob: 0 # use specified minimum probability to evaluate peptides

mods: # specify modifications

nions: # use specified N-term ions, separate multiple ions by commas (default: a,b for CID, c for ETD)

nominofactor: false # disable MINO factor correction when MINO= is set greater than 0 (default: apply MINO factor correction)

ppmtol: 1 # use specified +/- MS1 ppm tolerance on peptides which may have a slight offset depending on search parameters

verbose: false # produce Warnings to help troubleshoot potential PTM shuffling or mass difference issues

proteinprophet: # v5.2

accuracy: false # equivalent to --minprob 0

allpeps: false # consider all possible peptides in the database in the confidence model

confem: false # use the EM to compute probability given the confidence

delude: false # do NOT use peptide degeneracy information when assessing proteins

excludezeros: false # exclude zero prob entries

fpkm: false # model protein FPKM values

glyc: false # highlight peptide N-glycosylation motif

icat: false # highlight peptide cysteines

instances: false # use Expected Number of Ion Instances to adjust the peptide probabilities prior to NSP adjustment

iprophet: false # input is from iProphet

logprobs: false # use the log of the probabilities in the Confidence calculations

maxppmdiff: 20 # maximum peptide mass difference in PPM (default 20)

minprob: 0.05 # peptideProphet probabilty threshold (default 0.05)

mufactor: 1 # fudge factor to scale MU calculation (default 1)

nogroupwts: false # check peptide's Protein weight against the threshold (default: check peptide's Protein Group weight against threshold)

nonsp: false # do not use NSP model

nooccam: false # non-conservative maximum protein list

noprotlen: false # do not report protein length

normprotlen: false # normalize NSP using Protein Length

protmw: false # get protein mol weights

softoccam: false # peptide weights are apportioned equally among proteins within each Protein Group (less conservative protein count est$

unmapped: false # report results for UNMAPPED proteins

filter:

psmFDR: 0.01 # psm FDR level (default 0.01)

peptideFDR: 0.01 # peptide FDR level (default 0.01)

ionFDR: 0.01 # peptide ion FDR level (default 0.01)

proteinFDR: 0.01 # protein FDR level (default 0.01)

peptideProbability: 0.7 # top peptide probability threshold for the FDR filtering (default 0.7)

proteinProbability: 0.5 # protein probability threshold for the FDR filtering (not used with the razor algorithm) (default 0.5)

peptideWeight: 0.9 # threshold for defining peptide uniqueness (default 1)

razor: true # use razor peptides for protein FDR scoring

picked: true # apply the picked FDR algorithm before the protein scoring

mapMods: true # map modifications acquired by an open search

models: true # print model distribution

sequential: true # alternative algorithm that estimates FDR using both filtered PSM and Protein lists

freequant:

peakTimeWindow: 0.4 # specify the time windows for the peak (minute) (default 0.4)

retentionTimeWindow: 3 # specify the retention time window for xic (minute) (default 3)

tolerance: 10 # m/z tolerance in ppm (default 10)

isolated: true # use the isolated ion instead of the selected ion for quantification

labelquant:

annotation: annotation.txt # annotation file with custom names for the TMT channels

bestPSM: true # select the best PSMs for protein quantification

level: 2 # ms level for the quantification

minProb: 0.7 # only use PSMs with a minimum probability score

plex: 10 # number of channels

purity: 0.5 # ion purity threshold (default 0.5)

removeLow: 0.05 # ignore the lower 3% PSMs based on their summed abundances

tolerance: 20 # m/z tolerance in ppm (default 20)

uniqueOnly: false # report quantification based on only unique peptides

report:

msstats: false # create an output compatible to MSstats

withDecoys: false # add decoy observations to reports

mzID: false # create a mzID output

bioquant:

organismUniProtID: # UniProt proteome ID

level: 0.9 # cluster identity level (default 0.9)

abacus:

protein: true # global level protein report

peptide: false # global level peptide report

proteinProbability: 0.05 # minimum protein probability (default 0.9)

peptideProbability: 0.5 # minimum peptide probability (default 0.5)

uniqueOnly: false # report TMT quantification based on only unique peptides

reprint: false # create abacus reports using the Reprint format

tmtintegrator: # v1.1.2

path: # path to TMT-Integrator jar

memory: 100 # memory allocation, in Gb

output: # the location of output files

channel_num: 10 # number of channels in the multiplex (e.g. 10, 11)

ref_tag: pool # unique tag for identifying the reference channel (Bridge sample added to each multiplex)

groupby: -1 # level of data summarization(0: PSM aggregation to the gene level; 1: protein; 2: peptide sequence; 3: PTM site; -1: g$

psm_norm: false # perform additional retention time-based normalization at the PSM level

outlier_removal: true # perform outlier removal

prot_norm: -1 # normalization (0: None; 1: MD (median centering); 2: GN (median centering + variance scaling); -1: generate reports w$

min_pep_prob: 0.9 # minimum PSM probability threshold (in addition to FDR-based filtering by Philosopher)

min_purity: 0.5 # ion purity score threshold

min_percent: 0.05 # remove low intensity PSMs (e.g. value of 0.05 indicates removal of PSMs with the summed TMT reporter ions intensity i$

unique_pep: false # allow PSMs with unique peptides only (if true) or unique plus razor peptides (if false), as classified by Philosopher$

unique_gene: 0 # additional, gene-level uniqueness filter (0: allow all PSMs; 1: remove PSMs mapping to more than one GENE with eviden$

best_psm: true # keep the best PSM only (highest summed TMT intensity) among all redundant PSMs within the same LC-MS run

prot_exclude: sp|,tr| # exclude proteins with specified tags at the beginning of the accession number (e.g. none: no exclusion; sp|,tr| : exc$

allow_overlabel: false # allow PSMs with TMT on S (when overlabeling on S was allowed in the database search)

allow_unlabeled: false # allow PSMs without TMT tag or acetylation on the peptide n-terminus

mod_tag: none # PTM info for generation of PTM-specific reports (none: for Global data; S(79.9663),T(79.9663),Y(79.9663): for Phospho$

min_site_prob: -1 # site localization confidence threshold (-1: for Global; 0: as determined by the search engine; above 0 (e.g. 0.75): P$

ms1_int: true # use MS1 precursor ion intensity (if true) or MS2 summed TMT reporter ion intensity (if false) as part of the referenc$

top3_pep: true # use top 3 most intense peptide ions as part of the reference sample abundance estimation

print_RefInt: false # print individual reference sample abundance estimates for each multiplex in the final reports (in addition to the com$
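Both `removeLow` (labelquant) and `min_percent` (TMT-Integrator) above describe dropping the least intense PSMs, ranked by summed reporter-ion abundance, before quantification. The Python sketch below illustrates that idea under stated assumptions: each PSM is reduced to a list of reporter-channel intensities, and the cutoff removes the lowest `fraction` of PSMs by their sum. The function name `remove_low` is hypothetical, and this is not the actual Philosopher or TMT-Integrator code, which may differ in details such as tie handling or rounding.

```python
from typing import List, Sequence


def remove_low(psms: Sequence[Sequence[float]], fraction: float) -> List[Sequence[float]]:
    """Drop the lowest `fraction` of PSMs ranked by summed reporter intensity.

    Hypothetical illustration: each PSM is a sequence of reporter-channel
    intensities (e.g. 10 values for a 10-plex). Surviving PSMs keep their
    original order.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    sums = [sum(channels) for channels in psms]
    # Number of PSMs to discard, rounded down.
    n_drop = int(len(psms) * fraction)
    if n_drop == 0:
        return list(psms)
    # Indices of the n_drop PSMs with the smallest summed intensity.
    order = sorted(range(len(psms)), key=lambda i: sums[i])
    dropped = set(order[:n_drop])
    return [psm for i, psm in enumerate(psms) if i not in dropped]


if __name__ == "__main__":
    # 20 toy PSMs with increasing total intensity; 0.05 drops the weakest one.
    toy = [[float(i)] * 10 for i in range(1, 21)]
    kept = remove_low(toy, 0.05)
    print(len(kept))    # 19
    print(kept[0][0])   # 2.0
```

Sorting indices rather than the PSMs themselves keeps the surviving PSMs in their original order, which matters if they are later matched back to other per-PSM records.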

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Nesvilab/philosopher/issues/140, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIIMM62JKATCKDD3DACAOK3RYNSRVANCNFSM4OIPB5LQ.


Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

anesvi commented 4 years ago

I looked at PXD019087 https://www.ebi.ac.uk/pride/archive/projects/PXD019087

This does not appear to be a TMT dataset at all.
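One quick way to check this before starting a TMT workflow is to look for reporter-ion peaks in the MS2 spectra. TMT10plex reporters sit between roughly m/z 126 and 131 (the same region the posted MSFragger configuration clears with `clear_mz_range: 125.5 131.5`), and labeled runs quantified at MS2 (`level: 2` above) should show most of those channels in most spectra. The sketch below is illustrative only: it assumes the spectra have already been read from the mzML into lists of `(mz, intensity)` peaks with whatever reader you prefer (not shown), it uses the 20 ppm tolerance from `tolerance: 20` above, and the function names and the "at least 6 of 10 channels" rule are hypothetical choices rather than an established test. It also assumes high-resolution MS2, since 127N and 127C differ by only about 0.006 m/z.

```python
from typing import Iterable, Sequence, Tuple

# TMT10plex reporter-ion m/z values (126, 127N, 127C, ..., 131).
TMT10_REPORTERS = [
    126.127726, 127.124761, 127.131081, 128.128116, 128.134436,
    129.131471, 129.137790, 130.134825, 130.141145, 131.138180,
]

Peak = Tuple[float, float]  # (m/z, intensity)


def reporters_found(peaks: Sequence[Peak], tol_ppm: float = 20.0) -> int:
    """Count how many reporter channels have a nonzero peak within tol_ppm."""
    found = 0
    for ref in TMT10_REPORTERS:
        tol = ref * tol_ppm * 1e-6  # ppm tolerance converted to m/z units
        if any(abs(mz - ref) <= tol and inten > 0 for mz, inten in peaks):
            found += 1
    return found


def fraction_tmt_like(spectra: Iterable[Sequence[Peak]], min_channels: int = 6,
                      tol_ppm: float = 20.0) -> float:
    """Fraction of MS2 spectra with at least `min_channels` reporter peaks."""
    total = hits = 0
    for peaks in spectra:
        total += 1
        if reporters_found(peaks, tol_ppm) >= min_channels:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy spectra: one with all ten reporters, two without any.
    labelled = [(mz, 1e4) for mz in TMT10_REPORTERS] + [(500.3, 2e5)]
    unlabelled = [(175.119, 3e4), (500.3, 2e5), (800.4, 1e5)]
    print(fraction_tmt_like([labelled, unlabelled, unlabelled]))
```

A value near 0 across a run suggests the data were not TMT-labeled (or that reporters were measured at MS3 rather than MS2), in which case searching with the TMT fixed modification on lysine is unlikely to produce usable identifications.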

We are happy to answer questions regarding our tools, but please make sure you understand the data prior to running the pipelines.
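The counts in the PeptideProphet log from the original report point the same way. As a rough back-of-the-envelope check, assuming the `rev_` decoy sequences mirror the targets one to one, so that random matches land on targets and decoys about equally often, the decoy count approximates the number of incorrect target PSMs among all unfiltered top hits. This is a simplified target-decoy estimate, not what PeptideProphet itself computes:

```python
# Counts copied from the PeptideProphet log in the original report:
#   "Found 42793 Decoys, and 49078 Non-Decoys"
decoys = 42793
targets = 49078

# Assumption: incorrect PSMs hit decoys and targets about equally often, so the
# decoy count approximates the number of incorrect target PSMs.
decoy_target_ratio = decoys / targets   # rough FDR over all target PSMs
excess_targets = targets - decoys       # targets not explained by chance

print(f"total PSMs: {decoys + targets}")                  # 91871
print(f"decoy/target ratio: {decoy_target_ratio:.3f}")    # 0.872
print(f"targets in excess of decoys: {excess_targets}")   # 6285
```

Only a few thousand of roughly 49,000 target PSMs stand out from random matching, which is consistent with the mixture-model quality test failing for every charge state. With `forcedistr: false`, PeptideProphet is not forced to report a model that fails these checks, which would explain why ProteinProphet then found no PeptideProphet results in `interact.pep.xml`.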

Regards, Alexey

From: Ciara Judge <notifications@github.com>
Sent: Thursday, June 25, 2020 11:14 AM
To: Nesvilab/philosopher <philosopher@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Subject: [Nesvilab/philosopher] TMT Pipeline Error in Protein Prophet due to Peptide Prophet 'no data' (#140)

Hi, further to another issue (https://github.com/Nesvilab/philosopher/issues/139) I was discussing which I now believe to be resolved, I am using the TMT pipeline to analyse databases that I am downloading from PRIDE and converting to mzML using MSConvert. After the operation of PeptideProphet (during which I get a number of warnings about failed mixture model quality tests), ProteinProphet fails, citing a suggestion that PeptideProphet did not run correctly or at all.
