Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

Philosopher pipeline equivalent of FragPipe steps #153

Closed: lydiayliu closed this issue 3 years ago

lydiayliu commented 4 years ago

Describe the bug

I am following the instructions in "Running a FragPipe-equivalent workflow on Linux", here: https://msfragger.nesvilab.org/tutorial_linux.html

I ran my analysis twice on exactly the same inputs (Crystal-C results): once with the open-search script from the tutorial, and once with the same parameters translated into a Philosopher pipeline YAML.

I'm unsure what the defaults are for some parameters that the script does not specify, and why some of the parameter names in the script don't match the options in the YAML.

Could you please look at both the script and the YAML and let me know whether this is an accurate translation?

I also have a specific question: the results of my two runs already differ at the PeptideProphet step. In the YAML, ppm and accmass are set to true by default, but on the command line it seems they have to be passed explicitly (--ppm, --accmass). Should these two parameters be added for an open-search analysis? There isn't much description of them on the GitHub manual page.
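A minimal dry-run sketch of that comparison, assuming --accmass and --ppm are the command-line switches behind the YAML keys accmass and ppm. It only prints the open-search PeptideProphet command from the script in this issue, with and without those two switches. build_peptideprophet_cmd is a hypothetical helper, the input pepXML files are left out, and the fallback values for decoyPrefix and fastaPath are taken from the YAML in this issue.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the open-search PeptideProphet
# command from the tutorial script. Pass "yes" to also append the two
# switches that the YAML enables (accmass: true, ppm: true).
build_peptideprophet_cmd() {
  local with_yaml_switches="$1"
  local -a cmd=(
    "${philosopherPath:-philosopher}" peptideprophet
    --nonparam --expectscore --decoyprobs
    --masswidth 1000.0 --clevel -2
    --decoy "${decoyPrefix:-rev_}" --combine
    --database "${fastaPath:-/reference/reference.fasta}"
    --output combined
  )
  if [[ "$with_yaml_switches" == "yes" ]]; then
    cmd+=(--accmass --ppm)   # assumed CLI names for accmass/ppm in the YAML
  fi
  # Join the arguments with spaces, for display only.
  echo "${cmd[*]}"
}

build_peptideprophet_cmd no    # the script as written
build_peptideprophet_cmd yes   # the script plus the YAML's accmass/ppm
```

Running both variants on the same Crystal-C output would help isolate whether these two switches explain the difference seen at this step.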

script:

# Run PeptideProphet, ProteinProphet, and FDR filtering with Philosopher
$philosopherPath workspace --clean
$philosopherPath workspace --init
$philosopherPath database --annotate $fastaPath --prefix $decoyPrefix

$philosopherPath peptideprophet --nonparam --expectscore --decoyprobs --masswidth 1000.0 --clevel -2 --decoy $decoyPrefix --combine --database $fastaPath --output combined ./*_c.pepXML # Open search

$philosopherPath proteinprophet --maxppmdiff 2000000 --output combined combined.pep.xml

$philosopherPath filter --sequential --razor --mapmods --tag $decoyPrefix --pepxml ./combined.pep.xml --protxml ./combined.prot.xml # Open search

# Perform quantification.
$philosopherPath freequant --tol 30 --dir $dataDirPath

# Make reports.
$philosopherPath report
$philosopherPath workspace --clean

(There was a mass-calibration problem with one batch of my samples, so the mass tolerance is set to 30 ppm.)
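Because a ppm tolerance scales with m/z, it can help to see what 30 ppm means in absolute terms next to freequant's default of 10 ppm (per the YAML comment). The conversion is m/z × ppm / 10^6. ppm_to_mz_tolerance is a hypothetical helper, and m/z 800 is an arbitrary example value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: absolute m/z tolerance for a given m/z and ppm value,
# computed as mz * ppm / 1e6 and printed with four decimals.
ppm_to_mz_tolerance() {
  local mz="$1" ppm="$2"
  awk -v mz="$mz" -v ppm="$ppm" 'BEGIN { printf "%.4f\n", mz * ppm / 1e6 }'
}

ppm_to_mz_tolerance 800 30   # prints 0.0240 (the --tol 30 used here)
ppm_to_mz_tolerance 800 10   # prints 0.0080 (the default of 10 ppm)
```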

YAML (I've removed the steps that are not needed):

analytics: true                                # reports when a workspace is created for usage statistics

commands:
  workspace: no                               # manage the experiment workspace for the analysis
  database: no                                # target-decoy database formatting
  comet: no                                    # peptide spectrum matching with Comet
  msfragger: no                                # peptide spectrum matching with MSFragger
  peptideprophet: no                           # peptide assignment validation
  ptmprophet: no                               # PTM site localization
  proteinprophet: no                           # protein identification validation
  filter: yes                                   # statistical filtering, validation and False Discovery Rates assessment
  freequant: yes                                # label-free Quantification
  labelquant: no                               # isobaric Labeling-Based Relative Quantification
  bioquant: no                                 # protein report based on Uniprot protein clusters
  report: yes                                   # multi-level reporting for both narrow-searches and open-searches
  abacus: no                                   # combined analysis of LC-MS/MS results
  tmtintegrator: no                            # integrates channel abundances from multiple TMT samples

database:
  protein_database: /reference/reference.fasta                           # path to the target-decoy protein database
  decoy_tag: rev_                              # prefix tag added to decoy sequences

peptideprophet:                                # v5.2
  concurrent: yes                            # Concurrent execution of multiple instances
  extension: pepXML                            # pepXML file extension
  clevel: -2                                    # set Conservative Level in neg_stdev from the neg_mean, low numbers are less conservative, high numbers are more conservative
  accmass: true                                # use Accurate Mass model binning
  decoyprobs: true                             # compute possible non-zero probabilities for Decoy entries on the last iteration
  enzyme: trypsin                              # enzyme used in sample (optional)
  exclude: false                               # exclude deltaCn*, Mascot*, and Comet* results from results (default Penalize * results)
  expectscore: true                            # use expectation value as the only contributor to the f-value for modeling
  forcedistr: false                            # bypass quality control checks, report model despite bad modeling
  glyc: false                                  # enable peptide Glyco motif model
  icat: false                                  # apply ICAT model (default Autodetect ICAT)
  instrwarn: false                             # warn and continue if combined data was generated by different instrument models
  leave: false                                 # leave alone deltaCn*, Mascot*, and Comet* results from results (default Penalize * results)
  maldi: false                                 # enable MALDI mode
  masswidth: 1000.0                                 # model mass width (default 5)
  minpeplen: 7                                 # minimum peptide length not rejected (default 7)
  minpintt: 2                                  # minimum number of NTT in a peptide used for positive pI model (default 2)
  minpiprob: 0.9                               # minimum probability after first pass of a peptide used for positive pI model (default 0.9)
  minprob: 0.05                                # report results with minimum probability (default 0.05)
  minrtntt: 2                                  # minimum number of NTT in a peptide used for positive RT model (default 2)
  minrtprob: 0.9                               # minimum probability after first pass of a peptide used for positive RT model (default 0.9)
  neggamma: false                              # use Gamma distribution to model the negative hits
  noicat: false                                # do not apply ICAT model (default Autodetect ICAT)
  nomass: false                                # disable mass model
  nonmc: false                                 # disable NMC missed cleavage model
  nonparam: true                               # use semi-parametric modeling, must be used in conjunction with --decoy option
  nontt: false                                 # disable NTT enzymatic termini model
  optimizefval: false                          # (SpectraST only) optimize f-value function f(dot,delta) using PCA
  phospho: false                               # enable peptide Phospho motif model
  pi: false                                    # enable peptide pI model
  ppm: true                                    # use PPM mass error instead of Dalton for mass modeling
  zero: false                                  # report results with minimum probability 0

proteinprophet:                                # v5.2
  accuracy: false                              # equivalent to --minprob 0
  allpeps: false                               # consider all possible peptides in the database in the confidence model
  confem: false                                # use the EM to compute probability given the confidence
  delude: false                                # do NOT use peptide degeneracy information when assessing proteins
  excludezeros: false                          # exclude zero prob entries
  fpkm: false                                  # model protein FPKM values
  glyc: false                                  # highlight peptide N-glycosylation motif
  icat: false                                  # highlight peptide cysteines
  instances: false                             # use Expected Number of Ion Instances to adjust the peptide probabilities prior to NSP adjustment
  iprophet: false                              # input is from iProphet
  logprobs: false                              # use the log of the probabilities in the Confidence calculations
  maxppmdiff: 2000000                               # maximum peptide mass difference in PPM (default 20)
  minprob: 0.05                                # PeptideProphet probability threshold (default 0.05)
  mufactor: 1                                  # fudge factor to scale MU calculation (default 1)
  nogroupwts: false                            # check peptide's Protein weight against the threshold (default: check peptide's Protein Group weight against threshold)
  nonsp: false                                 # do not use NSP model
  nooccam: false                               # non-conservative maximum protein list
  noprotlen: false                             # do not report protein length
  normprotlen: false                           # normalize NSP using Protein Length
  protmw: false                                # get protein mol weights
  softoccam: false                             # peptide weights are apportioned equally among proteins within each Protein Group (less conservative protein count estimate)
  unmapped: false                              # report results for UNMAPPED proteins

filter:
  psmFDR: 0.01                                 # psm FDR level (default 0.01)
  peptideFDR: 0.01                             # peptide FDR level (default 0.01)
  ionFDR: 0.01                                 # peptide ion FDR level (default 0.01)
  proteinFDR: 0.01                             # protein FDR level (default 0.01)
  peptideProbability: 0.7                      # top peptide probability threshold for the FDR filtering (default 0.7)
  proteinProbability: 0.5                      # protein probability threshold for the FDR filtering (not used with the razor algorithm) (default 0.5)
  peptideWeight: 1                             # threshold for defining peptide uniqueness (default 1)
  razor: true                                 # use razor peptides for protein FDR scoring
  picked: false                                # apply the picked FDR algorithm before the protein scoring
  mapMods: true                               # map modifications acquired by an open search
  models: false                                # print model distribution
  sequential: true                            # alternative algorithm that estimates FDR using both filtered PSM and Protein lists

freequant:
  peakTimeWindow: 0.4                          # specify the time windows for the peak (minute) (default 0.4)
  retentionTimeWindow: 3                       # specify the retention time window for xic (minute) (default 3)
  tolerance: 30                                # m/z tolerance in ppm (default 10) # precursor mass tolerance needs to be 30 for cpcgene
  isolated: false                              # use the isolated ion instead of the selected ion for quantification

report:
  msstats: true                               # create an output compatible to MSstats
  withDecoys: true                            # add decoy observations to reports
  mzID: true                                  # create a mzID output
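
To check the translation in the other direction, one could list the switches a YAML section turns on and compare them with the flags in the script. The sketch below assumes each YAML key maps to a lower-case --key flag (true/yes becomes a bare switch, any other non-false value becomes --key value). That assumption does not always hold: in this issue freequant's tolerance corresponds to --tol, the decoy tag is passed as --prefix, --decoy or --tag depending on the command, and keys such as concurrent or extension may be pipeline settings with no CLI counterpart. yaml_section_to_flags is a hypothetical helper that only understands the flat two-level layout used above, and pipeline.yml is a hypothetical file name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the flags implied by one section of a flat
# Philosopher pipeline YAML, assuming "key: true" -> --key and
# "key: value" -> --key value (keys lower-cased, comments ignored).
yaml_section_to_flags() {
  local section="$1" file="$2"
  awk -v section="$section" '
    # Any non-indented "name:" line starts a new top-level section.
    /^[^ \t#][^:]*:/ {
      in_section = ($0 ~ ("^" section ":"))
      next
    }
    # Indented "key: value" lines that belong to the requested section.
    in_section && /^[ \t]+[A-Za-z_]+:/ {
      line = $0
      sub(/#.*/, "", line)                  # drop the trailing comment
      key = line; sub(/:.*/, "", key)       # text before the first colon
      val = line; sub(/^[^:]*:/, "", val)   # text after the first colon
      gsub(/[ \t]/, "", key)
      gsub(/[ \t]/, "", val)
      if (val == "false" || val == "no" || val == "") next   # switched off
      flag = "--" tolower(key)
      if (val != "true" && val != "yes") flag = flag " " val
      out = (out == "" ? flag : out " " flag)
    }
    END { print out }
  ' "$file"
}

# Example (hypothetical file name):
# yaml_section_to_flags peptideprophet pipeline.yml
```

Run against the YAML in this issue, the peptideprophet section would include --accmass and --ppm, which the script does not pass.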

prvst commented 4 years ago

@foreverwander

Could you try the tutorials that we have on our wiki? We have a few different scenarios, and the basic one should be a good starting point.

prvst commented 3 years ago
