PeptideProphet throws fatal msg="You need to provide a protein database"

ekawaler commented 4 years ago

I'm still running that TMT pipeline on some CPTAC datasets and am getting a strange error with PeptideProphet where it claims it can't find a protein database. There are two odd quirks to this. First of all, MSFragger ran without issue. Secondly, it doesn't appear consistently. The first dataset I ran it on never produced any issues. The second gave me this error until I added a database line to the peptideprophet section of philosopher.yml, after which it ran smoothly. The third dataset, now, gives me this error whether or not I have a database line in the peptideprophet section. There are no differences between the setups for these three datasets other than folder names (and I've triple-checked that all my paths are correct). Is this a known error?

commands:
  workspace: yes # manage the experiment workspace for the analysis
  database: yes # target-decoy database formatting
  comet: no # peptide spectrum matching with Comet
  msfragger: yes # peptide spectrum matching with MSFragger
  peptideprophet: yes # peptide assignment validation
  ptmprophet: no # PTM site localization
  proteinprophet: no # protein identification validation
  filter: yes # statistical filtering, validation and False Discovery Rates assessment
  freequant: yes # label-free Quantification
  labelquant: yes # isobaric Labeling-Based Relative Quantification
  bioquant: no # protein report based on Uniprot protein clusters
  report: yes # multi-level reporting for both narrow-searches and open-searches
  abacus: yes # combined analysis of LC-MS/MS results
  tmtintegrator: yes # integrates channel abundances from multiple TMT samples

database:
  protein_database: /path/CCRCC/database/2020-08-04-decoys-contam-proteome.fasta # path to the target-decoy protein database
  decoytag: rev # prefix tag used added to decoy sequences

. . .

peptideprophet: # v5.2
  concurrent: true # Concurrent execution of multiple instaces
  extension: pepXML # pepXML file extension
  clevel: 0 # set Conservative Level in neg_stdev from the neg_mean, low numbers are less conservative, high numbers are more conservative
  accmass: true # use Accurate Mass model binning
  decoyprobs: true # compute possible non-zero probabilities for Decoy entries on the last iteration
  enzyme: trypsin # enzyme used in sample (optional)
  exclude: false # exclude deltaCn, Mascot, and Comet results from results (default Penalize results)
  expectscore: true # use expectation value as the only contributor to the f-value for modeling
  forcedistr: false # bypass quality control checks, report model despite bad modeling
  glyc: false # enable peptide Glyco motif model
  icat: false # apply ICAT model (default Autodetect ICAT)
  instrwarn: false # warn and continue if combined data was generated by different instrument models
  leave: false # leave alone deltaCn, Mascot, and Comet results from results (default Penalize results)
  maldi: false # enable MALDI mode
  masswidth: 5 # model mass width (default 5)
  minpeplen: 7 # minimum peptide length not rejected (default 7)
  minpintt: 2 # minimum number of NTT in a peptide used for positive pI model (default 2)
  minpiprob: 0.9 # minimum probability after first pass of a peptide used for positive pI model (default 0.9)
  minprob: 0.05 # report results with minimum probability (default 0.05)
  minrtntt: 2 # minimum number of NTT in a peptide used for positive RT model (default 2)
  minrtprob: 0.9 # minimum probability after first pass of a peptide used for positive RT model (default 0.9)
  neggamma: false # use Gamma distribution to model the negative hits
  noicat: false # do no apply ICAT model (default Autodetect ICAT)
  nomass: false # disable mass model
  nonmc: false # disable NMC missed cleavage model
  nonparam: true # use semi-parametric modeling, must be used in conjunction with --decoy option
  nontt: false # disable NTT enzymatic termini model
  optimizefval: false # (SpectraST only) optimize f-value function f(dot,delta) using PCA
  phospho: false # enable peptide Phospho motif model
  pi: false # enable peptide pI model
  ppm: true # use PPM mass error instead of Dalton for mass modeling
  zero: false # report results with minimum probability 0
  database: /path/CCRCC/database/2020-08-04-decoys-contam-proteome.fasta # path to the database

prvst commented 4 years ago

Hi @ekawaler

Could you paste the original YAML file here? Also, which version of Philosopher are you using?

ekawaler commented 4 years ago

INFO[12:58:32] Current Philosopher build and version build=1593192429 version=v3.2.9

philosopher.yml.txt

I added the .txt extension so I could attach the file here; obviously when I run my code it's the normal .yml extension. Thanks for taking a look!

prvst commented 4 years ago

Could you share the error message as well? The full log would be best.

ekawaler commented 4 years ago

Here's the error log! slurm-9368473.out.txt

ekawaler commented 3 years ago

Any ideas yet? Unfortunately, I still haven't succeeded in getting it to run.

prvst commented 3 years ago

Are you using FragPipe?

ekawaler commented 3 years ago

I'm using the Philosopher pipeline from the command line. I think the problem has to do with how the arguments are passed to PeptideProphet - somehow it's getting a blank where there should be a database path. Is it possible that I'm using a character somewhere in the list of arguments that's causing the parsing to be off?

ekawaler commented 3 years ago

I've managed to get it to run on a dataset where it was not previously running. Current working hypothesis: if your database/dataset is too large, you may have to set the concurrent variable in peptideprophet to "false"?

prvst commented 3 years ago

Oh, I see the issue now: you are using concurrent processing. That feature is still experimental, so please turn it off (set concurrent to false in the peptideprophet section) if you are having issues. That should fix it.
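
For reference, a minimal sketch of what that change would look like in philosopher.yml, assuming the rest of the peptideprophet section stays exactly as posted earlier in this thread (only the concurrent line differs; the trailing comments here are illustrative and not taken from the sample file):

```yaml
peptideprophet:
  concurrent: false   # was true; concurrent execution is still experimental
  extension: pepXML   # all remaining options left unchanged
```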

ekawaler commented 3 years ago

Thanks! I'd suggest adding a comment to that effect in the sample .yml file so other people don't run into the same problem.

Nesvilab / philosopher

PeptideProphet throws fatal msg="You need to provide a protein database" #154