Closed ekawaler closed 4 years ago
Hi @ekawaler
Could you paste the original YAML file here? Also, which version of Philosopher are you using?
INFO[12:58:32] Current Philosopher build and version build=1593192429 version=v3.2.9
I added the .txt extension so I could attach the file here; obviously when I run my code it's the normal .yml extension. Thanks for taking a look!
Could you share the error message as well? The entire log message would be better.
Here's the error log! slurm-9368473.out.txt
Any ideas yet? I still unfortunately haven't succeeded in getting it to run.
Are you using FragPipe?
I'm using the Philosopher pipeline from the command line. I think the problem has to do with how the arguments are passed to PeptideProphet - somehow it's getting a blank where there should be a database path. Is it possible that I'm using a character somewhere in the list of arguments that's causing the parsing to be off?
I've managed to get it to run on a dataset where it was not previously running. Current working hypothesis: if your database/dataset is too large, you may have to set the concurrent variable in peptideprophet to "false"?
Oh, I see the issue now: you are using concurrent processing. Yes, that is still experimental, so please turn it off if you are having issues. That should fix it.
Thanks! I would suggest maybe adding a comment to that effect in the sample .yml file so other people don't run into the same problem!
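For anyone who lands here with the same error, the workaround discussed in this thread comes down to a one-line change in the peptideprophet section of philosopher.yml. A minimal sketch showing only the relevant key, with every other setting left as it was; the wording of the comment is just a suggestion and is not text from the shipped sample file:

```yaml
peptideprophet:
  concurrent: false  # experimental; keep this off if PeptideProphet reports that it cannot find the protein database
```

Whether dataset or database size is what triggers the failure was only a working hypothesis in this thread, so treat the comment as a hint and not as a confirmed cause.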
I'm still running that TMT pipeline on some CPTAC datasets and am getting a strange error with PeptideProphet where it claims it can't find a protein database. There are two odd quirks to this. First of all, MSFragger ran without issue. Secondly, it doesn't appear consistently. The first dataset I ran it on never produced any issues. The second gave me this error until I added a database line to the peptideprophet section of philosopher.yml, after which it ran smoothly. The third dataset, now, gives me this error whether or not I have a database line in the peptideprophet section. There are no differences between the setups for these three datasets other than folder names (and I've triple-checked that all my paths are correct). Is this a known error?
commands:
  workspace: yes  # manage the experiment workspace for the analysis
  database: yes  # target-decoy database formatting
  comet: no  # peptide spectrum matching with Comet
  msfragger: yes  # peptide spectrum matching with MSFragger
  peptideprophet: yes  # peptide assignment validation
  ptmprophet: no  # PTM site localization
  proteinprophet: no  # protein identification validation
  filter: yes  # statistical filtering, validation and False Discovery Rates assessment
  freequant: yes  # label-free Quantification
  labelquant: yes  # isobaric Labeling-Based Relative Quantification
  bioquant: no  # protein report based on Uniprot protein clusters
  report: yes  # multi-level reporting for both narrow-searches and open-searches
  abacus: yes  # combined analysis of LC-MS/MS results
  tmtintegrator: yes  # integrates channel abundances from multiple TMT samples
database:
  protein_database: /path/CCRCC/database/2020-08-04-decoys-contam-proteome.fasta  # path to the target-decoy protein database
  decoytag: rev  # prefix tag added to decoy sequences
. . .
peptideprophet:  # v5.2
  concurrent: true  # Concurrent execution of multiple instances
  extension: pepXML  # pepXML file extension
  clevel: 0  # set Conservative Level in neg_stdev from the neg_mean, low numbers are less conservative, high numbers are more conservative
  accmass: true  # use Accurate Mass model binning
  decoyprobs: true  # compute possible non-zero probabilities for Decoy entries on the last iteration
  enzyme: trypsin  # enzyme used in sample (optional)
  exclude: false  # exclude deltaCn, Mascot, and Comet results from results (default Penalize results)
  expectscore: true  # use expectation value as the only contributor to the f-value for modeling
  forcedistr: false  # bypass quality control checks, report model despite bad modeling
  glyc: false  # enable peptide Glyco motif model
  icat: false  # apply ICAT model (default Autodetect ICAT)
  instrwarn: false  # warn and continue if combined data was generated by different instrument models
  leave: false  # leave alone deltaCn, Mascot, and Comet results from results (default Penalize results)
  maldi: false  # enable MALDI mode
  masswidth: 5  # model mass width (default 5)
  minpeplen: 7  # minimum peptide length not rejected (default 7)
  minpintt: 2  # minimum number of NTT in a peptide used for positive pI model (default 2)
  minpiprob: 0.9  # minimum probability after first pass of a peptide used for positive pI model (default 0.9)
  minprob: 0.05  # report results with minimum probability (default 0.05)
  minrtntt: 2  # minimum number of NTT in a peptide used for positive RT model (default 2)
  minrtprob: 0.9  # minimum probability after first pass of a peptide used for positive RT model (default 0.9)
  neggamma: false  # use Gamma distribution to model the negative hits
  noicat: false  # do not apply ICAT model (default Autodetect ICAT)
  nomass: false  # disable mass model
  nonmc: false  # disable NMC missed cleavage model
  nonparam: true  # use semi-parametric modeling, must be used in conjunction with --decoy option
  nontt: false  # disable NTT enzymatic termini model
  optimizefval: false  # (SpectraST only) optimize f-value function f(dot,delta) using PCA
  phospho: false  # enable peptide Phospho motif model
  pi: false  # enable peptide pI model
  ppm: true  # use PPM mass error instead of Dalton for mass modeling
  zero: false  # report results with minimum probability 0
  database: /path/CCRCC/database/2020-08-04-decoys-contam-proteome.fasta  # path to the database