bigbio / quantms

Quantitative mass spectrometry workflow. Currently supports proteomics experiments with complex experimental designs for DDA-LFQ, DDA-Isobaric and DIA-LFQ quantification.
https://quantms.org
MIT License

percolator no decoys found #422

Open hgbrian opened 1 day ago

hgbrian commented 1 day ago

Description of the bug

First, to get the pipeline to not segfault on comet.exe, I had to update the container from 'biocontainers/openms-thirdparty:3.1.0--h9ee0642_1' to 'biocontainers/openms-thirdparty:3.2.0--h9ee0642_4'. An old issue, https://github.com/bigbio/quantms/issues/270, references this, so maybe the problem came back?

I am trying to get a basic DIA pipeline to run and find some peptides. I have a DIA raw file (and its mzML) that I want to search for peptides. I have tried the raw file, the mzML, and various FASTA files. They all fail in the same way.

Since this was failing, I tried one of the example files from proteomics-sample-metadata, PXD000396, together with the 20k-protein UniProt human FASTA. This fails in the same way as my original file, so I think this is a bug? (Or I am misunderstanding something very basic!)

The basic problem seems to be that percolator cannot find decoys, which I think stems from the fact that comet does not find any peptides (so the idXML contains no peptide identifications).
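One thing worth ruling out is whether the search database contains decoy entries at all, since a target-only FASTA could also leave percolator without decoys. The following is a minimal sketch that counts FASTA headers and how many of them carry a decoy marker. The `DECOY_` prefix and the function name `count_fasta_entries` are assumptions for illustration; this thread does not state which decoy pattern the pipeline expects.

```python
def count_fasta_entries(path, decoy_prefix="DECOY_"):
    """Return (total_entries, decoy_entries) for a FASTA file.

    decoy_prefix is an assumed marker, not confirmed by the thread.
    """
    total = 0
    decoys = 0
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, not a header
            total += 1
            header = line[1:].strip()
            # First whitespace-delimited token, e.g. "sp|P12345|NAME_HUMAN"
            accession = header.split()[0] if header else ""
            # Substring match covers both "DECOY_sp|..." and "sp|DECOY_P12345|..."
            if decoy_prefix in accession:
                decoys += 1
    return total, decoys
```

Calling `count_fasta_entries("human_proteome.fasta")` and getting a decoy count of zero would suggest the database is target-only, which is a different cause than comet finding no peptides.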

Should comet even be running for a DIA pipeline? (I have also tried setting --labelling_type "label free sample" --acquisition_method dia to force DIA)
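To see which acquisition method the SDRF itself declares, a short sketch like the one below can list the distinct values of the acquisition-method column. The column name `comment[proteomics data acquisition method]` follows the SDRF-Proteomics convention as far as I know, but whether the pipeline uses exactly this column to choose between the DDA and DIA branches is an assumption here, and `acquisition_methods` is a hypothetical helper name.

```python
import csv

# Assumed SDRF column name; compared case-insensitively
ACQ_COLUMN = "comment[proteomics data acquisition method]"


def acquisition_methods(sdrf_path):
    """Return the set of distinct acquisition-method values in an SDRF TSV.

    Returns an empty set if the column is missing or the file is empty.
    """
    with open(sdrf_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = [name.strip().lower() for name in next(reader, [])]
        if ACQ_COLUMN not in header:
            return set()
        idx = header.index(ACQ_COLUMN)
        # Skip short rows that do not reach the column
        return {row[idx].strip() for row in reader if len(row) > idx}
```

If `acquisition_methods("PXD000396.sdrf.tsv")` reports only data-dependent values, or returns an empty set, that might explain why comet is being run.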

Command used and terminal output

## command

nextflow run quantms --input 'PXD000396.sdrf.tsv' --database 'human_proteome.fasta' --outdir './results' -profile docker

## output

-[nf-core/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

      percolator: $(percolator -h 2>&1 | grep -E '^Percolator version(.*)' | sed 's/Percolator version //g')
  END_VERSIONS

Command exit status:
  11

Command output:
  Loading input file: 120309QEx2_RS1_20nl-min_0k1HeLa_8h_01_comet_feat.idXML
  Merging peptide ids.
  Merging protein ids.
  No decoys found, search results discrimination impossible. Aborting!
  stty: standard input: Inappropriate ioctl for device

  PercolatorAdapter -- Facilitate input to Percolator and reintegrate.
  Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_PercolatorAdapter.html
  Version: 3.1.0-pre-exported-20231020 Oct 20 2023, 13:54:37
  To cite OpenMS:
   + Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

  Usage:
    PercolatorAdapter <options>

  Options (mandatory options marked with '*'):
    -in <files>                           Input file(s) (valid formats: 'mzid', 'idXML')
    -in_decoy <files>                     Input decoy file(s) in case of separate searches (valid formats: 'mzid', 'idXML')
    -in_osw <file>                        Input file in OSW format (valid formats: 'OSW')
    -out <file>*                          Output file (valid formats: 'idXML', 'mzid', 'osw')
    -out_type <type>                      Output file type -- default: determined from file extension or content. (valid: 'mzid', 'idXML', 'osw')
    -enzyme <enzyme>                      Type of enzyme: no_enzyme,elastase,pepsin,proteinasek,thermolysin,chymotrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin,trypsinp (default: 'trypsin') (valid: 'no_enzyme', 'elastase', 'pepsin', 'proteinasek', 'thermolysin', 'chymotrypsin', 'lys-n', 'lys-c', 'arg-c', 'asp-n', 'glu-c', 'trypsin', 'trypsinp')
    -percolator_executable <executable>*  The Percolator executable. Provide a full or relative path, or make sure it can be found in your PATH environment.
    -peptide_level_fdrs                   Calculate peptide-level FDRs instead of PSM-level FDRs.
    -protein_level_fdrs                   Use the picked protein-level FDR to infer protein probabilities. Use the -fasta option and -decoy_pattern to set the Fasta file and decoy pattern.
    -osw_level <osw_level>                OSW: the data level selected for scoring. (default: 'ms2') (valid: 'ms1', 'ms2', 'transition')
    -score_type <type>                    Type of the peptide main score (default: 'q-value') (valid: 'q-value', 'pep', 'svm')

  Common TOPP options:
    -ini <file>                           Use the given TOPP INI file
    -threads <n>                          Sets the number of threads allowed to be used by the TOPP tool (default: '1')
    -write_ini <file>                     Writes the default configuration file
    --help                                Shows options

Command error:
  Loading input file: 120309QEx2_RS1_20nl-min_0k1HeLa_8h_01_comet_feat.idXML
  Merging peptide ids.
  Merging protein ids.
  No decoys found, search results discrimination impossible. Aborting!
  stty: standard input: Inappropriate ioctl for device
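
To check whether the idXML that PercolatorAdapter loads is really empty, or only lacks decoys, the file can be summarized with a few lines of Python. This is a rough sketch: it assumes the idXML uses `ProteinHit` elements with an `accession` attribute and `PeptideHit` elements carrying a `target_decoy` user parameter, and it assumes a `DECOY_` accession prefix. The helper name `summarize_idxml` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_idxml(path, decoy_prefix="DECOY_"):
    """Count protein and peptide hits, and how many look like decoys.

    Element and attribute names are assumptions about the idXML layout.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace if present
        if tag == "ProteinHit":
            counts["protein_hits"] += 1
            if decoy_prefix in elem.get("accession", ""):
                counts["decoy_protein_hits"] += 1
            elem.clear()  # free memory on large files
        elif tag == "PeptideHit":
            counts["peptide_hits"] += 1
            for param in elem.iter():
                # Values such as "decoy" or "target+decoy" both count as decoy
                if param.get("name") == "target_decoy" and "decoy" in param.get("value", ""):
                    counts["decoy_peptide_hits"] += 1
                    break
            elem.clear()
    return counts
```

Running `summarize_idxml("120309QEx2_RS1_20nl-min_0k1HeLa_8h_01_comet_feat.idXML")` would distinguish "no peptide hits at all" (missing keys read as 0) from "peptide hits present but none annotated as decoy".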

Relevant files

PXD000396.sdrf.tsv.zip human_proteome.fasta.zip

System information

ypriverol commented 1 day ago

Hi @hgbrian:

First of all, thanks for using quantms. I will do my best to help you here.