OpenMS / OpenMS

The codebase of the OpenMS project
https://www.openms.de

No peptides were matched to the decoy portion of the database! #6556

Open LoayJabre opened 1 year ago

LoayJabre commented 1 year ago

Hi, I'm running MSGF+ through OpenMS and I keep hitting the error "No peptides were matched to the decoy portion of the database!". When I run my script, a revCat.fasta file is generated, and when I inspect it manually, it contains decoys prefixed with 'XXX_'.

My script is as follows:

# Database searching and FDR application
set -e
set -o xtrace

database_fasta_file=$1
mzml_folder=$2

# note: the glob below expects $mzml_folder to end with a slash
DIR=$mzml_folder
for FILE in "$DIR"*.mzML
do
    echo "Processing $FILE file..."
    # strip the .mzML extension to build the output names
    temp_string=${FILE%.mzML}

    # formatting the input names so that they can properly feed into the database search
    db_string=${database_fasta_file%.fasta}
    db_string_adjusted=$db_string'.fasta'
    db_string_revcat=$db_string'.revCat.fasta'
    echo "$db_string"
    echo "$db_string_adjusted"
    echo "$db_string_revcat"

    # running the database search
    MSGFPlusAdapter -in "$FILE" -executable /software/MSGFPlus-022.04.18/MSGFPlus.jar \
        -database "$db_string_adjusted" -out "$temp_string"'.idXML' \
        -PeptideIndexing:decoy_string 'XXX_' -PeptideIndexing:decoy_string_position 'prefix' \
        -add_decoys 'true' -fixed_modifications 'Carbamidomethyl (C)' -threads 12 -java_memory 50000
    PeptideIndexer -in "$temp_string"'.idXML' -fasta "$db_string_revcat" \
        -out "$temp_string"'_PI.idXML' -decoy_string 'XXX_' -threads 6
    FalseDiscoveryRate -in "$temp_string"'_PI.idXML' -out "$temp_string"'_FDR.idXML' \
        -PSM 'true' -FDR:PSM 0.01 -threads 4
done

I've also attached the .out file showing the error: database-searching-openmsv1_20221213.txt

When I run the same script but point -database at the revCat.fasta file, i.e.

MSGFPlusAdapter -in $FILE -executable /software/MSGFPlus-022.04.18/MSGFPlus.jar -database ../sonja_data/BB40_protein_coding_genome_with_ft_ID_with_crap_nonredun.revCat.fasta -out $temp_string'.idXML' -add_decoys 'false' -fixed_modifications 'Carbamidomethyl (C)' -threads 12 -java_memory 50000

the DB search works fine. Could the script be confused about where to look for the .revCat.fasta file?

I'm not very strong with coding, so any help would be appreciated!

jpfeuffer commented 1 year ago

If you are running MSGF+ with internal indexing, you need to pass the database that contains both targets and decoys, i.e. the revCat file and not the adjusted one.

jpfeuffer commented 1 year ago

If you are letting MSGF+ add the decoys, which is pretty much untested, you need to disable indexing. This is a bug in the indexing code, @timosachsenberg.
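
A quick way to see which kind of database you are about to pass is to count the entries whose header starts with the decoy prefix. This is only a minimal sketch, assuming decoys are marked with an 'XXX_' prefix directly after the '>' as in the revCat.fasta described above; count_decoys is a hypothetical helper name, not an OpenMS tool.

```shell
# Count target and decoy entries in a FASTA file.
# Usage: count_decoys <fasta> [decoy_prefix]   (prefix defaults to XXX_, an assumption)
count_decoys() {
    fasta=$1
    prefix=${2:-XXX_}
    if [ ! -r "$fasta" ]; then
        echo "cannot read $fasta" >&2
        return 1
    fi
    # A header is a decoy if the text right after '>' begins with the prefix.
    awk -v p=">$prefix" '
        /^>/ { total++; if (index($0, p) == 1) decoys++ }
        END  { printf "targets=%d decoys=%d\n", total - decoys, decoys }
    ' "$fasta"
}
```

If this reports decoys=0 for the file given to -database while the adapter's own indexing is active, the indexing step has no decoy entries to match against.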

LoayJabre commented 1 year ago

Thanks so much for the help!

Just to double check that I'm understanding you correctly: if I have a database that already contains targets+decoys, would it be correct to use

MSGFPlusAdapter -in $FILE -executable /software/MSGFPlus-022.04.18/MSGFPlus.jar -database ../sonja_data/BB40_protein_coding_genome_with_ft_ID_with_crap_nonredun.revCat.fasta -out $temp_string'.idXML' -add_decoys 'false' -fixed_modifications 'Carbamidomethyl (C)' -threads 12 -java_memory 50000

where the database has a specific path?

I'm puzzled because I've run the same script with '-database $db_string_adjusted' multiple times in the past, and it worked fine. It stopped working when I upgraded to OpenMS v2.8.

I'm also not really sure how to disable indexing.

jpfeuffer commented 1 year ago

Passing any path is always fine; it should not matter whether it comes from a variable or not. What matters is whether this file already contains decoys.

Automatic indexing was recently enabled by default, so the behaviour changed compared to previous versions.

Yes, this command is probably correct if you have decoys in the FASTA. You do not need any subsequent PeptideIndexer.
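
Put together, the loop body from the original script shrinks to two steps. The following is a sketch rather than a tested pipeline: it reuses only flags and paths that appear in this thread, assumes the revCat FASTA already exists, and search_and_fdr is a hypothetical wrapper name.

```shell
# Search one mzML file against a FASTA that already contains targets and
# XXX_-prefixed decoys, then apply the PSM-level FDR filter.
# No separate PeptideIndexer call: the adapter indexes the results itself.
search_and_fdr() {
    mzml=$1          # input spectrum file, e.g. run1.mzML
    revcat_fasta=$2  # target+decoy database, e.g. db.revCat.fasta
    base=${mzml%.mzML}

    MSGFPlusAdapter -in "$mzml" \
        -executable /software/MSGFPlus-022.04.18/MSGFPlus.jar \
        -database "$revcat_fasta" \
        -out "${base}.idXML" \
        -add_decoys false \
        -fixed_modifications 'Carbamidomethyl (C)' \
        -threads 12 -java_memory 50000

    FalseDiscoveryRate -in "${base}.idXML" -out "${base}_FDR.idXML" \
        -PSM true -FDR:PSM 0.01 -threads 4
}

# Usage inside the existing loop:
#   search_and_fdr "$FILE" "$db_string_revcat"
```

Note that FalseDiscoveryRate now reads the adapter output directly instead of a '_PI.idXML' file.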

Run the tool with --helphelp and you will see all possible options, including how to disable indexing.
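
The --helphelp output is long, so it can help to filter it. A small sketch: show_indexing_options is a hypothetical helper, and searching for the word 'index' is my assumption about how the relevant options are named, since the thread does not name the option.

```shell
# Print only the --helphelp lines that mention indexing for a given tool.
# Usage: show_indexing_options MSGFPlusAdapter
show_indexing_options() {
    tool=$1
    # Merge stderr into stdout in case the help text is printed there;
    # '|| true' keeps 'set -e' scripts alive when nothing matches.
    "$tool" --helphelp 2>&1 | grep -i 'index' || true
}
```

If nothing is printed, fall back to reading the full --helphelp output.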

LoayJabre commented 1 year ago

Thank you!! It works perfectly now.

jpfeuffer commented 1 year ago

Good to hear. I'll keep it open for @timosachsenberg to fix the behaviour when add_decoys is active.

timosachsenberg commented 1 year ago

There are two ways to fix this:

  1. disable the add_decoys option in the adapter, which forces users to supply a proper (e.g., consistently generated) target-decoy database
  2. modify SearchEngineBase to take a different database filename for reindexing

I would prefer 1 because it is easier to implement and ensures that decoy databases are the same between, e.g., different search engines.
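
For option 1, one way to get a consistently generated target-decoy database is to build it once with OpenMS's DecoyDatabase tool and pass the same file to every search engine. A minimal sketch: make_td_database is a hypothetical wrapper, the 'XXX_' default only mirrors the prefix used in this issue, and the flag names are as I recall them from the tool's help, so please confirm them with DecoyDatabase --helphelp for your version.

```shell
# Build a target+decoy FASTA from a target-only FASTA.
# Usage: make_td_database <target.fasta> <target_decoy.fasta> [decoy_prefix]
make_td_database() {
    target_fasta=$1
    td_fasta=$2
    decoy_prefix=${3:-XXX_}   # assumption: same prefix as in this issue
    DecoyDatabase -in "$target_fasta" -out "$td_fasta" \
        -decoy_string "$decoy_prefix" -decoy_string_position prefix
}

# Usage:
#   make_td_database db.fasta db.td.fasta
```

The resulting file can then be passed to -database with -add_decoys 'false'.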