MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Generating decoy database + further processing #150

Closed by JannikSchneider12 5 months ago

JannikSchneider12 commented 5 months ago

Hey,

I wanted to use MS-GF+ and then Percolator. For the subsequent conversion I also need the decoy database. However, when I create it with the -tda parameter, I get this error:

(peptideidentification) jannik@DELLXPS:~/msgfplus$ java -Xmx3500M -jar MSGFPlus.jar -s /home/jannik/data/consensus_spectra/falcon_run_1_50_conservative.mgf -d /home/jannik/data/BSA.fasta -tda 1
MS-GF+ Release (v2023.01.12) (12 January 2023)
Java 11.0.21 (Ubuntu)
Linux (amd64, version 5.15.133.1-microsoft-standard-WSL2)
Loading database files...
Creating /home/jannik/data/BSA.revCat.fasta.
java.lang.NullPointerException
    at edu.ucsd.msjava.msdbsearch.ReverseDB.reverseDB(ReverseDB.java:91)
    at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:225)
    at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:113)
    at edu.ucsd.msjava.ui.MSGFPlus.main(MSGFPlus.java:61)

Can somebody help me?

Thanks for your help and time

FarmGeek4Life commented 5 months ago

The only possibility I can see without access to the .fasta file is that the very first line in the file doesn't start with '>'.
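A quick way to test that hypothesis before rerunning the search is to look at the literal first line of the file. The following is only a minimal sketch: the function name `check_first_line` is made up for illustration, and it checks nothing beyond the condition suggested above, so a file that passes may still have other problems.

```python
def check_first_line(path):
    """Return None if the first line starts with '>', else a short message.

    Hypothetical helper, not part of MS-GF+. It tests only the condition
    suggested in this thread: the very first line must be a FASTA header.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        first = handle.readline()
    if first.startswith(">"):
        return None
    # Show a short, escaped preview so blank lines or stray bytes are visible.
    return f"first line does not start with '>': {first.rstrip()[:40]!r}"
```

Calling `check_first_line("/home/jannik/data/BSA.fasta")` would return `None` for a file whose first line is a header, and a message with a preview of the offending line otherwise.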

JannikSchneider12 commented 5 months ago

Thanks for your help. I tried it with a FASTA file from UniProt and that worked. But I have a follow-up question. I wanted to use Percolator and thus convert the output to a pin file. The procedure also expects a decoy database, but in mzid format (Usage: msgf2pin [options] target.mzid decoy.mzid). However, I could not find a generated mzid file for the decoy database, nor a parameter for it in the documentation. I only found the four auxiliary files that are created from my database.

Thanks again for your time and help

FarmGeek4Life commented 5 months ago

I am not that familiar with Percolator, but I believe what it wants is for you to perform a target-only search and then a decoy-only search, and use the resulting .mzid files as the input to msgf2pin. Both searches would use -tda 0, and the decoy search would use a separately generated decoy-only .fasta file. I don't have an answer on what you can use to generate the decoy-only .fasta file, as MS-GF+ only generates a single .fasta file with the target proteins first, followed by all of the proteins again with a name prefix and the sequences reversed.
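The two-search workflow described above could be laid out roughly as follows. This is a sketch under assumptions, not a verified recipe: the function `build_search_commands` and the output names `target.mzid` and `decoy.mzid` are invented for illustration, and the `-o` output option does not appear in this thread, so confirm it against the MS-GF+ help output before relying on it. The code only builds the argument lists; it does not run anything.

```python
def build_search_commands(spectra, target_fasta, decoy_fasta,
                          jar="MSGFPlus.jar", memory="3500M"):
    """Build argument lists for two separate searches plus msgf2pin.

    Hypothetical helper. Both searches use -tda 0 so that MS-GF+ does not
    append its own reversed sequences to either database.
    """
    def search(fasta, out):
        # "-o" (output file) is an assumption; it is not shown in the thread.
        return ["java", f"-Xmx{memory}", "-jar", jar,
                "-s", spectra, "-d", fasta, "-tda", "0", "-o", out]

    target_cmd = search(target_fasta, "target.mzid")
    decoy_cmd = search(decoy_fasta, "decoy.mzid")
    # Argument order follows the usage line quoted earlier in the thread.
    pin_cmd = ["msgf2pin", "target.mzid", "decoy.mzid"]
    return target_cmd, decoy_cmd, pin_cmd
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order, once the paths and options have been checked for your installation.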

JannikSchneider12 commented 5 months ago

Thanks for your help. I looked at pyteomics and could use a function to create a database just with decoys. The prefix that I am setting to my decoy database should not matter in this case, right?

FarmGeek4Life commented 5 months ago

I don't think it does, but XXX_ is the default MS-GF+ uses, and I know some other tools use REV_; the pyteomics default is DECOY_. I suggest you use whatever is preferred by software processing the .mzid file.
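For anyone who wants to see what a decoy-only database with a configurable prefix looks like, here is a minimal plain-Python sketch. It is neither the pyteomics function mentioned above nor the MS-GF+ implementation; the names `read_fasta` and `decoy_only_fasta` are made up for illustration, and the decoys are simple full-sequence reversals with the prefix placed directly after the `>`.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def decoy_only_fasta(lines, prefix="XXX_", width=60):
    """Return FASTA text containing only reversed, prefixed entries.

    The default prefix mirrors the MS-GF+ default named in this thread;
    pass "REV_" or "DECOY_" if the downstream tool expects one of those.
    """
    out = []
    for header, seq in read_fasta(lines):
        out.append(f">{prefix}{header}")
        reversed_seq = seq[::-1]
        # Wrap the sequence at a fixed width, as FASTA files commonly do.
        for i in range(0, len(reversed_seq), width):
            out.append(reversed_seq[i:i + width])
    return "".join(item + "\n" for item in out)
```

A typical use would be to open the target FASTA, pass the file handle as `lines`, and write the returned text to a new file that is then given to the decoy-only search.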