MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Not sure if I'm running this correctly. All QValue and PepQValue are 0 #98

Closed: ttessie2 closed this issue 4 years ago.

ttessie2 commented 4 years ago

I've just started running MSGF+. I'm using some publicly available yeast data from MassIVE, and I used Philosopher to create my target-decoy database. Everything seems to run without errors; however, when I look at the TSV file, all the QValues are 0. I double-checked that the -decoy parameter matches the prefix on my decoy proteins, and that didn't solve anything. If I search through the TSV file, I do find rev decoy hits. What could be the source of the problem?
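One quick way to confirm the symptom is to count how many PSMs in the TSV map to decoy proteins and check whether any QValue is non-zero. The sketch below assumes the MS-GF+ TSV is tab-separated with `Protein` and `QValue` columns, and that decoy names start with the prefix you pass in. It only tests the start of the `Protein` field, so a PSM shared between a target and a decoy is classified by whichever protein is listed first.

```python
import csv
import io


def summarize_decoys(fh, decoy_prefix):
    """Count PSM rows whose Protein field starts with decoy_prefix and
    report whether every QValue in the file is 0 (assumed column names)."""
    reader = csv.DictReader(fh, delimiter="\t")
    total = decoys = 0
    all_zero = True
    for row in reader:
        total += 1
        if row["Protein"].startswith(decoy_prefix):
            decoys += 1
        if float(row["QValue"]) != 0.0:
            all_zero = False
    return {"psms": total, "decoy_psms": decoys, "all_qvalues_zero": all_zero}


# Usage (hypothetical file name):
# with open("demo.tsv", newline="") as fh:
#     print(summarize_decoys(fh, "rev_"))
```

If `decoy_psms` is greater than 0 while `all_qvalues_zero` is True, the decoys are in the results but were not used for FDR estimation.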

FarmGeek4Life commented 4 years ago

Because you used Philosopher to create your target-decoy database, you are probably searching with -tda 0. With -tda 0, the QValues (and PepQValues) are not calculated; MS-GF+ does not automatically check for decoy prefixes in the .fasta file.
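To illustrate why q-values are meaningless without recognized decoys, here is a minimal sketch of a generic target-decoy q-value estimate: FDR = decoys / targets at or above each score threshold, followed by a running minimum. It is not MS-GF+'s own implementation; it assumes a lower SpecEValue is better and ignores score ties. If no PSM is flagged as a decoy, the decoy count stays at 0, so every estimated q-value comes out as 0.

```python
def estimate_qvalues(psms):
    """psms: list of (spec_evalue, is_decoy) pairs; lower SpecEValue is better.
    Returns one q-value per PSM, in input order."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0])
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:                      # walk from best to worst score
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # FDR estimate at this threshold; 1.0 while no targets have been seen
        fdr[i] = min(1.0, decoys / targets) if targets else 1.0
    qvalues = [0.0] * len(psms)
    best = 1.0
    for i in reversed(order):            # walk from worst to best score
        best = min(best, fdr[i])         # lowest FDR at this or any looser threshold
        qvalues[i] = best
    return qvalues
```

With only target PSMs, the result is all zeros, which is the pattern reported in this issue.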

What should work for this is the following:

What this does is bypass the MS-GF+ decoy creation process, while still performing a full target-decoy search.

ttessie2 commented 4 years ago

Thank you for the reply! I originally had it set to -tda 0, but that gives this error:

Error while indexing: 2020-04-08-decoys-contam-UP000002311.revCat.revCat.fasta (too many redundant proteins) If the database contains forward and reverse proteins, run MS-GF+ (or BuildSA) again with "-tda 0" If the decoy protein names do not start with XXX either rename them, or use the -decoy switch

After reading that, I switched to -tda 1 and added -decoy rev, because the database contains forward and reverse proteins and the decoy names do not start with XXX. The decoy names are prefixed with rev_, but on the command line I only entered rev. Would this make a difference?
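On the rev versus rev_ question, the sketch below assumes the decoy prefix is matched as a plain string prefix; that is an assumption, not confirmed MS-GF+ behavior, and `classify` is a hypothetical helper. Under that assumption, rev and rev_ select the same decoys for UniProt-style headers like the examples in the test. The longer rev_ is the safer choice because it cannot accidentally match a target name that merely begins with "rev".

```python
def classify(headers, decoy_prefix):
    """Split protein names into (targets, decoys) with a plain prefix test
    (an assumption about how the -decoy prefix is matched)."""
    decoys = [h for h in headers if h.startswith(decoy_prefix)]
    targets = [h for h in headers if not h.startswith(decoy_prefix)]
    return targets, decoys
```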

For clarity, this is what I entered originally. It ran, but the QValues were 0:

C:\MSGF+>java -Xmx3500M -jar MSGFPlus.jar -s C:\TPP\data\params\WT_Rep_1_Resp_Prot.mzML -d C:\FragPipe_Skyline\Philosopher\2020-04-08-decoys-contam-UP000002311.fa -inst 1 -t 20ppm -ti 1,2 -ntt 2 -tda 1 -decoy rev -o demo.mzid

FarmGeek4Life commented 4 years ago

Do not put the ".revCat.fasta" file on the command line for MS-GF+. Give it the ".fasta" file, and keep the ".revCat.fasta" file in the same directory; MS-GF+ will automatically find the ".revCat.fasta" file and use it instead of generating one.
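The file-naming rule described here can be sketched as follows, assuming MS-GF+ drops the final extension of the -d file and appends .revCat.fasta in the same directory. This rule is inferred from this thread, not taken from the MS-GF+ source, and `revcat_path` is a hypothetical name. It also shows why passing the .revCat.fasta file itself yields a name ending in .revCat.revCat.fasta, consistent with the file name in the indexing error quoted earlier.

```python
from pathlib import Path


def revcat_path(fasta_path):
    """Expected concatenated target-decoy file next to fasta_path (assumed rule)."""
    p = Path(fasta_path)
    # Drop the final extension and append .revCat.fasta, keeping the directory
    return p.with_name(p.stem + ".revCat.fasta")
```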

FarmGeek4Life commented 4 years ago

Also, let me look at the code a little; it's possible there is some automatic handling of this that I don't remember.

ttessie2 commented 4 years ago

Okay, so should I have -d database.fasta -tda 1 -decoy rev? So when the program runs it will look for the .revCat.fasta file within the directory? And I shouldn't have the target database and reverse database within the same file?

FarmGeek4Life commented 4 years ago

> Okay, so should I have -d database.fasta -tda 1 -decoy rev?

Yes.

> So when the program runs it will look for the .revCat.fasta file within the directory?

Yes.

> And I shouldn't have the target database and reverse database within the same file?

For the database.fasta file, it may not matter (but it definitely needs the target database). The database.revCat.fasta file needs to contain both the target and the reverse/decoy database in one file. (The name "revCat" is just short for "reverse concatenated", meaning it has both target and decoy entries.)
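If you need to build the concatenated file yourself, a minimal sketch follows. It assumes decoys are made by simply reversing each target sequence and prefixing its header with rev_; Philosopher and MS-GF+ may generate decoys differently, and all helper names are hypothetical. The output holds all targets followed by all decoys, which is the single-file layout described for database.revCat.fasta.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_revcat(records, decoy_prefix="rev_"):
    """Targets first, then one reversed-sequence decoy per target."""
    decoys = [(decoy_prefix + name, seq[::-1]) for name, seq in records]
    return list(records) + decoys


def to_fasta(records, width=60):
    """Serialize (header, sequence) pairs, wrapping sequences at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```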

ttessie2 commented 4 years ago

Okay, I'm running this now, so I'll see how it goes. When you say the .revCat.fasta should be in the same directory, do you mean the same directory as the .fasta database file?

FarmGeek4Life commented 4 years ago

database.fasta and database.revCat.fasta need to be in the same directory. For example, on Windows that might be:

C:\msgf\database.fasta
C:\msgf\database.revCat.fasta
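Putting the advice in this thread together, the corrected invocation can be assembled as an argument list, for example to hand to Python's `subprocess.run`. The flags and values are copied from the command earlier in this issue; only -d (pointing at the plain .fasta), -tda 1, and -decoy rev reflect the fix. `msgf_args` is a hypothetical helper name.

```python
def msgf_args(spectrum_file, fasta_file, output_file, decoy_prefix="rev"):
    """Return the MS-GF+ command line discussed in this issue as a list."""
    return [
        "java", "-Xmx3500M", "-jar", "MSGFPlus.jar",
        "-s", str(spectrum_file),
        "-d", str(fasta_file),      # plain .fasta; the .revCat.fasta sits beside it
        "-inst", "1",
        "-t", "20ppm",
        "-ti", "1,2",
        "-ntt", "2",
        "-tda", "1",                # target-decoy search, so q-values are computed
        "-decoy", decoy_prefix,     # prefix of the decoy protein names
        "-o", str(output_file),
    ]
```

Running `subprocess.run(msgf_args("WT.mzML", r"C:\msgf\database.fasta", "demo.mzid"), check=True)` would launch the search, provided MSGFPlus.jar is in the working directory.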

ttessie2 commented 4 years ago

Thanks for the help! I have it working now. Much appreciated! I'm new to MS analysis, so this has been great. One last question, if you don't mind my asking: what is your preferred next step for protein-level analysis? I'm more familiar with the TPP pipeline using iProphet -> ProteinProphet. It looks like MSGF+ only gives PSM- and peptide-level results; is that correct?

alchemistmatt commented 4 years ago

Correct: MS-GF+ only identifies peptides and reports the proteins they're associated with. Some options for protein rollup are IDPicker and InfernoRDN. There are also several commercial tools that do a great job, including Scaffold.

I suggest IDPicker, since it supports protein parsimony and combining multiple datasets. In contrast, InfernoRDN only supports protein rollup on a single dataset at a time. IDPicker should be able to read the .mzid files created by MS-GF+.
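As an illustration of what protein parsimony means, here is a generic greedy set-cover sketch: repeatedly keep the protein that explains the most peptides not yet explained. This is a common approximation, not IDPicker's actual algorithm, and the input shape (a dict mapping each protein to a set of peptides) is an assumption.

```python
def greedy_parsimony(protein_to_peptides):
    """Return a small list of proteins that together explain every peptide.
    Ties are broken by protein name, so the output is deterministic."""
    remaining = set().union(*protein_to_peptides.values())
    chosen = []
    while remaining:
        # max() keeps the first maximal item, i.e. the alphabetically first tie
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & remaining))
        gained = protein_to_peptides[best] & remaining
        if not gained:
            break
        chosen.append(best)
        remaining -= gained
    return chosen
```

A protein whose peptides are all shared with an already-chosen protein is never selected, which is the core idea behind reporting a parsimonious protein list.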

If you want to perform quantitation (using Selected Ion Chromatograms of the MS1 parent ions), you'd have to analyze your data with MASIC then merge the MASIC results with MS-GF+ using MASIC Results Merger.
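Conceptually, merging quantitation with identifications is a join on the MS/MS scan number. The sketch below assumes each MS-GF+ row carries a `ScanNum` value and that the MASIC output has already been reduced to a hypothetical dict mapping scan number to SIC peak area. It does not reproduce MASIC Results Merger's actual file formats or column names; `PeakArea` is an illustrative key.

```python
def merge_by_scan(psms, sic_areas):
    """psms: list of dicts with a 'ScanNum' entry (assumed MS-GF+ column).
    sic_areas: dict of scan number -> SIC peak area (hypothetical shape).
    Returns copies of the PSM rows with a 'PeakArea' key (None if unmatched)."""
    merged = []
    for psm in psms:
        row = dict(psm)
        row["PeakArea"] = sic_areas.get(int(psm["ScanNum"]))
        merged.append(row)
    return merged
```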

ttessie2 commented 4 years ago

Great, thank you!

Jokendo-collab commented 4 years ago

I think the MS-GF+ developers should make the target-decoy search the default, i.e. -tda 1 rather than -tda 0.

On Sat, Apr 11, 2020 at 2:39 PM ttessie2 notifications@github.com wrote:

Closed #98 https://github.com/MSGFPlus/msgfplus/issues/98.
