Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

Peptide prophet issues #39

Closed KunathBJ closed 6 years ago

KunathBJ commented 6 years ago

Hello,

I believe I have an issue with PeptideProphet and might need some help. I'm on Linux, running the latest version of MSFraggerGUI, and I get this kind of error when I try to analyze an mzML file:

"Executing command:
$> java -cp /mnt/users/benoitk/MSFragger/MSFragger-GUI.jar umich.msfragger.util.FileMove /mnt/users/benoitk/Proteomics/MZML_PW/T7C_11.tsv /mnt/users/benoitk/MSFragger/output/T7C_11.tsv
Process started
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 workspace --init
Process started
time="17:54:47" level=info msg="Creating workspace"
time="17:54:47" level=info msg=Done
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 peptideprophet --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel 2 --database /mnt/users/benoitk/Moda/moda_v1.51/Database/Prokka_contigs500_Concat_Decoy.fa /mnt/users/benoitk/MSFragger/output/T7C_11.tsv
Process started
time="17:54:47" level=info msg="Executing PeptideProphet"
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >'
what(): failed opening file: No such file or directory
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >'
what(): failed opening file: No such file or directory
time="17:54:49" level=info msg=Done
  • /mnt/users/benoitk/MSFragger/output/interact-T7C_11.pep.xml
Using Decoy Label "rev".
Decoy Probabilities will be reported.
Using non-parametric distributions
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 workspace --clean
Process started
time="17:54:50" level=info msg="Removing workspace"
time="17:54:50" level=info msg=Done
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 workspace --init
Process started
time="17:54:51" level=info msg="Creating workspace"
time="17:54:51" level=info msg=Done
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 proteinprophet --output interact --maxppmdiff 20 interact-T7C_11.pep.xml
Process started
time="17:54:51" level=info msg="Executing ProteinProphet"
2018/02/19 17:54:51 open /tmp/91e605bd-01e5-4338-ae94-8a0b867c8753/interact.prot.xml: no such file or directory
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.0.1 Post-Typhoon dev, Build 201705221614-exported (Linux-x86_64))
Error! Input file not found: /mnt/users/benoitk/MSFragger/output/interact-T7C_11.pep.xml
Exiting
Process finished, exit value: 1

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 workspace --clean
Process started
time="17:54:52" level=info msg="Removing workspace"
time="17:54:52" level=info msg=Done
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 workspace --init
Process started
time="17:54:54" level=info msg="Creating workspace"
time="17:54:54" level=info msg=Done
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 database --annotate /mnt/users/benoitk/Moda/moda_v1.51/Database/Prokka_contigs500_Concat_Decoy.fa
Process started
time="17:54:54" level=info msg="Processing database"
time="17:54:58" level=info msg=Done
Process finished, exit value: 0

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 filter --sequential --mapmods --pepxml /mnt/users/benoitk/MSFragger/output --protxml /mnt/users/benoitk/MSFragger/output/interact.prot.xml
Process started
time="17:54:58" level=info msg="Executing filter"
time="17:54:58" level=info msg="Processing peptide identification files"
time="17:54:58" level=fatal msg="No pepXML files found, check your files and try again"
Process finished, exit value: 1

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 report
Process started
time="17:54:58" level=info msg="Executing report"
time="17:54:58" level=fatal msg="cannot restore serialized data structures: invalid argument"
Process finished, exit value: 1

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 workspace --clean
Process started
time="17:55:00" level=info msg="Removing workspace"
time="17:55:00" level=info msg=Done
Process finished, exit value: 0

========================= === Done"

I'm starting to wonder whether the issue is due to Philosopher, as I also have issues when I try to analyze MGF files from ProteoWizard. I've downloaded philosopher_linux_amd64 and set the path to it in MSFragger, but it still doesn't seem to use it.

Do you guys have any idea of what I'm missing?

Thanks a lot for your help! Best, Benoit

andytyk commented 6 years ago

Hi Benoit,

It seems that you're having MSFragger output tsv files instead of pepXML. The rest of the pipeline requires the output to be in pepXML format (with the extension pepXML) so that's why it's failing. TSV output is more for taking a quick look at the data or for users that want to use the output in some other pipeline without dealing with the complexities of the pepXML format.

Just change the file output extension and output type to pepXML and let us know if that solves anything.

Best, Andy
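
A quick way to confirm what the search step actually wrote is to count the result files by type before the downstream tools run. This is a minimal sketch, assuming all search results sit directly in one output directory; the function name `summarize_search_outputs` is hypothetical and not part of MSFragger or Philosopher.

```python
from pathlib import Path


def summarize_search_outputs(output_dir):
    """Count pepXML and TSV files directly inside output_dir (non-recursive)."""
    counts = {"pepxml": 0, "tsv": 0}
    for path in Path(output_dir).iterdir():
        if not path.is_file():
            continue
        name = path.name.lower()
        # Accept both "x.pepXML" and "interact-x.pep.xml" style names.
        if name.endswith(".pepxml") or name.endswith(".pep.xml"):
            counts["pepxml"] += 1
        elif name.endswith(".tsv"):
            counts["tsv"] += 1
    return counts
```

If the `pepxml` count is zero and the `tsv` count is not, the search output type is the first thing to check.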

KunathBJ commented 6 years ago

Hello, Thanks a lot. That worked of course.

I have two more questions. Just tell me if I should have opened separate threads for them.

1) I got several warnings during the process:

"INFO: Processing standard MixtureModel ...
Initialising statistical models ...
WARNING: Mixture model quality test failed for charge (1+).
WARNING: Mixture model quality test failed for charge (6+).
WARNING: Mixture model quality test failed for charge (7+)."

and many of these:

"WARNING: Trying to compute mass of non-residue: e
WARNING: Trying to compute mass of non-residue: v
WARNING: Trying to compute mass of non-residue: v
WARNING: Trying to compute mass of non-residue: r"

What do they mean? Are they critical?

2) One line said: "PeptideProphet read in 0 1+, 7262 2+, 8011 3+, 2172 4+, 348 5+, 0 6+, and 0 7+ spectra. Found 2184 Decoys, and 15609 Non-Decoys"

But later on I got these messages:

Executing command:
$> /mnt/users/benoitk/MSFragger/MSFragger-20171106/philosopher_linux_amd64 filter --sequential --mapmods --pepxml /mnt/users/benoitk/MSFragger/output/T1 --protxml /mnt/users/benoitk/MSFragger/output/T1/interact.prot.xml
Process started
time="19:20:42" level=info msg="Executing filter"
time="19:20:42" level=info msg="Processing peptide identification files"
time="19:20:44" level=info msg="1+ Charge profile" decoy=0 target=0
time="19:20:44" level=info msg="2+ Charge profile" decoy=0 target=4555
time="19:20:44" level=info msg="3+ Charge profile" decoy=0 target=5653
time="19:20:44" level=info msg="4+ Charge profile" decoy=0 target=1241
time="19:20:44" level=info msg="5+ Charge profile" decoy=0 target=115
time="19:20:44" level=info msg="6+ Charge profile" decoy=0 target=0
time="19:20:44" level=info msg="Database search results" ions=7451 peptides=6037 psms=11564
time="19:20:44" level=info msg="Converged to 0.00 % FDR with 11564 PSMs" decoy=0 threshold=0.0519 total=11564
time="19:20:44" level=info msg="Converged to 0.00 % FDR with 6037 Peptides" decoy=0 threshold=0.0519 total=6037
time="19:20:44" level=info msg="Converged to 0.00 % FDR with 7451 Ions" decoy=0 threshold=0.0519 total=7451
time="19:20:45" level=info msg="Protein inference results" decoy=0 target=1105
time="19:20:45" level=info msg="Converged to 0.00 % FDR with 976 Proteins" decoy=0 threshold=0.8903 total=976
time="19:20:45" level=info msg="Applying sequential FDR estimation" ions=7318 peptides=5911 psms=11414
time="19:20:45" level=info msg="Converged to 0.00 % FDR with 11414 PSMs" decoy=0 threshold=0.0578 total=11414
time="19:20:46" level=info msg="Converged to 0.00 % FDR with 5911 Peptides" decoy=0 threshold=0.0578 total=5911
time="19:20:46" level=info msg="Converged to 0.00 % FDR with 7318 Ions" decoy=0 threshold=0.0578 total=7318

How is it possible to get 0% FDR everywhere?

Thank you very much for your help!

best, Benoit

anesvi commented 6 years ago

1) "INFO: Processing standard MixtureModel ... Initialising statistical models ... WARNING: Mixture model quality test failed for charge (1+). WARNING: Mixture model quality test failed for charge (6+). WARNING: Mixture model quality test failed for charge (7+)."

No problem here. It just means you did not have enough high-scoring 1+ or 6+/7+ spectra. This is normal.

2) and many of those: "WARNING: Trying to compute mass of non-residue: e WARNING: Trying to compute mass of non-residue: v WARNING: Trying to compute mass of non-residue: v WARNING: Trying to compute mass of non-residue: r"

What sequence database did you use? It should be in FASTA format with sequences in capital letters. PeptideProphet cannot recognize what those e, v, r amino acids are.

3) You need to pass the decoy tag to Philosopher. For example, if decoys start with DECOY_..., add --tag DECOY

Alexey
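
Both database points above (capital-letter sequences and a recognizable decoy prefix) can be checked with a short script before rerunning the pipeline. This is a minimal sketch, not part of PeptideProphet or Philosopher: the function name `scan_fasta` is hypothetical, and treating every character outside A-Z as suspicious is an assumption made for illustration.

```python
def scan_fasta(lines, decoy_prefix="rev"):
    """Return (n_targets, n_decoys, bad_chars) for an iterable of FASTA lines.

    bad_chars maps each sequence character outside A-Z to its count.
    """
    n_targets = 0
    n_decoys = 0
    bad_chars = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: classify by the prefix that follows ">".
            if line[1:].startswith(decoy_prefix):
                n_decoys += 1
            else:
                n_targets += 1
        else:
            # Sequence line: record anything that is not an uppercase letter.
            for ch in line:
                if not ("A" <= ch <= "Z"):
                    bad_chars[ch] = bad_chars.get(ch, 0) + 1
    return n_targets, n_decoys, bad_chars
```

A file can be passed directly, for example `scan_fasta(open(path))` with `path` pointing at the database. A decoy count of zero suggests the prefix passed to the tools does not match the headers, and a non-empty `bad_chars` points at the lines behind the "non-residue" warnings.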

KunathBJ commented 6 years ago

Hello.

Thanks for your help. I don't have any e/v/r in my database. And I have a decoy tag for my database, but it doesn't seem to pick it up. Funnily, the decoy tag is... rev. There might be something wrong there; I'll double check. Just one thing: you said to add --tag DECOY in Philosopher. It may sound stupid, but where exactly do I add that? So far I only have: --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel 2 in the PeptideProphet section. I don't have any Philosopher section.

I have an extra question regarding MGF files as input. I have MGFs generated by ProteoWizard, yet I still get error messages such as:

Operating on slice 1 of 1: 11385ms T1A_10_noID_DP.mgf 413ms
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NullPointerException
    at n.b(Unknown Source)
    at MSFragger.main(Unknown Source)
    ... 5 more

Is it possible that something is wrong with my file? I attached a subset of one of them. T1A_10_noID_DP.txt

Thanks a lot for your help. Best, Benoit

andytyk commented 6 years ago

Hi Benoit,

There seem to be all kinds of quotes around the titles in the MGF file. We don't really support MGFs; if you can, try converting to mzML using ProteoWizard. MGF support is pretty much limited to Thermo raw files converted using ProteoWizard.

Best, Andy
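
To see how many spectrum titles are affected, the TITLE= lines of an MGF file can be scanned for quote characters. This is a minimal sketch: the helper name `titles_with_quotes` is hypothetical, and the thread does not confirm that quoted titles are what triggers the NullPointerException.

```python
def titles_with_quotes(lines):
    """Return (line_number, title) for each TITLE= line containing ' or "."""
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("TITLE=") and any(q in line for q in "\"'"):
            hits.append((number, line[len("TITLE="):]))
    return hits
```

Calling it as `titles_with_quotes(open(path))` on an MGF file at `path` lists the offending titles with their line numbers.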

KunathBJ commented 6 years ago

Ok. Thank you, I'll look for that.

Is there another place where I'm supposed to give the decoy information to the software, other than the --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel 2 in the PeptideProphet section? I can't find anything else.

Best, Benoit

chhh commented 6 years ago

@KunathBJ For PeptideProphet, you can change the --decoy option that you've noticed in the free-text parameters. For "Philosopher Report" there is a separate text field, "Filter", on the "Report" tab. Those text fields are more or less synchronized: if you change --decoy on the PeptideProphet tab, it will ask whether you also want to change it for Philosopher Report, and vice versa.

KunathBJ commented 6 years ago

Hey.

Thanks for your help. I think I managed to fix the issues with my database, as I no longer receive the error message about the e/r/v residues. And it seems the decoy tag is now properly recognized. However, when I look at the report file, 50% of the hits are from decoy sequences. Is there an extra step that has to be done to fix that? I can't find any info about it.

Thanks a lot, Benoit

anesvi commented 6 years ago

I am not sure what you mean. You wrote earlier: "PeptideProphet read in 0 1+, 7262 2+, 8011 3+, 2172 4+, 348 5+, 0 6+, and 0 7+ spectra. Found 2184 Decoys, and 15609 Non-Decoys"

So you should have a lot more non-decoys than decoys.

Also, decoys are usually filtered out in the Philosopher results. Where do you see 50% decoys, in which reports?

Alexey
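
The arithmetic behind these numbers can be sketched as follows. The decoys/targets ratio used here is a common textbook simplification of target-decoy FDR estimation and is an assumption for illustration; it is not necessarily the exact formula Philosopher applies. Both function names are hypothetical.

```python
def decoy_fraction(decoys, non_decoys):
    """Fraction of all matches that are decoys."""
    return decoys / (decoys + non_decoys)


def naive_fdr(decoys, targets):
    """Simplified target-decoy FDR estimate: decoys divided by targets."""
    return decoys / targets if targets else 0.0


# Counts quoted from the PeptideProphet output in this thread.
fraction = decoy_fraction(2184, 15609)

# With no decoys recognized (decoy=0 in the filter log), the estimate is
# zero regardless of the threshold, which matches the "0.00 % FDR" lines.
fdr_without_decoys = naive_fdr(0, 11564)
```

Under this simplification, an estimate of exactly zero at every level is a sign that no decoys were counted at all, not that the results are error-free.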


KunathBJ commented 6 years ago

Hello,

That's what I thought. I am looking at the report.tsv that is generated after the analysis, and in there, 50% of the protein entries are decoys. Am I looking in the wrong place?

Benoit
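
The share of decoy entries in a tab-separated report can be measured rather than eyeballed. This is a minimal sketch: the column name is passed in by the caller because the actual header of report.tsv is not shown in this thread, and the function name `count_decoy_rows` is hypothetical.

```python
import csv
import io


def count_decoy_rows(tsv_text, column, decoy_prefix="rev"):
    """Return (decoy_rows, total_rows) for a tab-separated table with a header."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    total = 0
    decoys = 0
    for row in reader:
        total += 1
        # Missing or empty cells are treated as non-decoy.
        if (row.get(column) or "").startswith(decoy_prefix):
            decoys += 1
    return decoys, total
```

For a file on disk, read it first (for example `open(path).read()`) and pass the name of whichever column holds the protein identifier.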

prvst commented 6 years ago

@KunathBJ

I believe I can help you with FDR scoring and filtering using Philosopher. First I need you to do the following for me:

  1. In your workspace folder, where you have all your files and results (the same one you have been pointing the GUI to so far), open a terminal window and run philosopher workspace --backup. You should see a zip file there after a few seconds (note that the name of the philosopher binary might be different depending on the version you downloaded from the website).

  2. Send me the zip file, your database and the complete output you have from the analysis. You can share them here, via Dropbox, or you can send them directly to my e-mail. Let me know what suits you better.

KunathBJ commented 6 years ago

Hello,

The GUI is in one folder, and MSFragger.jar and philosopher are in one of its sub-folders. The outputs are elsewhere. When I run philosopher workspace --backup in the philosopher folder, I get the error "cannot find the meta data". Is there anything I can do from here? Thanks a lot,

Benoit

prvst commented 6 years ago

Try running your analysis again, but this time work in a single folder. You can leave the binaries in a different place, but try using the same folder for everything else.

KunathBJ commented 6 years ago

ok. I'll do that. Thanks

chhh commented 6 years ago

@KunathBJ When the GUI runs the tools it always sets the working directory to the output directory. So the tools are run as if you first cd <output-dir> from the console and then run tools using either their absolute or relative paths to the output directory.
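
The behaviour described above can be illustrated with a short Python sketch. It is an illustration of the idea only, not the GUI's actual code: launching a child process with its working directory set to the output directory has the same effect as running cd there first. The helper name `run_in_dir` is hypothetical.

```python
import subprocess
import sys


def run_in_dir(args, workdir):
    """Run a command with workdir as its working directory; return its stdout."""
    result = subprocess.run(
        args, cwd=workdir, capture_output=True, text=True, check=True
    )
    return result.stdout


def child_working_dir(workdir):
    """Ask a child Python process which directory it is running in."""
    code = "import os; print(os.getcwd())"
    return run_in_dir([sys.executable, "-c", code], workdir).strip()
```

Any workspace files a tool creates relative to its current directory therefore end up in the output directory, which is where a command such as philosopher workspace --backup would need to be run from.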

KunathBJ commented 6 years ago

Hello,

I re-ran one of the analyses with everything (the GUI, MSFragger.jar, philosopher_amd64 and the outputs) in the same directory, yet I again got the message stating that it cannot find the meta data.

prvst commented 6 years ago

Can you share the output log? Please save it to a file instead of pasting the entire message here.


-- Felipe da Veiga Leprevost, Ph.D. www.leprevost.com.br Proteome Bioinformatics Lab University of Michigan

KunathBJ commented 6 years ago

Hey. Sure! Here it is. Thank you very much for your help!!

Benoit Log_Kunath.txt

prvst commented 6 years ago

@KunathBJ I don't see anything wrong with your analysis execution. Can you send me your reports, your database and your fragger parameter file?

KunathBJ commented 6 years ago

Hello,

Everything seemed fine. The only thing I find strange is the presence of the rev_proteins in the report file. Here are the files. Thanks a lot

report_Kunath.txt fragger.params.txt

DB_Kunath.txt

prvst commented 6 years ago

@KunathBJ, @anesvi

Judging by how your report looks, I have the impression that PeptideProphet (and probably ProteinProphet) could not properly form the matching groups and the protein groups.

Here is one example of why I think the problem occurred during peptide validation or, more probably, during protein inference:


1003 | a | Bin1_Clostridium_Ga0196617_100009171 indolepyruvate ferredoxin oxidoreductase alpha subunit
1005 | a | Bin1_Clostridium_Ga0196617_100009171 indolepyruvate ferredoxin oxidoreductase alpha subunit

As you can see above, these two entries are assigned to two different groups, 1003 and 1005, but they are in fact the same protein. There is no known bug in the prophets that could cause such an error, so I believe the real source of the problem might be your database, more specifically the formatting you use for the FASTA headers. The prophets truncate the headers when reporting them in the PSM and protein groups, so it is possible that the current format is confusing the programs.

My suggestion is to change the FASTA headers to a more consistent format or, if possible, to follow the standard used by UniProt or NCBI.

Can you also share your pepXML and protXML files?
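
One way to look for headers that could become ambiguous after truncation is to group them by a shortened key and report keys shared by more than one distinct header. This is a minimal sketch: how the prophets actually truncate headers is not specified in this thread, so cutting at the first whitespace (and optionally at `max_len` characters) is an assumption, and the function name `ambiguous_ids` is hypothetical.

```python
from collections import defaultdict


def ambiguous_ids(headers, max_len=None):
    """Map each truncated ID to its headers when several distinct headers share it."""
    groups = defaultdict(set)
    for header in headers:
        text = header.lstrip(">").strip()
        if not text:
            continue
        # Assumed truncation rule: keep only the first whitespace-delimited token.
        key = text.split()[0]
        if max_len is not None:
            key = key[:max_len]
        groups[key].add(text)
    return {key: sorted(full) for key, full in groups.items() if len(full) > 1}
```

An empty result under a given truncation rule means every header stays unique under that rule; any key that is returned marks headers worth renaming before rerunning the analysis.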

anesvi commented 6 years ago

Yes, looking at the protXML should really tell us where the problem is (in ProteinProphet or Philosopher). But please send us the pepXML as well, just in case. You can send the files to us directly without posting them on GitHub.

Alexey


KunathBJ commented 6 years ago

Hello, thanks a lot for your help. I sent the files to Felipe. Meanwhile, I'll try to rearrange the headers and make them UniProt-like. Thanks,

Benoit

chhh commented 6 years ago

Closing due to no follow-up