Immortalin / peptide-shaker

Automatically exported from code.google.com/p/peptide-shaker

Incompatibility with MSDA (https://msda.u-strasbg.fr/index.php) #2

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. identification

What is the expected output? What do you see instead?

Wed Sep 28 11:38:15 CEST 2011        Importing sequences from SWISSPROT.fasta.
Wed Sep 28 11:39:04 CEST 2011        FASTA file import completed.
Wed Sep 28 11:39:04 CEST 2011        Reading identification files.
Wed Sep 28 11:39:04 CEST 2011        Reading file: 163.omx
Wed Sep 28 11:39:20 CEST 2011        No identifications retained.
Wed Sep 28 11:39:21 CEST 2011        Your peptides have been shaken!

What version of the product are you using? On what operating system?
peptideShaker 0.9.3

Please provide any additional information below.

Original issue reported on code.google.com by lia.s...@gmail.com on 28 Sep 2011 at 9:57

Attachments:

GoogleCodeExporter commented 9 years ago
There seems to be a problem with the parsing of your omx file. Would it be 
possible for you to make the omx file available to us so that we can do some 
testing? You can either upload it here or send it via e-mail if you don't want 
the data to be online.

Original comment by harald.b...@gmail.com on 28 Sep 2011 at 11:03

GoogleCodeExporter commented 9 years ago
OK, attached you'll find the omx file.

Original comment by lia.s...@gmail.com on 28 Sep 2011 at 11:44

Attachments:

GoogleCodeExporter commented 9 years ago
Apparently OMSSA has issues parsing your FASTA file. For protein accessions it 
returns (see the omx file):
<MSPepHit_accession>sp|Q10MH8</MSPepHit_accession>
instead of:
<MSPepHit_accession>Q10MH8</MSPepHit_accession>

Peptide-Shaker thus tries to find the protein sp|Q10MH8 which does not exist :)
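
To illustrate the mismatch: PeptideShaker looks up the accession exactly as written in the omx file, so a database prefix such as "sp|" makes the lookup fail. The Python sketch below shows a normalisation step that would map both forms to the bare accession. It assumes the prefix is one of the UniProt tags "sp" or "tr" separated by "|"; the helper name normalize_accession is hypothetical and this is not PeptideShaker's actual parser.

```python
# Assumed database prefixes that may precede the accession (hypothetical set).
KNOWN_PREFIXES = {"sp", "tr"}


def normalize_accession(raw: str) -> str:
    """Return the bare accession, e.g. 'sp|Q10MH8' -> 'Q10MH8'."""
    text = raw.strip()
    parts = text.split("|")
    # 'sp|ACC' or 'sp|ACC|ENTRY_NAME': keep only the accession field.
    if len(parts) >= 2 and parts[0] in KNOWN_PREFIXES:
        return parts[1]
    # Anything else is assumed to already be a bare accession.
    return text


if __name__ == "__main__":
    for raw in ("sp|Q10MH8", "Q10MH8"):
        print(raw, "->", normalize_accession(raw))
```

With this step both omx variants resolve to the same key, Q10MH8, so the protein would be found in the FASTA index.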

Which version of OMSSA are you using? How did you generate your FASTA file? 
Could you send it to us?

Thank you for your help!

Original comment by mvau...@gmail.com on 28 Sep 2011 at 12:59

GoogleCodeExporter commented 9 years ago
OK. To generate my omx file I use the MSDA web tool (Mass Spectrometry Data 
Analysis), which includes the OMSSA search engine.

Could I have your e-mail address? The file is too big to attach here.

Original comment by lia.s...@gmail.com on 28 Sep 2011 at 1:27

GoogleCodeExporter commented 9 years ago
I don't know the MSDA tool. Do you have a link?

To ensure compatibility with PeptideShaker we recommend using SearchGUI to 
execute the searches. SearchGUI can be found here: 
http://searchgui.googlecode.com

But we'll look into supporting MSDA if possible.

Original comment by harald.b...@gmail.com on 28 Sep 2011 at 1:43

GoogleCodeExporter commented 9 years ago
Here is the link to the FASTA file I downloaded from the MSDA web tool:
http://dl.free.fr/hp1uknDV9

And here is the link to the MSDA web tool:
https://msda.u-strasbg.fr/index.php

Thanks

Original comment by lia.s...@gmail.com on 28 Sep 2011 at 1:45

GoogleCodeExporter commented 9 years ago
I can only send you this link, where you can download the FASTA file:
http://dl.free.fr/hp1uknDV9

Original comment by lia.s...@gmail.com on 28 Sep 2011 at 1:52

GoogleCodeExporter commented 9 years ago
Hi again,

I had no problem running a search with your database. 

It breaks my Alsatian heart to say so, but the incompatibility seems to come 
from MSDA. More precisely, the problem is the parsing of the FASTA file, for 
which we use makeblastdb (version 2.2.24+) as advised by the OMSSA developers 
(Harald, correct me if I'm wrong). I will ask them how they do it and will try 
to ensure compatibility.

On the other hand, I advise you to use SearchGUI (searchgui.googlecode.com) for 
OMSSA. It will ensure compatibility with Peptide-Shaker and also lets you search 
with X!Tandem in parallel. Hence you get two search engines for the price of 
one :) SearchGUI is straightforward to use, and you will be able to run searches 
locally without having to register and provide personal information.

Also, Peptide-Shaker is designed for concatenated Target/Decoy search results. 
So you might want to use a concatenated Target/Decoy database which you can 
generate in SearchGUI from your fasta file. The decoy hits will be used for the 
calculation of confidence and FDR for your peptides and proteins.
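
As a rough illustration of how the decoy hits feed into confidence estimation: in a concatenated target/decoy search, a common estimate of the FDR above a score threshold is the number of decoy hits divided by the number of target hits above it. The Python sketch below assumes exactly that simple ratio; PeptideShaker's own probability-based validation is more involved and is not reproduced here, and estimate_fdr is a hypothetical name.

```python
from typing import Iterable, Tuple


def estimate_fdr(hits: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Decoy/target estimate of the FDR among hits scoring >= threshold.

    hits: (score, is_decoy) pairs from a concatenated target/decoy search,
    where a higher score means a better match.
    """
    targets = decoys = 0
    for score, is_decoy in hits:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        # No target retained: 0 if nothing passed at all, otherwise the maximum.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)


hits = [(50.0, False), (45.0, False), (40.0, True), (38.0, False), (30.0, True)]
print(estimate_fdr(hits, 35.0))  # 1 decoy / 3 targets
```

Without any decoy entries in the database the decoy count is always zero, so an estimate of this kind would report an FDR of zero regardless of how many false positives there are.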

Finally, your database contains all kinds of species. The number of 
identifications at a given quality level will thus be dramatically reduced, 
protein inference will become almost impossible, and the search time will grow 
considerably. You might want to tailor your database to the taxonomy you need 
(human?). This can be done by downloading the FASTA files of the needed species 
from the UniProt website (uniprot.org). In case you end up with several FASTA 
files, you can merge them using dbtoolkit (dbtoolkit.googlecode.com).
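
The idea of restricting a database to one taxonomy and merging files can be sketched in plain Python. This is not dbtoolkit; it assumes UniProt-style headers such as "sp|P01860|IGHG3_HUMAN description", where the species mnemonic (e.g. HUMAN) ends the entry name, and all function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Entry = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_token(header: str) -> str:
    tokens = header.split()
    return tokens[0] if tokens else ""


def keep_species(entries: Iterable[Entry], mnemonic: str) -> Iterator[Entry]:
    """Keep entries whose entry name ends in _<mnemonic>, e.g. _HUMAN."""
    suffix = "_" + mnemonic
    for header, seq in entries:
        if first_token(header).split("|")[-1].endswith(suffix):
            yield header, seq


def merge(*sources: Iterable[Entry]) -> Iterator[Entry]:
    """Concatenate several FASTA sources, dropping repeated identifiers."""
    seen = set()
    for source in sources:
        for header, seq in source:
            key = first_token(header)
            if key not in seen:
                seen.add(key)
                yield header, seq
```

A typical use would be keep_species(read_fasta(open(path)), "HUMAN") per downloaded file, followed by merge(...) over the filtered results before writing them back out as FASTA.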

The current version of Peptide-Shaker does not support such large databases 
(see bug report 1). In the new version which will be released soon, large 
datasets and databases are better handled. The results will still be full of 
false positives though.

If you have more questions related to protein identification we encourage you 
to contact us via the Peptide-Shaker mailing list 
(groups.google.com/group/peptide-shaker).

Original comment by mvau...@gmail.com on 28 Sep 2011 at 2:47

GoogleCodeExporter commented 9 years ago
OK, I tried SearchGUI and I have a problem with the OMSSA installation.
The error message is: "Failed to start OMSSA, make sure that OMSSA is installed 
correctly and that you have selected the correct version of OMSSA for your 
system."

Original comment by lia.s...@gmail.com on 28 Sep 2011 at 2:54

GoogleCodeExporter commented 9 years ago
If you are using Windows, you should verify that the correct version is selected 
(32 or 64 bits). In case it does not work (typically for Windows versions older 
than Windows 7), you might have to install missing libraries (see "OMSSA on 
Windows" on the SearchGUI webpage, troubleshooting section: 
http://code.google.com/p/searchgui/#Troubleshooting). 

Original comment by mvau...@gmail.com on 28 Sep 2011 at 3:12

GoogleCodeExporter commented 9 years ago
ok thanks I'll try 

Original comment by lia.s...@gmail.com on 28 Sep 2011 at 3:19

GoogleCodeExporter commented 9 years ago
Dear all, 
We are taking the liberty of joining the discussion, as we have seen the report 
of the issues related to MSDA and would like to comment on them. 
Indeed, the problem is related to the database formatting, as we are using 
formatdb instead of makeblastdb. This is because OMSSA Browser and Scaffold show 
problems when using makeblastdb. 
So, to ensure that this will not be a limitation to using MSDA in the future, we 
will soon offer both possibilities, so that people who want to visualize their 
.omx files in PeptideShaker will be able to do so. 
Also, as you are discussing database generation tools, our database generation 
toolbox on MSDA includes everything you need to extract any taxonomy from 
reference databases (NCBInr, UniProtKB, UniProtKB/Swiss-Prot), add known 
sequences, contaminants and decoys, merge databases, generate FASTA files, etc.:
https://msda.u-strasbg.fr/
That was the short advertising part :-)
Best regards, and don't hesitate to contact us,
The "broken heart" Alsatian team :-)

Original comment by alexandr...@gmail.com on 4 Oct 2011 at 9:08

GoogleCodeExporter commented 9 years ago
Thank you very much for your input. Actually, we can solve this problem very 
easily if we make our OMSSA results parser compatible with files generated 
using formatdb. For this we only need to extract the accession number of the 
protein of interest. How do you usually retrieve it?

In the omx file sent previously the accession line is:
<MSPepHit_accession>sp|P58047</MSPepHit_accession>
Do you know what it would look like for other kinds of databases?

Original comment by mvau...@gmail.com on 4 Oct 2011 at 9:26

GoogleCodeExporter commented 9 years ago
We usually use Scaffold to visualize our OMX files. Scaffold looks up the 
content of the MSPepHit_accession elements in the FASTA file, using a regular 
expression to split the accession number from the description (and another one 
to identify the decoy entries).

The accession line you pointed out is in fact the result of an old script we 
used to simplify the parsing of the accession numbers for our research team. 
This script has been removed from MSDA to guarantee MSDA users that the OMX file 
format is fully respected. The current output is now:
<MSPepHit_accession>P67779</MSPepHit_accession>
<MSPepHit_accession>REVERSED_109477550_XP_001070433</MSPepHit_accession>
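
The two accession styles above can be told apart with a regular expression, in the spirit of the decoy rule Scaffold is described as using. The Python sketch below assumes decoys are marked only by a leading "REVERSED_" tag, as in the MSDA output just quoted; the pattern and the helper split_decoy are hypothetical, not Scaffold's or PeptideShaker's actual rules.

```python
import re
from typing import Tuple

# Assumed decoy convention, taken from the MSDA output quoted above.
DECOY_PREFIX = re.compile(r"^REVERSED_(?P<target>.+)$")


def split_decoy(accession: str) -> Tuple[bool, str]:
    """Return (is_decoy, accession without the decoy tag)."""
    match = DECOY_PREFIX.match(accession)
    if match:
        return True, match.group("target")
    return False, accession


if __name__ == "__main__":
    for acc in ("P67779", "REVERSED_109477550_XP_001070433"):
        print(acc, split_decoy(acc))
```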

Original comment by alexandr...@gmail.com on 4 Oct 2011 at 1:46

GoogleCodeExporter commented 9 years ago
Great, I will make sure that the next version of PeptideShaker (to be released 
this month) handles this structure and decoy tag.

Original comment by mvau...@gmail.com on 4 Oct 2011 at 2:12

GoogleCodeExporter commented 9 years ago
I tried SearchGUI to generate my omx file, but when I load the omx file into 
PeptideShaker it's still not working. I tried the Mascot dat file version of 
the same data and it works without problems.

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 8:21

GoogleCodeExporter commented 9 years ago
Send me the files you use as input to PeptideShaker (omx, mgf and fasta) and 
I'll run them through the new version of PeptideShaker and see if I can figure 
out the problem.

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 8:54

GoogleCodeExporter commented 9 years ago
OK, attached you'll find everything.
Here is the link to download the FASTA file: http://dl.free.fr/d1LogWLsc

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 9:33

Attachments:

GoogleCodeExporter commented 9 years ago
Seems to work fine in the soon to be released new version of PeptideShaker. 
I'll let you know as soon as the new version has been released so that you can 
test it for yourself.

BTW, it's not recommended to use the whole of Swiss-Prot as the database. This 
will result in matches to multiple organisms (human, mouse, bacteria etc, etc). 
Something that doesn't make a lot of sense if you search with a human sample 
for example...

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 10:55

GoogleCodeExporter commented 9 years ago
I also noticed that you don't use a target-decoy database. This means that the 
FDR calculations (i.e., the protein validation) will be incorrect. This might 
be why you have problems opening your files in the old PeptideShaker version. 
(Something we fixed in the new version.)

You can easily add a decoy section to your database by clicking the "Decoy" 
button in the "Parameters Editor" tab in SearchGUI. Note however that this will 
take quite some time when used on your big FASTA file...
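
Conceptually, adding a decoy section means appending a reversed copy of every protein to the target database. The Python sketch below illustrates only that idea; the "_REVERSED" accession suffix is an assumption for illustration and may not match the tag SearchGUI actually writes, and add_reversed_decoys is a hypothetical name.

```python
from typing import List, Tuple


def add_reversed_decoys(entries: List[Tuple[str, str]],
                        tag: str = "_REVERSED") -> List[Tuple[str, str]]:
    """Return the targets followed by one reversed decoy per target.

    entries: (accession, sequence) pairs. The decoy accession is the target
    accession plus `tag`, a hypothetical naming convention.
    """
    decoys = [(accession + tag, sequence[::-1]) for accession, sequence in entries]
    return list(entries) + decoys
```

Because every target gets a decoy twin, the resulting database is twice the size of the original, which is why this step (and the subsequent searches) take noticeably longer on a large FASTA file.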

I'll test it and let you know.

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 11:02

GoogleCodeExporter commented 9 years ago
Yes, I agree about the target/decoy database, but I am just trying out the tool, 
which is why I use this file and this database. When I use the Mascot dat file 
and the same database in PeptideShaker, it works.

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 11:07

GoogleCodeExporter commented 9 years ago
Target-decoy doesn't help in the old version either, and it takes a very long 
time due to the increase in database size...

However, I am able to open your files both with and without the decoy section. 
I do get an error (related to the validation plots, which will be empty), but 
if I close that dialog I can interact with the data. Is this the case for you 
as well?

Could you send me the PeptideShaker.log file in your conf folder?

Don't know why this is a problem for OMSSA files and not for Mascot files 
though. But as I mentioned above this has been fixed in the new version of 
PeptideShaker.

So I'll wait until that's available for you to test. Hopefully later this week, 
or early next week.

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 11:52

GoogleCodeExporter commented 9 years ago
OK, attached you'll find the PeptideShaker.log file.

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 12:00

Attachments:

GoogleCodeExporter commented 9 years ago
Lots of errors there... Try deleting the log file and re-run PeptideShaker on 
the files causing you issues. Then send me the new log file.

It also seems to run out of memory. You could try to increase the max memory 
settings for PeptideShaker as well. This is done in the file 'JavaOptions.txt' 
in the conf folder. Increase the -Xmx1500M to -Xmx2500M if you have enough 
memory for that.
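
If you prefer to script this change, here is a minimal Python sketch that rewrites the -Xmx value in the text of an options file. It assumes the file contains a single -Xmx option in the usual -Xmx<number>M form; set_max_heap is a hypothetical helper, and reading and writing the actual JavaOptions.txt in the conf folder is left out.

```python
import re


def set_max_heap(options_text: str, megabytes: int) -> str:
    """Rewrite the -Xmx value in JVM options text such as JavaOptions.txt."""
    new_text, count = re.subn(r"-Xmx\d+[kKmMgG]?", f"-Xmx{megabytes}M", options_text)
    if count == 0:
        raise ValueError("no -Xmx option found")
    return new_text
```

For example, set_max_heap("-Xmx1500M\n", 2500) gives "-Xmx2500M\n", the change described above.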

All of this will be simpler in the new version which requires less memory.

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 12:13

GoogleCodeExporter commented 9 years ago
Here is the file.

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 12:32

Attachments:

GoogleCodeExporter commented 9 years ago
This seems to be the exact same log file? Did you delete it and re-run 
PeptideShaker?

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 12:37

GoogleCodeExporter commented 9 years ago
Sorry, here is the file.

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 12:53

Attachments:

GoogleCodeExporter commented 9 years ago
There are no errors in this log file. So what actually happens when you try to 
open your files?

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 12:58

GoogleCodeExporter commented 9 years ago
When I try to open my file, the tool hangs and I have to close it.

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 1:00

GoogleCodeExporter commented 9 years ago
You mean that it freezes and that you can no longer interact with it? If so 
could you send me a screenshot of the tool when this happens?

This is also most likely related to memory issues. And you could try increasing 
the memory settings as I explained above and see if that helps. Or did you 
already try this?

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 1:03

GoogleCodeExporter commented 9 years ago
Yes, it freezes and the program stops responding. I tried increasing the setting 
to -Xmx2500M, but then I can't open the tool at all.

Original comment by lia.s...@gmail.com on 12 Oct 2011 at 1:12

Attachments:

GoogleCodeExporter commented 9 years ago
And the same thing does not happen when using a Mascot dat file? Strange.

Anyway, let's just wait until the new PeptideShaker version is released.

The new version is also more memory efficient, so it shouldn't matter that you 
cannot set the memory to 2.5 GB.

Original comment by harald.b...@gmail.com on 12 Oct 2011 at 2:36

GoogleCodeExporter commented 9 years ago
Yes, when I use a Mascot dat file there's no problem.

Original comment by lia.s...@gmail.com on 13 Oct 2011 at 12:19

GoogleCodeExporter commented 9 years ago
PeptideShaker v0.10.0 has just been released. Please let us know if this solves 
your issues or not.

Original comment by harald.b...@gmail.com on 19 Oct 2011 at 3:28

GoogleCodeExporter commented 9 years ago
Hi, I tried with my omx file, which comes from the MSDA tool. 
Here is the result:

Fri Oct 21 15:01:54 CEST 2011        Importing sequences from uniprot_sprot_2011_08.fasta.
Fri Oct 21 15:02:11 CEST 2011        FASTA file import completed.
Fri Oct 21 15:02:11 CEST 2011        Reading identification files.
Fri Oct 21 15:02:11 CEST 2011        Reading file: OlLA110504_albu-B-mod_264_1-2.omx
Fri Oct 21 15:02:17 CEST 2011        Identification file(s) import completed. 557 identifications imported, 95 identifications retained.
Fri Oct 21 15:02:17 CEST 2011        Computing assumptions probabilities.
Fri Oct 21 15:02:17 CEST 2011        Adding assumptions probabilities.
Fri Oct 21 15:02:17 CEST 2011        Selecting best hit per spectrum.
Fri Oct 21 15:02:17 CEST 2011        Generating PSM map.
Fri Oct 21 15:02:17 CEST 2011        Computing PSM probabilities.
Fri Oct 21 15:02:17 CEST 2011        Computing peptide probabilities.
Fri Oct 21 15:02:17 CEST 2011        Scoring PTMs.
Fri Oct 21 15:02:17 CEST 2011        An error occurred while working on the identification. See the log file for more details.
Fri Oct 21 15:02:17 CEST 2011        Trying to resolve protein inference issues.
Fri Oct 21 15:02:17 CEST 2011        An error occured while loading the identification files:
Fri Oct 21 15:02:17 CEST 2011        null

Fri Oct 21 15:02:17 CEST 2011        Import canceled.

When I try with the xml file from SearchGUI (X!Tandem search) there's no 
problem, but it's still not working with OMSSA and I don't understand why.

Surprisingly, with my Mascot dat file it's not working at all.

Original comment by lia.s...@gmail.com on 21 Oct 2011 at 1:16

GoogleCodeExporter commented 9 years ago
OK, so X!Tandem works, that's good.

Probably just some minor detail for the other two search engines.

Just detected a Mascot issue that might help you as well. Mascot peptides 
sometimes contain the non-standard amino acids B, Z and X. These were not 
supported in PeptideShaker, but have now been added.

But Mascot used to work for you before right? I did update the Mascot parser 
library, maybe that's it.

Could you send me your new log file (from the new PeptideShaker version)?

And are the input files (omx/dat, mgf and fasta) the same as before?

Original comment by harald.b...@gmail.com on 21 Oct 2011 at 2:28

GoogleCodeExporter commented 9 years ago
Yes, I used the same omx/dat files. With the older PeptideShaker version I had 
no problem with my Mascot dat file.
Attached you'll find my log file.

Original comment by lia.s...@gmail.com on 21 Oct 2011 at 2:57

Attachments:

GoogleCodeExporter commented 9 years ago
I cannot find your dat file. Could you send that to me as well?

Original comment by harald.b...@gmail.com on 21 Oct 2011 at 3:10

GoogleCodeExporter commented 9 years ago
From your log file I see the following "Protein not found: IGHG3_HUMAN". This 
means that the protein was not found in your database. Which makes perfect 
sense given that "IGHG3_HUMAN" is not a protein accession number but rather a 
protein name. The accession number for this protein is "P01860". So this means
that something went wrong in the database parsing. Does this happen for the omx 
or the dat file?
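
A quick way to check whether a reported identifier is an accession or an entry name is to test it against the UniProtKB accession format documented by UniProt (6 or 10 characters). The Python sketch below does this; the helper name is hypothetical, and the check only covers UniProt identifiers, not other databases.

```python
import re

# UniProtKB accession format as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(identifier: str) -> bool:
    """True for accessions like 'P01860', False for names like 'IGHG3_HUMAN'."""
    return UNIPROT_ACCESSION.fullmatch(identifier) is not None
```

Applied to the log message above, is_uniprot_accession("IGHG3_HUMAN") is False, which is the symptom of a database parsing rule that returns the entry name instead of the accession.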

I also see some problems when trying to estimate a peptide's theoretical mass. 
This looks similar to the problem I mentioned above for the Mascot special 
amino acids B, Z and X. I've released a new version of PeptideShaker that 
supports B, Z and X. Maybe you could give that a try and see if you can now 
open your Mascot file again?

Original comment by harald.b...@gmail.com on 21 Oct 2011 at 3:44

GoogleCodeExporter commented 9 years ago
I tried the new version with no success; I have the same problem. I don't 
understand why it was working with the Mascot .dat file and now it's not. It 
still works with the file that comes from SearchGUI, but not with my omx file.
Attached you'll find my log file.

Original comment by lia.s...@gmail.com on 24 Oct 2011 at 7:57

Attachments:

GoogleCodeExporter commented 9 years ago
The reason for the Mascot error has been detected: "Unknown amino acid: U!". In 
the new version we try to re-calculate the theoretical peptide mass (something 
we didn't do before), and the special amino acids in Mascot (B, Z, X and now U) 
result in issues. We'll fix this and release a new version later today.
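
To show why a missing residue breaks the mass re-calculation, here is a Python sketch of an unmodified monoisotopic peptide mass computed from a residue table. It uses standard monoisotopic residue masses and includes selenocysteine (U); the ambiguous codes B, Z and X are deliberately left out because the thread does not say which masses PeptideShaker assigns to them. The function name is hypothetical and modifications are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids plus
# selenocysteine (U). Ambiguous codes B, Z and X are deliberately absent.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
    "U": 150.95364,  # selenocysteine
}
WATER = 18.010565  # added once for the peptide's N- and C-termini


def theoretical_mass(peptide: str) -> float:
    """Unmodified monoisotopic peptide mass; raises on unknown residues."""
    total = WATER
    for residue in peptide.upper():
        if residue not in RESIDUE_MASS:
            raise ValueError(f"Unknown amino acid: {residue}!")
        total += RESIDUE_MASS[residue]
    return total
```

Any letter missing from the table aborts the calculation with a message of the same form as the one in the log, so supporting a new residue amounts to giving it a mass entry.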

So the omx file you're now using comes from MSDA and not from SearchGUI? Then 
there is still something wrong with the way the FASTA file is parsed in MSDA as 
"IGHG1_HUMAN" is not a protein accession number. But we'll look into it and 
perhaps contact the MSDA developers.

Original comment by harald.b...@gmail.com on 24 Oct 2011 at 8:15

GoogleCodeExporter commented 9 years ago
ok thanks

Original comment by lia.s...@gmail.com on 24 Oct 2011 at 8:45

GoogleCodeExporter commented 9 years ago
PeptideShaker version 0.10.3 has just been released. It supports the 
Selenocysteine amino acid (the U) causing issues the last time around. 
Hopefully this means that Mascot should work again.

If not, please send me the dat file so that I can test it.

As for the omx file, the omx file you sent us earlier had different issues from 
the ones you now report, so if you could send the omx file you are using as 
well, that would help a lot. (As it does not seem to be the exact same file as 
before..?)

Original comment by harald.b...@gmail.com on 24 Oct 2011 at 2:29

GoogleCodeExporter commented 9 years ago
Attached you'll find my Mascot .dat file.

Original comment by lia.s...@gmail.com on 24 Oct 2011 at 2:56

Attachments:

GoogleCodeExporter commented 9 years ago
Does this mean that it is still not working with the Mascot file?

Original comment by harald.b...@gmail.com on 24 Oct 2011 at 2:58

GoogleCodeExporter commented 9 years ago
Yes, it is still not working!!!

Original comment by lia.s...@gmail.com on 24 Oct 2011 at 2:59

GoogleCodeExporter commented 9 years ago
Ok, I can confirm that it is possible to open your dat file in the old version, 
but I do get a 'protein not found exception': "Protein not found! Accession: 
UBP29_HUMAN". So this means that the parsing of the FASTA files on your Mascot 
server is not set up correctly.

Which is the same problem you get with the new version (with a different 
protein though): "Protein not found: IGLL5_HUMAN." The only difference that I 
can see is that the new version stops you from continuing (given that you have 
unknown proteins), whereas the old version allowed you to continue anyway.

Please refer to http://www.matrixscience.com/help/seq_db_setup.html for 
database setup in Mascot. With the correct parsing rules you should be able to 
load your dat file into PeptideShaker without any issues.

When the parsing rules are set up correctly they should return only the protein 
accession number in the 'Accessions' column.
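
Mascot parse rules are regular expressions applied to the FASTA header line. As a Python analogue only (Mascot uses its own rule syntax, described on the Matrix Science page above, and this is not a rule to paste into Mascot), the sketch below keeps just the field between the first two "|" characters of a UniProt-style header, which is what the 'Accessions' column should end up containing. The pattern and function name are assumptions for illustration.

```python
import re
from typing import Optional

# Python analogue of an accession rule for headers like
# '>sp|P01860|IGHG3_HUMAN description' (two-letter database tag assumed).
ACCESSION_RULE = re.compile(r"^>[a-z]{2}\|([^|]+)\|")


def parse_accession(header_line: str) -> Optional[str]:
    """Return the accession field, or None if the header does not match."""
    match = ACCESSION_RULE.match(header_line)
    return match.group(1) if match else None
```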

If you need help finding the correct parsing rules let me know and I'll send 
them to you. (Don't have them available right now.)

Original comment by harald.b...@gmail.com on 24 Oct 2011 at 3:18

GoogleCodeExporter commented 9 years ago
Regarding your omx file, have you remade it as suggested by the MSDA team? 
(see: http://code.google.com/p/peptide-shaker/issues/detail?id=2#c15)

If not you will still get the problem that the accession numbers in your omx 
file are like this 'sp|P02768' and not like this 'P02768' as they should be. 
And therefore not compatible with PeptideShaker.

Original comment by harald.b...@gmail.com on 24 Oct 2011 at 4:10

GoogleCodeExporter commented 9 years ago
As the issues seem to be related to the parsing of accession numbers on either 
the Mascot server or MSDA (used for the OMSSA search), and therefore do not 
require further changes to the PeptideShaker code, I'm setting this issue as 
Fixed.

Please see the Read Me (http://code.google.com/p/peptide-shaker/#Read_Me) or 
the new Database Help (http://code.google.com/p/searchgui/wiki/DatabaseHelp) 
for further details.

Original comment by harald.b...@gmail.com on 22 Dec 2011 at 1:59