compomics / peptide-shaker

Interpretation of proteomics identification results
http://compomics.github.io/projects/peptide-shaker.html

Database and memory issues #27

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
> What steps will reproduce the problem?
First, we downloaded the UniProtKB/SwissProt and UniProtKB/TrEMBL databases from 
the UniProt download page in the decoy version. 
Then, since these databases were not loaded correctly by PeptideShaker, we 
tried to assemble a target-decoy DB using the Database Processing Tool, 
starting from the original UniProt DB versions (no decoy).

> What is the expected output? What do you see instead?
Only the SwissProt DB built with the Database Processing Tool worked, while 
TrEMBL gave errors (see log file attached). In both cases, the original UniProt 
decoy DB files were not recognized correctly.
Moreover, when using the only functioning DB (SwissProt), an additional issue 
occurred (Out of Memory Error).
We tried installing both the 32- and 64-bit versions of Java (also separately), as 
well as raising the Java memory limit (up to 2 GB or more, as suggested), 
without success.
How can these issues be fixed?

> What version of the product are you using? On what operating system?
0.20.1 on a Windows 7 Professional 64-bit OS

> Please provide any additional information below.
The file extension of the decoy database downloaded from UniProt is .decoy, but 
this format is not recognized by PeptideShaker. We tried to change the 
extension to .fasta, either manually or using the Database Processing Tool, but 
in both cases the database didn't work.

> If the reported issue resulted in the tool crashing, please
> also upload the file called PeptideShaker.log (found in the
> PeptideShaker-X.Y.Z\resources folder).

Original issue reported on code.google.com by alessand...@tiscali.it on 17 May 2013 at 2:28

GoogleCodeExporter commented 9 years ago
Thanks for telling us about database issues.

First, I hope you didn't download the complete SwissProt and TrEMBL databases 
but only the proteins for the species you are working with? See 
http://code.google.com/p/searchgui/wiki/DatabaseHelp#UniProt_Databases for how 
to download the database for your species only.

Second, there is no need to download decoy sequences from UniProt as these can 
be created using SearchGUI (http://searchgui.googlecode.com). Simply upload the 
FASTA file in the Settings dialog and click the Decoy button. Reversed versions 
of all the sequences will then be added. You can then use this database in your 
SearchGUI or Mascot search.
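
To illustrate what "reversed versions of all the sequences" means, here is a minimal Python sketch of building a target-decoy FASTA by appending a reversed copy of every entry. This is not SearchGUI's actual code: the function name `add_reversed_decoys` and the `_REVERSED` accession suffix are assumptions made for the example, and for a real search the database should still be generated with SearchGUI so that the decoy labels match what PeptideShaker expects.

```python
def add_reversed_decoys(fasta_text, decoy_tag="_REVERSED"):
    """Return the target entries followed by a reversed decoy of each one.

    decoy_tag is a hypothetical suffix appended to the accession; it is an
    assumption for this sketch, not a documented SearchGUI convention.
    """
    entries = []  # list of (header, sequence) pairs
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))

    out = []
    # Targets first, written with the sequence on a single line.
    for header, seq in entries:
        out.append(f">{header}\n{seq}")
    # Then one decoy per target: tagged accession, reversed sequence.
    for header, seq in entries:
        accession, _, rest = header.partition(" ")
        decoy_header = f"{accession}{decoy_tag}" + (f" {rest}" if rest else "")
        out.append(f">{decoy_header}\n{seq[::-1]}")
    return "\n".join(out) + "\n"
```

Because the decoys are exact reversals of the targets, the combined database has one decoy per target with the same length and amino acid composition.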

I'm also not sure what the "Database Processing Tool" you refer to is. But 
unless advanced database processing is required, using SearchGUI would be the 
preferred option to make sure that the database is compatible with 
PeptideShaker.

When using Mascot please remember to _not_ use the Mascot decoy option, as this 
adds random sequences and is not compatible with PeptideShaker. For more 
details on using Mascot data in PeptideShaker, see 
http://code.google.com/p/searchgui/wiki/DatabaseHelp#Mascot_Users. Note that 
using SearchGUI (and thus OMSSA and X!Tandem) instead of Mascot makes it a lot 
easier to get results that are compatible with PeptideShaker. And as using two 
search engines almost always gives better results than just one, this is 
normally the recommended approach.

As for the memory issues, could you check the file called 'startup.log' located 
in the folder resources\conf. This file should show you the actual command line 
executed and in it how much memory you gave it, e.g., -Xmx4000M for around 4 
GB. If you have more than 2 GB available, I would strongly recommend increasing 
the value, as most modern proteomics datasets will benefit from this.
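
If it helps, the memory setting can also be read out of a command line with a few lines of Python. This is only a sketch: the helper name `max_heap_mb` and the sample command lines in the usage note are made up for illustration, and the actual content of your startup.log may look different. The only thing relied on is the standard Java `-Xmx` syntax, where a plain number means bytes and a `k`, `m` or `g` suffix scales it.

```python
import re


def max_heap_mb(command_line):
    """Return the -Xmx value in megabytes, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", command_line)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    # Conversion factors to megabytes; no suffix means the value is in bytes.
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return value * factor
```

For example, `max_heap_mb("java -Xmx4000M -cp PeptideShaker.jar")` gives 4000, and a command line without any `-Xmx` flag gives `None`, meaning Java is running with its default limit.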

Let me know if you need more details.

Original comment by harald.b...@gmail.com on 17 May 2013 at 2:55

GoogleCodeExporter commented 9 years ago
(Issues assumed solved by the user.)

Original comment by harald.b...@gmail.com on 9 Mar 2014 at 11:54