Loading of search results fails

GoogleCodeExporter commented 9 years ago

> What steps will reproduce the problem?

Import a SearchGUI result (the issue happens for both omx and t.xml result 
files). Hard to attach, since they are at least 100 MB.

> What is the expected output? What do you see instead?

While importing, the error message "No amino acid found for letter >."
pops up. Log below.

> What version of the product are you using? On what operating system?

PeptideShaker-0.26.2 (Win64)

> Please provide any additional information below.

> If the reported issue resulted in the tool crashing, please
> also upload the file called PeptideShaker.log (found in the
> PeptideShaker-X.Y.Z\resources folder).

Tue Mar 18 11:33:40 CET 2014: PeptideShaker version 0.26.2.
Memory given to the Java virtual machine: 7546077184.
Total amount of memory in the Java virtual machine: 129499136.
Free memory: 116025976.
Java version: 1.7.0_45.
1714 script command tokens
(C) 2009 Jmol Development
Jmol Version: 12.0.43  2011-05-03 14:21
java.vendor: Oracle Corporation
java.version: 1.7.0_45
os.name: Windows 7
memory: 34.5/129.5
processors available: 16
useCommandThread: false
<CompomicsError>PeptideShaker processing failed. See the PeptideShaker log for details.</CompomicsError>
java.lang.IllegalArgumentException: No amino acid found for letter >.
    at com.compomics.util.experiment.biology.AminoAcid.getAminoAcid(AminoAcid.java:200)
    at com.compomics.util.experiment.biology.Protein.computeMolecularWeight(Protein.java:207)
    at com.compomics.util.experiment.identification.SequenceFactory.computeMolecularWeight(SequenceFactory.java:1027)
    at eu.isas.peptideshaker.PeptideShaker.retainBestScoringGroups(PeptideShaker.java:3393)
    at eu.isas.peptideshaker.PeptideShaker.processIdentifications(PeptideShaker.java:384)
    at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importFiles(FileImporter.java:488)
    at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.doInBackground(FileImporter.java:388)
    at javax.swing.SwingWorker$1.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at javax.swing.SwingWorker.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Free memory: 902156200

Original issue reported on code.google.com by i.really...@googlemail.com on 18 Mar 2014 at 10:57

GoogleCodeExporter commented 9 years ago

I can't find a sequence (<domain seq=".*" ... >) that contains a ">" or is empty.
I haven't looked at the source code, though, so the error message might be 
misleading.
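
For anyone wanting to repeat this check on a large t.xml file, here is a minimal sketch of one way to do it. It assumes that peptide sequences sit in a `seq="..."` attribute on `<domain>` elements and that each `<domain>` start tag fits on a single line; the function name is made up for illustration and is not part of PeptideShaker or SearchGUI.

```python
import re

# Assumption: sequences are stored as seq="..." on <domain ...> start tags.
SEQ_ATTR = re.compile(r'\bseq="([^"]*)"')


def find_suspicious_domain_seqs(lines):
    """Yield (line_number, sequence) for empty or non-alphabetic sequences."""
    for line_number, line in enumerate(lines, start=1):
        if "<domain" not in line:
            continue  # only <domain> elements are inspected
        for match in SEQ_ATTR.finditer(line):
            seq = match.group(1)
            # An escaped '>' (&gt;) also fails isalpha() because of '&' and ';'.
            if not seq or not seq.isalpha():
                yield line_number, seq
```

Reading the file line by line, for example via `open(path, encoding="utf-8", errors="replace")`, keeps memory use flat even for results of 100 MB or more. An empty result only says the identification file looks clean; it says nothing about the FASTA database.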

Original comment by i.really...@googlemail.com on 18 Mar 2014 at 11:03

GoogleCodeExporter commented 9 years ago

I'm pretty sure this bug is related to this one 
http://code.google.com/p/searchgui/issues/detail?id=23. It would seem that a 
FASTA header is mistaken for a protein sequence. 

I've improved the error message so that it will now state: 
Error parsing the sequence of "your_accession". Protein sequence: 
"your_sequence".

I will also try to improve our FASTA file parsing so that this does not 
happen in future versions. A new version of PeptideShaker will be available 
shortly.

Original comment by harald.b...@gmail.com on 18 Mar 2014 at 9:49

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

This issue is independent of 
http://code.google.com/p/searchgui/issues/detail?id=23.

I fixed the DB before running SearchGUI.
PeptideShaker also goes through a couple of steps before crashing:

Wed Mar 19 15:15:40 CET 2014        Importing sequences from all_concatenated_target_decoy.fasta.
Wed Mar 19 15:17:18 CET 2014        FASTA file import completed.
Wed Mar 19 15:17:18 CET 2014        Establishing local database connection.
Wed Mar 19 15:17:24 CET 2014        Reading identification files.
Wed Mar 19 15:17:24 CET 2014        Parsing Toni_20130527_AE_Exp87_sample11_01.t.xml.
Wed Mar 19 15:17:53 CET 2014        Importing Toni_20130527_AE_Exp87_sample11_01.mgf
Wed Mar 19 15:17:53 CET 2014        Toni_20130527_AE_Exp87_sample11_01.mgf imported.
Wed Mar 19 15:17:53 CET 2014        Importing PSMs from Toni_20130527_AE_Exp87_sample11_01.t.xml
Wed Mar 19 15:26:01 CET 2014        File import completed. 34968 first hits imported (284 secondary) from 36556 spectra.
Wed Mar 19 15:26:01 CET 2014        [32563 first hits passed the initial filtering]
Wed Mar 19 15:26:01 CET 2014        Computing assumptions probabilities.
Wed Mar 19 15:26:01 CET 2014        Saving assumptions probabilities.
Wed Mar 19 15:26:01 CET 2014        Selecting best peptide per spectrum.
Wed Mar 19 15:26:23 CET 2014        Computing PSM probabilities.
Wed Mar 19 15:26:23 CET 2014        Scoring PTMs in PSMs (D-score and A-score)
Wed Mar 19 15:27:28 CET 2014        Thresholding PTM localizations.
Wed Mar 19 15:27:28 CET 2014        Resolving peptide inference issues.
Wed Mar 19 15:27:46 CET 2014        Saving probabilities, building peptides and proteins.
Wed Mar 19 15:28:49 CET 2014        Simplifying protein groups.
Wed Mar 19 15:29:04 CET 2014        481 unlikely mappings found. (82% non-enzymatic accessions, 17% lower evidence accessions, 1% not characterized accessions)
Wed Mar 19 15:29:04 CET 2014        Generating peptide map.
Wed Mar 19 15:29:20 CET 2014        Computing peptide probabilities.
Wed Mar 19 15:29:20 CET 2014        Saving peptide probabilities.
Wed Mar 19 15:29:20 CET 2014        Generating protein map.
Wed Mar 19 15:29:21 CET 2014        Resolving protein inference issues, inferring peptide and protein PI status.
Wed Mar 19 15:31:24 CET 2014        Importing Data Canceled!
Wed Mar 19 15:31:24 CET 2014        An error occured while loading the identification files:
Wed Mar 19 15:31:24 CET 2014        No amino acid found for letter >.

Original comment by i.really...@googlemail.com on 19 Mar 2014 at 2:56

GoogleCodeExporter commented 9 years ago

Well, maybe the issues are not directly linked, but the main problem seems to be 
the same. One of the proteins in your FASTA file does not have a protein 
sequence, hence the next line, starting with '>...', is assumed to be the 
protein sequence, and PeptideShaker then fails trying to convert '>' into an 
amino acid.

But as stated above, this should be fixed in the upcoming versions of 
SearchGUI and PeptideShaker. In the meantime I'd recommend double-checking your 
FASTA files for protein headers without a sequence, and deleting the related 
database index files (the .fasta.cui files stored next to the FASTA files). 
This should solve the problem. Let me know if this is not the case.
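
The FASTA check can be scripted. Below is a minimal sketch of one way to list headers that are not followed by any sequence line. It is not the parser used by PeptideShaker or SearchGUI. It assumes a plain-text FASTA file in which every header starts with '>', and the function name is made up for illustration.

```python
def find_headers_without_sequence(lines):
    """Return (line_number, header) for each FASTA header with no sequence."""
    missing = []
    current = None        # (line_number, header) of the entry being read
    has_sequence = False  # True once a sequence line was seen for `current`
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no sequence
        if line.startswith(">"):
            # A new header closes the previous entry.
            if current is not None and not has_sequence:
                missing.append(current)
            current = (line_number, line)
            has_sequence = False
        else:
            has_sequence = True
    # The last entry in the file has no following header to close it.
    if current is not None and not has_sequence:
        missing.append(current)
    return missing
```

Passing an open file handle, for example `find_headers_without_sequence(open("all_concatenated_target_decoy.fasta"))`, reads the database line by line. Any entry it reports would need a sequence added or the header removed, and the .fasta.cui index next to the file would still have to be deleted afterwards.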

I will let you know when the new versions are available. Hopefully later this 
week or early next week.

Original comment by harald.b...@gmail.com on 19 Mar 2014 at 4:30
