Run fails even with Demo data

FabianRup commented 8 months ago

Dear Glyco-Decipher team,

unfortunately, a run fails with the following error messages even when using the Demo data:

2024-03-22 14:14:01 [Peptide Identification] Running MS-GF+ to Get PSM from De-Glyco Peak Spectrum(Several Minutes Required)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!FAILED!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
java.lang.IllegalStateException: Index does not contain mzIdentML root!
    at uk.ac.ebi.jmzidml.xml.xxindex.MzIdentMLIndexerFactory$MzIdentMLIndexerImpl.<init>(MzIdentMLIndexerFactory.java:106)
    at uk.ac.ebi.jmzidml.xml.xxindex.MzIdentMLIndexerFactory$MzIdentMLIndexerImpl.<init>(MzIdentMLIndexerFactory.java:66)
    at uk.ac.ebi.jmzidml.xml.xxindex.MzIdentMLIndexerFactory.buildIndex(MzIdentMLIndexerFactory.java:63)
    at uk.ac.ebi.jmzidml.xml.xxindex.MzIdentMLIndexerFactory.buildIndex(MzIdentMLIndexerFactory.java:51)
    at uk.ac.ebi.jmzidml.xml.io.MzIdentMLUnmarshaller.<init>(MzIdentMLUnmarshaller.java:68)
    at edu.ucsd.msjava.mzid.MzIDParser.<init>(MzIDParser.java:32)
    at edu.ucsd.msjava.ui.MzIDToTsv.convert(MzIDToTsv.java:125)
    at edu.ucsd.msjava.ui.MzIDToTsv.main(MzIDToTsv.java:84)
    at cn.ac.dicp.group1809.utilities.msgfPlus_adapter.model.MzIDToTsv.run(MzIDToTsv.java:72)
    at cn.ac.dicp.group1809.research.glyco_decipher.gui.task.GlycoPeptideIdentificationTask.msgfPlusIdentification(GlycoPeptideIdentificationTask.java:370)
    at cn.ac.dicp.group1809.research.glyco_decipher.gui.task.GlycoPeptideIdentificationTask.call(GlycoPeptideIdentificationTask.java:128)
    at cn.ac.dicp.group1809.research.glyco_decipher.gui.task.GlycoPeptideIdentificationTask.call(GlycoPeptideIdentificationTask.java:54)
    at javafx.concurrent.Task$TaskCallable.call(Task.java:1426)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.lang.Thread.run(Thread.java:834)

DICP-1809 commented 8 months ago

We carefully re-checked our software with the Demo data, and this step completed without errors.

The error was raised during conversion of the .mzid result file. A similar issue was reported in 2017 for the jmzml package (https://github.com/PRIDE-Utilities/jmzml/issues/20), which is used in our software, but we have never encountered it ourselves. Please:

check that you have downloaded the correct demo data file and that it is not open in any editor while it is being searched;
delete the "temp_demo" directory and try searching the file again.
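
The message "Index does not contain mzIdentML root!" suggests that the parser could not find the root element in the .mzid file, which may happen when the file is empty or truncated (this cause is an assumption, not something confirmed in this thread). A minimal Python sketch for checking this before re-running is shown here; the function name and the path passed on the command line are hypothetical and not part of Glyco-Decipher.

```python
import sys


def has_mzidentml_root(path, chunk_size=1 << 20):
    """Return True if the file contains an opening <MzIdentML tag.

    The file is scanned in binary chunks so that large result files
    do not have to be loaded into memory at once.
    """
    marker = b"<MzIdentML"
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False  # reached end of file without seeing the root tag
            window = tail + chunk
            if marker in window:
                return True
            # keep the last few bytes so a tag split across chunks is still found
            tail = window[-(len(marker) - 1):]


if __name__ == "__main__":
    # Usage (hypothetical path): python check_mzid.py path/to/result.mzid
    for mzid_path in sys.argv[1:]:
        status = "root found" if has_mzidentml_root(mzid_path) else "NO mzIdentML root"
        print(f"{mzid_path}: {status}")
```

If the check reports no root element, deleting the temporary directory and searching again, as suggested above, forces the file to be regenerated.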

FabianRup commented 8 months ago

Thank you, the DemoData now ran successfully. With my first real data set, I get the following error:

2024-03-22 15:01:02 [Peptide Identification] Running MS-GF+ to Get PSM from De-Glyco Peak Spectrum(Several Minutes Required)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!FAILED!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    at cn.ac.dicp.group1809.utilities.uniprot_reader.fasta.FastaParser.read(FastaParser.java:58)
    at cn.ac.dicp.group1809.research.glyco_decipher.utils.MotifContainingProtein.write(MotifContainingProtein.java:32)
    at cn.ac.dicp.group1809.research.glyco_decipher.gui.task.GlycoPeptideIdentificationTask.msgfPlusPreProcess(GlycoPeptideIdentificationTask.java:317)
    at cn.ac.dicp.group1809.research.glyco_decipher.gui.task.GlycoPeptideIdentificationTask.msgfPlusIdentification(GlycoPeptideIdentificationTask.java:358)
    at cn.ac.dicp.group1809.research.glyco_decipher.gui.task.GlycoPeptideIdentificationTask.call(GlycoPeptideIdentificationTask.java:128)
    at cn.ac.dicp.group1809.research.glyco_decipher.gui.task.GlycoPeptideIdentificationTask.call(GlycoPeptideIdentificationTask.java:54)
    at javafx.concurrent.Task$TaskCallable.call(Task.java:1426)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.lang.Thread.run(Thread.java:834)

DICP-1809 commented 8 months ago

The error was raised while reading your FASTA protein database and was caused by an invalid protein header format. Currently our software recognizes only the UniProt protein header format (see https://www.uniprot.org/help/fasta-headers).

This header contains three main parts separated by "|": a|b|c, where a is the database, b is the protein accession, and c is the protein annotation. When reading your FASTA file, the software failed to get the protein accession from the second part. We provide a FASTA file in the demo, which is a protein database for mouse. You can:

use it as a reference to change your own FASTA protein header format (just place the accession in the middle of the "|"-separated values of the protein header);
or download a new standard FASTA file from UniProt: https://www.uniprot.org/
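
For databases that are not available from UniProt, the first option can be scripted. The following Python sketch rewrites each header into the a|b|c layout described above. It rests on several assumptions: the accession in the original header is taken to be the first whitespace-delimited token, the database field is filled with the placeholder "tr", and the function names are hypothetical. Whether Glyco-Decipher checks the content of the database field is not stated in this thread.

```python
def to_uniprot_style(header, database="tr"):
    """Rewrite one FASTA header line as '>database|accession|annotation'.

    Assumption: the accession is the first whitespace-delimited token
    after '>' and the rest of the line is the annotation.
    """
    body = header[1:].strip()
    fields = body.split("|")
    if len(fields) >= 3 and fields[1]:
        # Already has a non-empty second field; leave the header unchanged.
        return ">" + body
    accession, _, annotation = body.partition(" ")
    # A stray '|' inside the accession would shift the fields, so replace it.
    accession = accession.replace("|", "_")
    annotation = annotation.strip() or accession
    return f">{database}|{accession}|{annotation}"


def rewrite_fasta(src_path, dst_path, database="tr"):
    """Copy a FASTA file, rewriting header lines and keeping sequences as-is."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(to_uniprot_style(line.rstrip("\n"), database) + "\n")
            else:
                dst.write(line)
```

A header such as ">NP_001 hypothetical protein" contains no "|", so splitting it yields a single field and there is no second field to read, which is consistent with the "Index 1 out of bounds for length 1" message. After rewriting, the accession sits in the second field.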

FabianRup commented 8 months ago

Thank you so much for the quick answer, this was indeed the problem!

It would be nice if future releases were less strict about the FASTA header format. This is particularly relevant for people working on non-model species that don't have a proteome available in UniProt.

DICP-1809 / Glyco-Decipher
