compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

Error loading ID files into PeptideShaker #298

Closed: ttessie2 closed this issue 3 years ago

ttessie2 commented 3 years ago

I'm testing different search parameters with MS-GF+ in SearchGUI; however, I get an error when trying to create a project in PeptideShaker. I did not select the option to run PeptideShaker in SearchGUI after running MS-GF+. I've chosen the correct .mzid files and database. Any help would be great. The error says: "An error occurred while loading the identification files: Index 12534 out of bounds for length 12534".

hbarsnes commented 3 years ago

This looks similar to an issue we've seen before, which was related to non-unique FASTA accession numbers. Which version of SearchGUI are you using? I would recommend updating to the latest SearchGUI version and re-indexing the FASTA file. You should then get a warning if the file contains non-unique accession numbers. If you can share the FASTA file, I can also take a look.

It would be great if you could also share the PeptideShaker error log. You can find it via the Welcome dialog > Settings & Help > Help > Bug Report.
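
SearchGUI reports non-unique accession numbers itself when it (re-)indexes the FASTA file. For anyone who wants to look for duplicates before importing, here is a minimal stand-alone Python sketch of the same idea; it is not SearchGUI code. It assumes UniProt-style headers (`>sp|P69905|HBA_HUMAN ...`, accession in the second `|` field) and otherwise falls back to the first whitespace-delimited token of the header. The helper names `extract_accession` and `find_duplicate_accessions` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def extract_accession(header: str) -> str:
    """Return the accession from a FASTA header line such as '>sp|P69905|HBA_HUMAN ...'.

    Assumption: for UniProt-style headers ('sp' or 'tr' prefix) the accession is the
    second '|'-separated field; for any other header the first token after '>' is used.
    """
    tokens = header[1:].split()
    if not tokens:
        return ""  # an empty header line has no accession
    first_token = tokens[0]
    parts = first_token.split("|")
    if len(parts) >= 3 and parts[0] in ("sp", "tr"):
        return parts[1]
    return first_token


def find_duplicate_accessions(lines: Iterable[str]) -> List[str]:
    """Return accessions that occur on more than one header line, in first-seen order."""
    counts = Counter(
        extract_accession(line.rstrip("\n"))
        for line in lines
        if line.startswith(">")  # only header lines carry accessions
    )
    # Counter preserves insertion order (Python 3.7+), so results follow file order.
    return [accession for accession, n in counts.items() if n > 1]
```

Passing an open file handle (for a hypothetical `database.fasta`) to `find_duplicate_accessions` returns every accession that appears more than once; an empty list means all headers are unique under the assumed parsing rule.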

ttessie2 commented 3 years ago

I won't be at my desktop until later this evening, so I can't send you anything at the moment. However, when I ran MS-GF+, my SearchGUI and PeptideShaker were the most up-to-date versions. This evening I will repeat everything from the beginning and take more detailed notes. But off the top of my head, which database file is the correct one to use when starting a PeptideShaker project? I only tried the FASTA file generated by SearchGUI that is concatenated with the decoys.

hbarsnes commented 3 years ago

It depends more on when the FASTA file was indexed, as this check was only added (or rather re-added) in SearchGUI 4.0.25. If your FASTA file was indexed before that, it will not be re-indexed. To get a proper test, I would recommend that you rename your FASTA file (or just make a copy) and then re-add the decoys in the latest SearchGUI version.

> But off the top of my head, which database file is the correct one to use when starting a PeptideShaker project? I only tried the FASTA file generated by SearchGUI that is concatenated with the decoys.

Yes, that is the correct approach.

ttessie2 commented 3 years ago

I copied the FASTA file into a new directory and repeated everything, without any luck. Below is the report from PeptideShaker:

Thu Apr 15 22:49:25 EDT 2021: PeptideShaker version 2.0.19. Memory given to the Java virtual machine: 17179869184. Total amount of memory in the Java virtual machine: 134217728. Free memory: 87379832. Java version: 15.0.2.
1714 script command tokens
(C) 2009 Jmol Development
Jmol Version: 12.0.43 2011-05-03 14:21
java.vendor: Oracle Corporation
java.version: 15.0.2
os.name: Windows 10
memory: 49.1/134.2
processors available: 24
useCommandThread: false

PeptideShaker processing failed. See the PeptideShaker log for details.

java.lang.ArrayIndexOutOfBoundsException: Index 12534 out of bounds for length 12534
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.recursiveMassFilling(FMIndex.java:1579)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.recursiveMassFilling(FMIndex.java:1605)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.recursiveMassFilling(FMIndex.java:1605)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.recursiveMassFilling(FMIndex.java:1605)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.recursiveMassFilling(FMIndex.java:1605)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.recursiveMassFilling(FMIndex.java:1605)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.recursiveMassFilling(FMIndex.java:1605)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.init(FMIndex.java:1218)
    at com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex.<init>(FMIndex.java:581)
    at eu.isas.peptideshaker.fileimport.FileImporter.importSequences(FileImporter.java:957)
    at eu.isas.peptideshaker.fileimport.FileImporter.importFiles(FileImporter.java:219)
    at eu.isas.peptideshaker.PeptideShaker.importFiles(PeptideShaker.java:219)
    at eu.isas.peptideshaker.gui.NewDialog$20.run(NewDialog.java:736)
    at java.base/java.lang.Thread.run(Thread.java:832)
Free memory: 248209744

Thu Apr 15 23:13:30 EDT 2021: PeptideShaker version 2.0.19. Memory given to the Java virtual machine: 17179869184. Total amount of memory in the Java virtual machine: 134217728. Free memory: 121514768. Java version: 15.0.2.

Thu Apr 15 23:18:42 EDT 2021: PeptideShaker version 2.0.19. Memory given to the Java virtual machine: 17179869184. Total amount of memory in the Java virtual machine: 134217728. Free memory: 87284176. Java version: 15.0.2.
1714 script command tokens
(C) 2009 Jmol Development
Jmol Version: 12.0.43 2011-05-03 14:21
java.vendor: Oracle Corporation
java.version: 15.0.2
os.name: Windows 10
memory: 50.5/134.2
processors available: 24
useCommandThread: false

hbarsnes commented 3 years ago

Could you try renaming the FASTA file instead?

ttessie2 commented 3 years ago

I tried renaming the FASTA file, as well as downloading a new FASTA file of the human proteome, but I'm still getting the same error. I then tried opening the mzid file in IDPicker and didn't have any issues. I'm not exactly sure what to do next.

hbarsnes commented 3 years ago

Would it be possible for you to share the data with me so that I can try to reproduce it on my side?

ttessie2 commented 3 years ago

I can upload those when I get back to my desktop at home. In the meantime, I have installed SearchGUI on a lab computer to see if I get the same error. Interestingly, I'm getting a different issue that has to do with importing the protein database. It happens when I'm entering the input information into SearchGUI: when I load the FASTA file, I get the message "the database does not seem to contain decoy sequences. Add decoys?", and when I say yes, I get a "FASTA import error" telling me that the FASTA file cannot be found.

ttessie2 commented 3 years ago

Never mind, please ignore that last message. It was due to admin privileges on this computer; I moved the files elsewhere and there is no longer an issue loading the database file.

ttessie2 commented 3 years ago

Update: I ran SearchGUI/MS-GF+ here (on a different desktop), tried opening the results in PeptideShaker, and had the same problem. Here is a link to the database file and raw spectra: https://drive.google.com/drive/folders/1m9WhsM-hbcb4N26poRwLj5cVX3DgAblg?usp=sharing

hbarsnes commented 3 years ago

Thanks for sharing the data! I will process the data and see if I can reproduce the issue.

By the way, I see from your search settings that you have set the fragment mass tolerance to 2.5 Da. This seems very high; are you sure that is correct?
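
A tolerance typo like the one spotted here can be caught with a simple pre-flight check before launching a search. The Python sketch below is hypothetical and not part of SearchGUI, MS-GF+, or PeptideShaker; the 1.0 Da and 50 ppm cutoffs are illustrative assumptions, chosen only so that 2.5 Da is flagged while 0.5 Da passes. `matching_window` shows why the value matters: with a tolerance of t Da, a theoretical fragment at m/z 500 matches any observed peak between 500 - t and 500 + t, so 2.5 Da opens a 5 Da window compared with 1 Da at 0.5 Da.

```python
from typing import List, Tuple

# Illustrative cutoffs only (assumptions), not SearchGUI or MS-GF+ settings.
MAX_PLAUSIBLE_FRAGMENT_DA = 1.0
MAX_PLAUSIBLE_FRAGMENT_PPM = 50.0


def matching_window(mz: float, tolerance_da: float) -> Tuple[float, float]:
    """Return the m/z range in which an observed peak would match a theoretical fragment."""
    return (mz - tolerance_da, mz + tolerance_da)


def check_fragment_tolerance(value: float, unit: str) -> List[str]:
    """Return warnings for a fragment tolerance that looks mistyped (empty list if it looks fine)."""
    if unit not in ("Da", "ppm"):
        raise ValueError(f"unsupported unit: {unit!r}")
    if value <= 0:
        return [f"fragment tolerance must be positive, got {value} {unit}"]
    limit = MAX_PLAUSIBLE_FRAGMENT_DA if unit == "Da" else MAX_PLAUSIBLE_FRAGMENT_PPM
    if value > limit:
        return [f"fragment tolerance {value} {unit} exceeds {limit} {unit}; possible typo?"]
    return []
```

Running `check_fragment_tolerance(2.5, "Da")` yields one warning, while `check_fragment_tolerance(0.5, "Da")` yields none; the right cutoff in practice depends on the instrument's fragment-ion resolution.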

ttessie2 commented 3 years ago

That should be 0.5, not 2.5. Thanks for the catch!

hbarsnes commented 3 years ago

Can you try again with 0.5 and see if that solves the problem?

ttessie2 commented 3 years ago

Oh wow, yeah, that solved it! What a stupid mistake. Thanks for pointing that out!