Closed bernt-matthias closed 3 years ago
@dominik-kopczynski Can you please take a look at this one? I think we've seen something similar before?
@bernt-matthias Would it be possible for you to share the data so that we can try to reproduce the issue on our end?
Here you go: http://139.18.2.180/~maze/searchgui_input.zip. Please ping me once you've got it, then I can remove it from there.
The executed command line is
mkdir output_reports &&
cwd=`pwd` &&
export HOME=$cwd &&
ln -s '/gpfs1/data/galaxy_server/galaxy/database/files/000/338/dataset_338492.dat' searchgui_input.zip &&
jar xvf searchgui_input.zip SEARCHGUI_IdentificationParameters.par &&
peptide-shaker -Djava.awt.headless=true eu.isas.peptideshaker.cmd.PeptideShakerCLI -gui 0 -temp_folder $cwd/PeptideShakerCLI -log $cwd/resources -reference 'Galaxy_Experiment_2021032415271616596042' -identification_files $cwd/searchgui_input.zip -id_params $cwd/SEARCHGUI_IdentificationParameters.par -threads "${GALAXY_SLOTS:-12}" -output_file $cwd/output.mzid -include_sequences 0 -contact_first_name "Proteomics" -contact_last_name "Galaxy" -contact_email "galaxyp@umn.edu" -contact_address "galaxyp@umn.edu" -organization_name "University of Minnesota" -organization_email "galaxyp@umn.edu" -organization_address "Minneapolis, MN 55455, Vereinigte Staaten" -out_reports $cwd/output_reports -reports 3,9,6
This also produces the error/warning reported in https://github.com/compomics/peptide-shaker/issues/448
Thanks for sharing the data. You can now remove it. I can also confirm that I've been able to reproduce the issue. I will look into it some more and get back to you.
BTW, are you aware that your FASTA file does not contain any decoys?
> BTW, are you aware that your FASTA file does not contain any decoys?
Actually, no. I've forwarded the info to the user.
Ok, so the problem is the non-standard FASTA headers. For example, for a header such as ">A0A1Q3NBR6|unreviewed|Pyruvate" the accession number is assumed to be "unreviewed". All headers of this type therefore end up with the same accession number, which later causes the above issue: we end up referring to the wrong protein sequence and get the StringIndexOutOfBoundsException.
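To make the collision concrete, here is a minimal Python sketch. It is not the actual SearchGUI/PeptideShaker parser; `naive_accession` is a hypothetical stand-in that simply assumes the second `|`-separated field is the accession, which matches the behaviour described above. The second header in the example is made up for illustration.

```python
from collections import Counter


def naive_accession(header: str) -> str:
    """Hypothetical parser: take the second '|'-separated field as the accession."""
    fields = header.lstrip(">").split("|")
    return fields[1] if len(fields) > 1 else fields[0]


def duplicate_accessions(headers):
    """Return every parsed accession that occurs more than once, with its count."""
    counts = Counter(naive_accession(h) for h in headers)
    return {acc: n for acc, n in counts.items() if n > 1}


headers = [
    ">A0A1Q3NBR6|unreviewed|Pyruvate",                   # header from this issue
    ">B1XYZ00000|unreviewed|Hypothetical second entry",  # made-up example
]

dups = duplicate_accessions(headers)
print(dups)  # {'unreviewed': 2}
```

Running something like this over the header lines of a FASTA file before the search would flag the problem early: any non-empty result means several proteins would be stored under the same accession.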
The solution is to reformat the headers: https://github.com/compomics/searchgui/wiki/DatabaseHelp#non-standard-fasta.
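A rough Python sketch of such a reformatting step follows. The target layout `>generic|accession|description` is my assumption about what the wiki page recommends, so please check the linked page before relying on it. `to_generic_header` and `reformat_fasta` are hypothetical helper names, and the sequence line in the example is made up.

```python
def to_generic_header(header: str) -> str:
    """Rewrite '>ACC|status|description' as '>generic|ACC|description'.

    Assumption: the first field is the real accession and the last field
    is the description, as in the header quoted in this issue.
    """
    fields = header.lstrip(">").strip().split("|")
    accession = fields[0]
    description = fields[-1] if len(fields) > 1 else ""
    return f">generic|{accession}|{description}"


def reformat_fasta(lines):
    """Yield FASTA lines with header lines rewritten and sequence lines unchanged."""
    for line in lines:
        yield to_generic_header(line) if line.startswith(">") else line


records = [
    ">A0A1Q3NBR6|unreviewed|Pyruvate",  # header from this issue
    "MKTAYIAK",                         # made-up sequence line
]
print(list(reformat_fasta(records)))
# ['>generic|A0A1Q3NBR6|Pyruvate', 'MKTAYIAK']
```

The middle field ("unreviewed") is dropped here; if that information matters it could be appended to the description instead.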
Ah. Thanks for the info. Might be a good idea to check for this somehow.
Also we should improve the help section of the Galaxy wrapper :)
> Might be a good idea to check for this somehow.
Yes, I agree. It used to be checked, but the check seems to have been removed during the refactoring. I will see if I can re-add it in the next release.
The check for duplicate accession numbers in FASTA files has now been re-added in SearchGUI v4.0.25.
I have the following in my log file (PS 2.0.15 via Galaxy)