Closed ohickl closed 4 years ago
Hi Oskar,
There was an undetected bug in the code trying to locate the FASTA file. Should be fixed. I'll let you know as soon as we manage to release an updated beta version.
Best regards. Harald
Ok, thanks!
Hi again,
You can find an updated beta release here: https://www.dropbox.com/s/us9y9sif3iakrxs/PeptideShaker-2.0.0-beta.zip?dl=0.
Please let me know if you still experience issues with the new version.
Best regards, Harald
It runs further now, but I get this: 20200221_ps_stdout.log. Log: 20200221_PeptideShaker_part.log. It's still running, but I suspect it's stuck and not progressing anymore.
The 2D_10pH_G1_2_pH2_5.comet.pep.xml file seems to have formatting issues:
could not resolve entity named '_EAL_domains" num_tot_proteins="1" ...
Looks like an issue with one or more of your protein accession numbers or names/descriptions. Please make sure that accession numbers, protein names, etc. do not contain quotation marks, which '_EAL_domains seems to do, as this can often mess up the XML formatting. Or at least this seems to be the case for the Comet pep.xml files.
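A quick way to check a database for such characters is to scan the FASTA headers before searching. The following is a minimal Python sketch, not part of PeptideShaker or Comet: the function name find_xml_unsafe_headers and the chosen character set (quotation marks, &, < and >) are assumptions made for illustration.

```python
# Hypothetical helper (not a PeptideShaker/Comet API): report FASTA headers
# that contain characters which usually need escaping in XML.
XML_SENSITIVE = set("\"'&<>")


def find_xml_unsafe_headers(lines):
    """Yield (line_number, header, offending_chars) for suspicious headers.

    `lines` is any iterable of text lines, e.g. an open FASTA file handle.
    """
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Drop the leading '>' marker so it is not reported itself.
        header = line[1:].rstrip("\r\n")
        found = sorted(XML_SENSITIVE.intersection(header))
        if found:
            yield number, header, found
```

It could be called on an open file, for example `list(find_xml_unsafe_headers(open(path)))`, where `path` points to the FASTA file used for the search.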
I couldn't find any quotation marks in that database but I will use a different one to try and eliminate faulty headers as a source of errors. I used a different data set just to see if I could get it to work. It runs further now but crashes while doing protein inference. Also the same data set loads without any problems on the same PeptideShaker version with the same parameters using the GUI on Mac (unfortunately I get an error while loading the saved psdb file --> 20200224_PeptideShaker_gui_mac_part.log). CLI command:
java -Xmx112G -cp /mnt/lscratch/users/ohickl/proteomics/engines/PeptideShaker-2.0.0-beta/PeptideShaker-2.0.0-beta.jar eu.isas.peptideshaker.cmd.PeptideShakerCLI \
-reference "must_m5-1_v1" \
-identification_files "/mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/m5_1-v1_sg_out_test_2020-02-24_18.00.15.zip" \
-out "/mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/must_m5-1_v1_test.cpsx" \
# -zip "/mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/must_m5-1_v1_test.cpsx.zip" \
-threads 28 >> /mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/sgui_stdout.txt 2>&1
Mon Feb 24 18:05:39 CET 2020: PeptideShaker version 2.0.0-beta.
Memory given to the Java virtual machine: 106897080320.
Total amount of memory in the Java virtual machine: 2024275968.
Free memory: 1971427616.
Java version: 1.8.0_162.
java.lang.StringIndexOutOfBoundsException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at eu.isas.peptideshaker.protein_inference.ProteinInference.inferPiStatus(ProteinInference.java:114)
at eu.isas.peptideshaker.PeptideShaker.createProject(PeptideShaker.java:503)
at eu.isas.peptideshaker.cmd.PeptideShakerCLI.createProject(PeptideShakerCLI.java:812)
at eu.isas.peptideshaker.cmd.PeptideShakerCLI.call(PeptideShakerCLI.java:179)
at eu.isas.peptideshaker.cmd.PeptideShakerCLI.main(PeptideShakerCLI.java:1014)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 812
at java.lang.String.charAt(String.java:658)
at com.compomics.util.experiment.identification.utils.PeptideUtils.getNEnzymaticTermini(PeptideUtils.java:464)
at com.compomics.util.experiment.identification.utils.PeptideUtils.lambda$null$16(PeptideUtils.java:523)
at java.util.stream.MatchOps$2MatchSink.accept(MatchOps.java:119)
at java.util.Spliterators$IntArraySpliterator.tryAdvance(Spliterators.java:1041)
at java.util.stream.IntPipeline.forEachWithCancel(IntPipeline.java:162)
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.IntPipeline.anyMatch(IntPipeline.java:477)
at com.compomics.util.experiment.identification.utils.PeptideUtils.lambda$isEnzymatic$17(PeptideUtils.java:522)
at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1359)
at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:449)
at com.compomics.util.experiment.identification.utils.PeptideUtils.isEnzymatic(PeptideUtils.java:520)
at eu.isas.peptideshaker.protein_inference.ProteinInference.lambda$compareMainProtein$4(ProteinInference.java:394)
at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
at java.util.stream.LongPipeline$3$1.accept(LongPipeline.java:231)
at java.util.Spliterators$LongArraySpliterator.tryAdvance(Spliterators.java:1124)
at java.util.stream.LongPipeline.forEachWithCancel(LongPipeline.java:160)
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:449)
at eu.isas.peptideshaker.protein_inference.ProteinInference.compareMainProtein(ProteinInference.java:393)
at eu.isas.peptideshaker.protein_inference.ProteinInference.inferPiStatus(ProteinInference.java:175)
at eu.isas.peptideshaker.protein_inference.ProteinInference.lambda$inferPiStatus$1(ProteinInference.java:115)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
I couldn't find any quotation marks in that database.
Not even if you search for something including _EAL_domains?
Also the same data set loads without any problems on the same PeptideShaker version with the same parameters using the GUI on Mac (unfortunately I get an error while loading the saved psdb file).
I'm afraid this is a known issue we are already working on: https://github.com/compomics/peptide-shaker-2.0-issue-tracker/issues/47.
I used a different data set just to see if I could get it to work. It runs further now but crashes while doing protein inference.
That is strange. Which enzyme are you using? And would it be possible for you to share this dataset with me so that I can try to reproduce the issue?
Not even if you search for something including _EAL_domains?
No, they all have the format ..._(GGDEF_&_EALdomains)\...
That is strange. Which enzyme are you using?
Trypsin.
And would it be possible for you to share this dataset with me so that I can try to reproduce the issue?
Sure, where should I send the link to?
No, they all have the format ...(GGDEF&_EALdomains)...
Perhaps the & is a problem?
Sure, where should I send the link to?
You can send it to me at harald.barsnes@gmail.com.
Thanks for sharing the files. However, it's probably easier if you share the zip file from SearchGUI as well? That way I will not have to redo the search and will be guaranteed to be loading the exact same data.
Ah, yes. I included it in the shared folder.
Ah, yes. I included it in the shared folder.
Thanks. I can confirm that I'm able to reproduce the issue. Seems to be due to problems parsing the custom headers correctly. I would recommend using our non-standard header format when having custom headers: https://github.com/compomics/searchgui/wiki/DatabaseHelp#non-standard-fasta.
I did use the generic format for part of the sequences; the others had UniProt headers.
To be sure, I converted all headers to the split generic format, e.g.:
>generic|AMYS_HUMAN|contaminant
>generic|AMYS_HUMAN_REVERSED|contaminant_REVERSED
>generic|10000127|G_contig_gene82428_1
>generic|10000127_REVERSED|G_contig_gene82428_1_REVERSED
>generic|A0A0D9ZLB3|A0A0D9ZLB3_9ORYZ_Uncharacterized_protein_OS_Oryza_glumipatula_OX_40148_PE_4_SV_1
>generic|A0A0D9ZLB3_REVERSED|A0A0D9ZLB3_9ORYZ_Uncharacterized_protein_OS_Oryza_glumipatula_OX_40148_PE_4_SV_1_REVERSED
and to the simple generic format, e.g.:
>generic|AMYS_HUMAN
>generic|AMYS_HUMAN_REVERSED
>generic|10000127
>generic|10000127_REVERSED
>generic|A0A0D9ZLB3
>generic|A0A0D9ZLB3_REVERSED
Use of both resulted in the same error as before.
And shouldn't it also fail when using the GUI if the headers were faulty? It ran fine, though.
Also, when using the simple generic header db with FastaCLI, it appends _REVERSED twice.
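For reference, rewriting headers into the split generic format shown above can be scripted. The sketch below is a hypothetical helper, not a SearchGUI or FastaCLI function. It assumes that every run of characters in the description other than letters, digits and underscores should become a single underscore, which matches the example headers above but may not match how they were actually produced.

```python
import re


def to_generic_header(accession, description=""):
    """Build a '>generic|accession|description' header (hypothetical helper).

    The accession is used as given; the caller must make sure it is unique
    and does not contain '|'.
    """
    # Assumption: collapse anything that is not a letter, digit or '_' into '_'.
    clean = re.sub(r"[^A-Za-z0-9_]+", "_", description).strip("_")
    if clean:
        return f">generic|{accession}|{clean}"
    # No usable description: fall back to the simple generic format.
    return f">generic|{accession}"
```

Called as `to_generic_header("AMYS_HUMAN", "contaminant")`, this would give the first split-format header listed above; with an empty description it gives the simple format.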
Use of both resulted in the same error as before.
Can you share the new search results as well?
I added them to the shared folder and sent you the link.
I found the problem. It was actually the database. There were some sequences that had an identical id. Apologies for the mess. It all works fine now. The only thing that is still happening is printing ridiculously high progress numbers when unzipping the SearchGUI zipped input. Also ReportCLI still produces the error from issue #47, I think.
I found the problem. It was actually the database. There were some sequences that had an identical id. Apologies for the mess.
No worries. Good to hear that you were able to figure it out in the end.
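Duplicate identifiers like these can be detected before running the search. The following Python sketch is an illustration only: the function name find_duplicate_accessions is hypothetical, and it assumes that the accession is the second '|'-separated field of the header (as in >generic|ACCESSION|description), falling back to the first whitespace-delimited token otherwise.

```python
from collections import defaultdict


def find_duplicate_accessions(lines):
    """Return {accession: [header line numbers]} for repeated accessions.

    `lines` is any iterable of text lines, e.g. an open FASTA file handle.
    """
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        fields = header.split("|")
        if len(fields) >= 2:
            # Assumed layout: >database|accession|description
            accession = fields[1]
        else:
            # Fallback: first whitespace-delimited token of the header
            parts = header.split()
            accession = parts[0] if parts else ""
        seen[accession].append(number)
    # Keep only accessions that occur on more than one header line.
    return {acc: nums for acc, nums in seen.items() if len(nums) > 1}
```

An empty result means no accession was seen twice under the assumed header layout; it does not rule out other header problems.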
The only thing that is still happening is printing ridiculously high progress numbers when unzipping the SearchGUI zipped input.
Yes, we know about this one, but perhaps you can create a new issue just so that we do not forget about it? :)
Also ReportCLI still produces the error from issue #47, I think.
Yes, this one is still unsolved. Seems like we have to yet again change the database to get it to work properly...
I will then close this specific issue, but do not hesitate to open a new one if you come across other issues.
Hi again,
when trying to load my SearchGUI results using:
I get the following error:
The same happens after unzipping the SearchGUI output, if I only specify the archive. Seems like it's something with the database? I changed the headers to the generic format and added the decoys with the FastaCLI module.
Best
Oskar