compomics / peptide-shaker-2.0-issue-tracker

Issue tracker for the beta release of PeptideShaker 2.0
Apache License 2.0

Null pointer when trying to run PeptideShakerCLI #51

Closed: ohickl closed this issue 4 years ago

ohickl commented 4 years ago

Hi again,

when trying to load my SearchGUI results using:

java -Xmx112G -cp /mnt/lscratch/users/ohickl/proteomics/engines/PeptideShaker-2.0.0-beta/PeptideShaker-2.0.0-beta.jar eu.isas.peptideshaker.cmd.PeptideShakerCLI \
    -reference mock_test_01 \
    -fasta_file /mnt/lscratch/users/ohickl/proteomics/engine_test/searchgui/searchgui_out_2020-02-14_06.31.29/data/Mock_Comm_RefDB_V3_sg_fix_rev.fasta \
    -identification_files /mnt/lscratch/users/ohickl/proteomics/engine_test/searchgui/searchgui_out_2020-02-14_06.31.29 \
    -spectrum_files /mnt/lscratch/users/ohickl/proteomics/engine_test/searchgui/searchgui_out_2020-02-14_06.31.29/data \
    -id_params /mnt/lscratch/users/ohickl/proteomics/engine_test/searchgui/searchgui_out_2020-02-14_06.31.29/mock_test.par \
    -out /mnt/lscratch/users/ohickl/proteomics/engine_test/searchgui/mock_test_01.cpsx \
    -threads 28

I get the following error:

Fri Feb 21 15:18:06 CET 2020: PeptideShaker version 2.0.0-beta.
Memory given to the Java virtual machine: 106897080320.
Total amount of memory in the Java virtual machine: 2024275968.
Free memory: 1981997288.
Java version: 1.8.0_162.
java.lang.NullPointerException
        at eu.isas.peptideshaker.utils.CpsParent.loadFastaFile(CpsParent.java:409)
        at eu.isas.peptideshaker.cmd.PeptideShakerCLI.createProject(PeptideShakerCLI.java:746)
        at eu.isas.peptideshaker.cmd.PeptideShakerCLI.call(PeptideShakerCLI.java:179)
        at eu.isas.peptideshaker.cmd.PeptideShakerCLI.main(PeptideShakerCLI.java:1003)

The same happens if I only specify the archive instead of the unzipped SearchGUI output. Seems like it's something with the database? I changed the headers to the generic format and added the decoys with the FastaCLI module.
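
For reference, a FastaCLI decoy-generation run looks roughly like this (a sketch only: the jar path and database name are placeholders, and the -in/-decoy options are taken from the SearchGUI FastaCLI documentation as I recall it):

java -cp /path/to/SearchGUI-X.Y.Z.jar eu.isas.searchgui.cmd.FastaCLI \
    -in target_database.fasta \
    -decoy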

Best

Oskar

hbarsnes commented 4 years ago

Hi Oskar,

There was an undetected bug in the code that tries to locate the FASTA file. It should be fixed now. I'll let you know as soon as we manage to release an updated beta version.

Best regards, Harald

ohickl commented 4 years ago

Ok, thanks!

hbarsnes commented 4 years ago

Hi again,

You can find an updated beta release here: https://www.dropbox.com/s/us9y9sif3iakrxs/PeptideShaker-2.0.0-beta.zip?dl=0.

Please let me know if you still experience issues with the new version.

Best regards, Harald

ohickl commented 4 years ago

It runs further now, but I get this: 20200221_ps_stdout.log. Log: 20200221_PeptideShaker_part.log. It's still running, but I suspect it's stuck and not progressing anymore.

hbarsnes commented 4 years ago

The 2D_10pH_G1_2_pH2_5.comet.pep.xml file seems to have formatting issues:

could not resolve entity named '_EAL_domains" num_tot_proteins="1" ...

Looks like an issue with one or more of your protein accession numbers or names/descriptions. Please make sure that accession numbers, protein names, etc., do not contain quotation marks, which '_EAL_domains seems to do, as this can often mess up the XML formatting. At least this seems to be the case for the Comet pep.xml files.
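
A quick way to check is to list only the FASTA header lines that contain a quotation mark, e.g. (using the database file from your first command; adjust the path as needed):

grep '^>' Mock_Comm_RefDB_V3_sg_fix_rev.fasta | grep -n '"'

If this prints nothing, the headers are free of quotation marks and the problem is likely elsewhere.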

ohickl commented 4 years ago

I couldn't find any quotation marks in that database, but I will use a different one to try to eliminate faulty headers as a source of errors. I used a different data set just to see if I could get it to work. It runs further now but crashes during protein inference. Also, the same data set loads without any problems on the same PeptideShaker version with the same parameters using the GUI on Mac (unfortunately, I get an error while loading the saved psdb file --> 20200224_PeptideShaker_gui_mac_part.log). CLI command:

java -Xmx112G -cp /mnt/lscratch/users/ohickl/proteomics/engines/PeptideShaker-2.0.0-beta/PeptideShaker-2.0.0-beta.jar eu.isas.peptideshaker.cmd.PeptideShakerCLI \
    -reference "must_m5-1_v1" \
    -identification_files "/mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/m5_1-v1_sg_out_test_2020-02-24_18.00.15.zip" \
    -out "/mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/must_m5-1_v1_test.cpsx" \
#    -zip "/mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/must_m5-1_v1_test.cpsx.zip" \
    -threads 28 >> /mnt/lscratch/users/ohickl/proteomics/MuSt/m5-1_v1/spec_db_out/searchgui/sgui_stdout.txt 2>&1

20200224_ps_stdout.log Log:

Mon Feb 24 18:05:39 CET 2020: PeptideShaker version 2.0.0-beta.
Memory given to the Java virtual machine: 106897080320.
Total amount of memory in the Java virtual machine: 2024275968.
Free memory: 1971427616.
Java version: 1.8.0_162.
java.lang.StringIndexOutOfBoundsException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at eu.isas.peptideshaker.protein_inference.ProteinInference.inferPiStatus(ProteinInference.java:114)
    at eu.isas.peptideshaker.PeptideShaker.createProject(PeptideShaker.java:503)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.createProject(PeptideShakerCLI.java:812)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.call(PeptideShakerCLI.java:179)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.main(PeptideShakerCLI.java:1014)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 812
    at java.lang.String.charAt(String.java:658)
    at com.compomics.util.experiment.identification.utils.PeptideUtils.getNEnzymaticTermini(PeptideUtils.java:464)
    at com.compomics.util.experiment.identification.utils.PeptideUtils.lambda$null$16(PeptideUtils.java:523)
    at java.util.stream.MatchOps$2MatchSink.accept(MatchOps.java:119)
    at java.util.Spliterators$IntArraySpliterator.tryAdvance(Spliterators.java:1041)
    at java.util.stream.IntPipeline.forEachWithCancel(IntPipeline.java:162)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.IntPipeline.anyMatch(IntPipeline.java:477)
    at com.compomics.util.experiment.identification.utils.PeptideUtils.lambda$isEnzymatic$17(PeptideUtils.java:522)
    at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
    at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1359)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:449)
    at com.compomics.util.experiment.identification.utils.PeptideUtils.isEnzymatic(PeptideUtils.java:520)
    at eu.isas.peptideshaker.protein_inference.ProteinInference.lambda$compareMainProtein$4(ProteinInference.java:394)
    at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
    at java.util.stream.LongPipeline$3$1.accept(LongPipeline.java:231)
    at java.util.Spliterators$LongArraySpliterator.tryAdvance(Spliterators.java:1124)
    at java.util.stream.LongPipeline.forEachWithCancel(LongPipeline.java:160)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:449)
    at eu.isas.peptideshaker.protein_inference.ProteinInference.compareMainProtein(ProteinInference.java:393)
    at eu.isas.peptideshaker.protein_inference.ProteinInference.inferPiStatus(ProteinInference.java:175)
    at eu.isas.peptideshaker.protein_inference.ProteinInference.lambda$inferPiStatus$1(ProteinInference.java:115)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

hbarsnes commented 4 years ago

I couldn't find any quotation marks in that database.

Not even if you search for something including _EAL_domains?

Also the same data set loads without any problems on the same PeptideShaker version with the same parameters using the GUI on Mac (unfortunately I get an error while loading the saved psdb file).

I'm afraid this is a known issue we are already working on: https://github.com/compomics/peptide-shaker-2.0-issue-tracker/issues/47.

I used a different data set just to see if I could get it to work. It runs further now but crashes while doing protein inference.

That is strange. Which enzyme are you using? And would it be possible for you to share this dataset with me so that I can try to reproduce the issue?

ohickl commented 4 years ago

Not even if you search for something including _EAL_domains?

No, they all have the format ..._(GGDEF_&_EALdomains)...

That is strange. Which enzyme are you using?

Trypsin.

And would it be possible for you to share this dataset with me so that I can try to reproduce the issue?

Sure, where should I send the link to?

hbarsnes commented 4 years ago

No, they all have the format ...(GGDEF&_EALdomains)...

Perhaps the & is a problem?

Sure, where should I send the link to?

You can send it to me at harald.barsnes@gmail.com.

hbarsnes commented 4 years ago

Thanks for sharing the files. However, could you perhaps share the zip file from SearchGUI as well? That way I will not have to redo the search and am guaranteed to load the exact same data.

ohickl commented 4 years ago

Ah, yes. I included it in the shared folder.

hbarsnes commented 4 years ago

Ah, yes. I included it in the shared folder.

Thanks. I can confirm that I'm able to reproduce the issue. It seems to be due to problems parsing the custom headers correctly. I would recommend using the generic header format described on our non-standard FASTA help page when you have custom headers: https://github.com/compomics/searchgui/wiki/DatabaseHelp#non-standard-fasta.
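
If it helps, here is a rough sketch (plain awk, not FastaCLI; the file names are placeholders) for turning simple ">ID description" headers into that generic format, replacing spaces in the description with underscores:

awk '/^>/ { id = substr($1, 2); desc = (NF > 1) ? substr($0, length($1) + 2) : ""; gsub(/ /, "_", desc); print ">generic|" id "|" desc; next } { print }' custom_headers.fasta > generic_headers.fasta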

ohickl commented 4 years ago

I did use the generic format for part of the sequences; the others had UniProt headers.

To be sure, I converted all headers to the split generic format, e.g.:

>generic|AMYS_HUMAN|contaminant
>generic|AMYS_HUMAN_REVERSED|contaminant_REVERSED
>generic|10000127|G_contig_gene82428_1
>generic|10000127_REVERSED|G_contig_gene82428_1_REVERSED
>generic|A0A0D9ZLB3|A0A0D9ZLB3_9ORYZ_Uncharacterized_protein_OS_Oryza_glumipatula_OX_40148_PE_4_SV_1
>generic|A0A0D9ZLB3_REVERSED|A0A0D9ZLB3_9ORYZ_Uncharacterized_protein_OS_Oryza_glumipatula_OX_40148_PE_4_SV_1_REVERSED

and the simple generic format e.g.:

>generic|AMYS_HUMAN
>generic|AMYS_HUMAN_REVERSED
>generic|10000127
>generic|10000127_REVERSED
>generic|A0A0D9ZLB3
>generic|A0A0D9ZLB3_REVERSED

Use of both resulted in the same error as before. And shouldn't it also fail when using the GUI if the headers were faulty? It ran fine, though. Also, when using the simple generic header database with FastaCLI, it appends _REVERSED twice.
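
For reference, the doubled suffix can be confirmed by counting decoy accessions that carry it twice (the file name is a placeholder for the decoy database written by FastaCLI):

grep -c '_REVERSED_REVERSED' decoy_database.fasta

Any count above zero means the decoy suffix was appended twice.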

hbarsnes commented 4 years ago

Use of both resulted in the same error as before.

Can you share the new search results as well?

ohickl commented 4 years ago

I added them to the shared folder and sent you the link.

ohickl commented 4 years ago

I found the problem. It was actually the database. There were some sequences that had an identical ID. Apologies for the mess. It all works fine now. The only thing that is still happening is the printing of ridiculously high progress numbers when unzipping the SearchGUI zipped input. Also, ReportCLI still produces the error from issue #47, I think.
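
In case it is useful to others, duplicated accessions can be spotted with something like this (assuming split generic headers as above, where the accession is the second |-separated field; the file name is a placeholder):

grep '^>' generic_headers.fasta | cut -d '|' -f 2 | sort | uniq -d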

hbarsnes commented 4 years ago

I found the problem. It was actually the database. There were some sequences that had an identical ID. Apologies for the mess.

No worries. Good to hear that you were able to figure it out in the end.

The only thing that is still happening is printing ridiculously high progress numbers when unzipping the SearchGUI zipped input.

Yes, we know about this one, but perhaps you can create a new issue just so that we do not forget about it? :)

Also, ReportCLI still produces the error from issue #47, I think.

Yes, this one is still unsolved. It seems like we have to change the database yet again to get it to work properly...

I will then close this specific issue, but do not hesitate to open a new one if you come across other issues.