compomics / peptide-shaker-2.0-issue-tracker

Issue tracker for the beta release of PeptideShaker 2.0
Apache License 2.0

Problem with dataset and Protein reports (normal and non-validated matches) #42

Closed CarlosHorro closed 4 years ago

CarlosHorro commented 5 years ago

Hi,

I'm getting an exception when I load a SearchGUI file created with Yehia's workflow in Galaxy into the new_backend PeptideShaker app and try to generate a Default Protein report, with or without non-validated matches:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1960)
    at com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator.lambda$getSitesSummary$42(IdentificationFeaturesGenerator.java:1792)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2969)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator.getSitesSummary(IdentificationFeaturesGenerator.java:1797)
    at com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator.lambda$getAmbiguousModificationSites$40(IdentificationFeaturesGenerator.java:1775)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.TreeMap$KeySpliterator.forEachRemaining(TreeMap.java:2746)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator.getAmbiguousModificationSites(IdentificationFeaturesGenerator.java:1776)
    at eu.isas.peptideshaker.export.sections.PsProteinSection.getFeature(PsProteinSection.java:394)
    at eu.isas.peptideshaker.export.sections.PsProteinSection.writeSection(PsProteinSection.java:209)
    at eu.isas.peptideshaker.export.PSExportFactory.writeExport(PSExportFactory.java:387)
    at eu.isas.peptideshaker.gui.export.FeaturesPreferencesDialog$13.run(FeaturesPreferencesDialog.java:553)

I also tried to reuse the old implementation of getSitesSummary instead of the new one:

    StringBuilder result = new StringBuilder();
    ArrayList<Integer> representativeSites = new ArrayList<Integer>(sites.keySet());
    Collections.sort(representativeSites);
    boolean firstRepresentativeSite = true;
    for (int representativeSite : representativeSites) {
        if (firstRepresentativeSite) {
            firstRepresentativeSite = false;
        } else {
            result.append(", ");
        }
        char aa = sequence.charAt(representativeSite - 1);
        result.append(aa).append(representativeSite).append("-{");
        ArrayList<Integer> secondarySites = new ArrayList(sites.get(representativeSite));
        Collections.sort(secondarySites);
        boolean firstSecondarySite = true;
        for (Integer secondarySite : secondarySites) {
            if (firstSecondarySite) {
                firstSecondarySite = false;
            } else {
                result.append(" ");
            }
            aa = sequence.charAt(secondarySite - 1);
            result.append(aa).append(secondarySite);
        }
        result.append("}");
    }
    return result.toString();

But it fails with a very similar error:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.charAt(String.java:658)
    at com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator.getSitesSummary(IdentificationFeaturesGenerator.java:1809)
    at com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator.lambda$getAmbiguousModificationSites$40(IdentificationFeaturesGenerator.java:1775)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.TreeMap$KeySpliterator.forEachRemaining(TreeMap.java:2746)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator.getAmbiguousModificationSites(IdentificationFeaturesGenerator.java:1776)
    at eu.isas.peptideshaker.export.sections.PsProteinSection.getFeature(PsProteinSection.java:394)
    at eu.isas.peptideshaker.export.sections.PsProteinSection.writeSection(PsProteinSection.java:209)
    at eu.isas.peptideshaker.export.PSExportFactory.writeExport(PSExportFactory.java:387)
    at eu.isas.peptideshaker.gui.export.FeaturesPreferencesDialog$13.run(FeaturesPreferencesDialog.java:553)

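So the problem seems to be that the site is sometimes 0: the value of "entry.getKey()" in the new implementation, and the value of "representativeSite" in the original one. With a site of 0, "sequence.charAt(representativeSite - 1)" asks for index -1, which matches the exception message.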
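To make the diagnosis concrete, here is a minimal sketch of a guarded site summary. It is not the compomics-utilities code: the class name SiteSummarySketch, the helper label, and the "?" placeholder are assumptions made for illustration only. It assumes sites are one-based positions in the protein sequence and keeps the output format of the old implementation, but marks any site outside 1..sequence.length() instead of throwing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of compomics-utilities.
final class SiteSummarySketch {

    // Returns e.g. "S3" for a valid one-based site, or "?0" when the site
    // falls outside the sequence (the case that triggered the exception).
    static String label(String sequence, int site) {
        if (site < 1 || site > sequence.length()) {
            return "?" + site;
        }
        return sequence.charAt(site - 1) + String.valueOf(site);
    }

    // Builds "S3-{T4 Y6}, ..." with representative and secondary sites sorted.
    static String sitesSummary(String sequence, Map<Integer, List<Integer>> sites) {
        return new TreeMap<>(sites).entrySet().stream()
                .map(entry -> label(sequence, entry.getKey()) + "-{"
                        + entry.getValue().stream()
                                .sorted()
                                .map(site -> label(sequence, site))
                                .collect(Collectors.joining(" "))
                        + "}")
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> sites = new TreeMap<>();
        sites.put(3, Arrays.asList(6, 4));
        sites.put(0, Arrays.asList(3)); // invalid representative site, as in this issue
        System.out.println(sitesSummary("MKSTAY", sites));
    }
}
```

A placeholder like this only keeps the export from crashing and makes the bad entries visible in the report. Where the 0 comes from would still need to be tracked down.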

Problem can be reproduced in PeptideShaker with this SearchGUI result: https://www.dropbox.com/s/6on2fmy5y691r3i/Galaxy304-%5BLabel-SearchGUI%5D.searchgui_archive.zip?dl=0

hbarsnes commented 5 years ago

@CarlosHorro I highly recommend using the "Insert code" option when pasting in code. It makes the text much more readable, so I edited your text above. ;)

hbarsnes commented 5 years ago

@CarlosHorro I'd also recommend checking the Database Processing settings for this dataset, as they seem to use "-REVERSED" as the decoy tag while the FASTA file uses "_REVERSED", so the FDR validation does not work correctly. This probably does not affect the bug, but it is worth correcting.

(Note that the change from "_REVERSED" to "-REVERSED" was made a while back in the new backend, but as it only caused issues like the one for this dataset, it was later reverted. So when creating new search parameters for old FASTA files with the latest new backend, the issue should be gone.)
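A quick way to spot this kind of mismatch is to count how many FASTA header lines contain the configured decoy tag. The sketch below is a standalone illustration, not PeptideShaker or SearchGUI code: the class name DecoyTagCheck and the example headers are made up, and it assumes the tag appears literally somewhere in the header line.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone check, not part of PeptideShaker or SearchGUI.
final class DecoyTagCheck {

    // Counts FASTA header lines (those starting with '>') that contain the tag.
    static long countDecoyHeaders(List<String> fastaLines, String decoyTag) {
        return fastaLines.stream()
                .filter(line -> line.startsWith(">"))
                .filter(line -> line.contains(decoyTag))
                .count();
    }

    public static void main(String[] args) {
        // Made-up headers for illustration only.
        List<String> lines = Arrays.asList(
                ">sp|P12345|EXAMPLE_HUMAN Example protein",
                "MKSTAY",
                ">sp|P12345_REVERSED|EXAMPLE_HUMAN Example protein (decoy)",
                "YATSKM");
        System.out.println("_REVERSED headers: " + countDecoyHeaders(lines, "_REVERSED"));
        System.out.println("-REVERSED headers: " + countDecoyHeaders(lines, "-REVERSED"));
    }
}
```

If the count for the configured tag is zero while the other variant matches roughly half of the headers, the settings and the FASTA file disagree on the decoy tag.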

@mvaudel The protein export exception does not occur when loading only the OMSSA and X! Tandem results together. And loading the MS-GF+ results is enough to reproduce the issue. Should simplify the debugging? :)

mvaudel commented 5 years ago

This commit solves the problem of the export. But we ought to verify that the ambiguous sites assigned to each representative make sense. https://github.com/compomics/peptide-shaker/commit/d56f4fd2c60ca5a4b57922f3509e697c3a24927f

CarlosHorro commented 5 years ago

I can confirm that the report export works again :-) But I'm not sure how to confirm that the ambiguous sites make sense, so I'm sharing the generated Protein report in case it helps you check it:

https://www.dropbox.com/s/nppi5zn8d7vdynr/Protein_Report_with_non-validated_matches.xlsx?dl=0

hbarsnes commented 4 years ago

I will set this one to closed, as the reported bug has been fixed. We can always open a new issue if it turns out that the ambiguous sites do not make sense. :)