exomiser / svart

API for use in representing and manipulating genomic variation
7 stars 1 forks source link

Bug in `VariantTrimmer` #55

Closed ielis closed 2 years ago

ielis commented 2 years ago

Hi @julesjacobsen , I found a bug in VariantTrimmer.

Running the following:

 @ParameterizedTest
 @CsvSource({
         "C, CGC",
         "C, CC",
         "CGC, C",
         "CC, C"
 })
 public void convert(String ref, String alt) {
     VcfConverter instance = new VcfConverter(b37, VariantTrimmer.leftShiftingTrimmer(VariantTrimmer.removingCommonBase()));
     Variant variant = instance.convert(instance.parseContig("chr1"), "rs123456", 10, ref, alt);
 }

leads to:

java.lang.StringIndexOutOfBoundsException: String index out of range: 0

    at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:48)
    at java.base/java.lang.String.charAt(String.java:711)
    at org.monarchitiative.svart@2.0.0-SNAPSHOT/org.monarchinitiative.svart.util.VariantTrimmer.canLeftTrim(VariantTrimmer.java:181)
    at org.monarchitiative.svart@2.0.0-SNAPSHOT/org.monarchinitiative.svart.util.VariantTrimmer.leftShift(VariantTrimmer.java:124)
    at org.monarchitiative.svart@2.0.0-SNAPSHOT/org.monarchinitiative.svart.util.VariantTrimmer$LeftShiftingTrimmer.trim(VariantTrimmer.java:35)
    at org.monarchitiative.svart@2.0.0-SNAPSHOT/org.monarchinitiative.svart.util.VcfConverter.checkAndTrimNonSymbolic(VcfConverter.java:70)
    at org.monarchitiative.svart@2.0.0-SNAPSHOT/org.monarchinitiative.svart.util.VcfConverter.convert(VcfConverter.java:59)
    at org.monarchitiative.svart@2.0.0-SNAPSHOT/org.monarchinitiative.svart.util.VcfConverterTest.convert(VcfConverterTest.java:34)

...

The bug is only present when trimming with LEFT shifting and REMOVING the common base. Other strategies work well.

I think the bug can be solved by changing || to && in VariantTrimmer line 181. I added a test (different from the test above) that shows the bug presence to branch variant_trimmer_bug here.

Could you please look at this when you have a chance? I think you understand trimming better than I.

Thanks & cheers, DD

julesjacobsen commented 2 years ago

Uh-oh. Good find! I presume you found this the hard way rather than checking the tests manually?

ielis commented 2 years ago

I found the bug empirically, by running SvAnna & debugger. I am somewhat sure the bug happens when the longer allele has the same base at both ends. I am not sure, however, what is the desired trimming output :confused:

julesjacobsen commented 2 years ago

Was doing some work here anyway so I applied you fix.

julesjacobsen commented 2 years ago

I agree trimming does seem overly complicated but its just the way it is! Looks like you got the right idea. @ielis can you build and test this please.

ielis commented 2 years ago

The code where && is replaced with || works OK on my VCFs, so I think we fixed the bug.