exomiser / svart

API for use in representing and manipulating genomic variation
7 stars 1 forks source link

Breakend creation #43

Closed ielis closed 3 years ago

ielis commented 3 years ago

@julesjacobsen I think we might need to look at the ways how we can create Breakend instance. In particular, how someone can use the code in a wrong way.

At the moment, we can create Breakend using any coordinate system:

public class BreakendTest {

    private final Contig contig = TestContig.of(1, 5);

    @ParameterizedTest
    @CsvSource({
            "FULLY_CLOSED, 2,   2, 2",
            "   LEFT_OPEN, 2,   2, 2",
            "  RIGHT_OPEN, 2,   2, 2",
//            "  FULLY_OPEN, 2,   2, 2",  // fails
    })
    public void differentWaysOfBreakendCreation(CoordinateSystem coordinateSystem, int pos, int start, int end) {
        Breakend breakend = Breakend.of(contig, "id", Strand.POSITIVE, coordinateSystem, Position.of(pos));

        assertThat(breakend.start(), is(start));
        assertThat(breakend.end(), is(end));
    }
}

The 4th case fails and in result it is not legal to create Breakend using a FULLY_OPEN coordinate system.

I think we might need to revise the static constructor and allow to specify start and end positions.

julesjacobsen commented 3 years ago

Damn breakends! Damn fully-open coordinate systems! Damn them all!

I see some options. Currently the breakend is implicitly fully-open :

  1. Keep the single Position input and 'expand' it to a single empty base for FULLY_OPEN case:

    private DefaultBreakend(Contig contig, String id, Strand strand, CoordinateSystem coordinateSystem, Position position) {
    super(contig, strand, coordinateSystem, position, coordinateSystem == CoordinateSystem.FULLY_OPEN ? position.shift(1) : position);
    this.id = Objects.requireNonNull(id);
    }

    This is a bit odd as in some systems the length is 1 (fully open/closed), whereas in a half-open system the length is 0.

  2. Add an end to the constructor signature and then check that the length is zero/one

    private DefaultBreakend(Contig contig, String id, Strand strand, CoordinateSystem coordinateSystem, Position position) {
    super(contig, strand, coordinateSystem, position, coordinateSystem == CoordinateSystem.FULLY_OPEN ? position.shift(1) : position);
    if (super.length() >= 0) {
        throw new IllegalStateException("Breakend must have length zero!");
    }
    this.id = Objects.requireNonNull(id);
    }

We had decided at one point to define a breakend as being exclusively a single base, but this was due to the limitation of not being able to handle zero-length intervals at the time and also that we were focussing on VCF representation. Although its a bit odd, I think keeping the constructor as requiring a single position as it is meant to represent a broken end. This position can be expanded into a region where fully-open/closed systems represent a whole base and the half-open/closed systems a slice 5' or 3' of the base, or that they are always a single base i.e. the position is the position (ordinal) on the contig.

julesjacobsen commented 3 years ago

But then these rules are somewhat arbitrary, perhaps they should only ever be a single base long as this is representable in all systems and will translate clearly between coordinate systems - changing the length depending on the type of CoordinateSystem has a pretty high wtf score.

I propose we keep the single position, and represent the breakend as either length 1 or 0 for all coordinate systems. Length 1 fits with the VCF view, but it might be easier to use length 0, in which case we need to double-check our current breakend tests. Coordinate systems which do not naturally represent a full/empty base will adjust the input to correct this. Will need to consider how Breakend.unresolved() represents things. Currently:

private UnresolvedBreakend() {
    super(Contig.unknown(), Strand.POSITIVE, CoordinateSystem.FULLY_CLOSED, Position.of(1), Position.of(0));
}

So maybe this answers my ramblings - they should be length zero.

julesjacobsen commented 3 years ago
public class BreakendTest {

    @ParameterizedTest
    @CsvSource({
            "FULLY_CLOSED, 1,   1, 0",
            "FULLY_CLOSED, 5,   5, 4",

            "LEFT_OPEN,    0,   0, 0",
            "LEFT_OPEN,    5,   5, 5",

            "RIGHT_OPEN,   1,   1, 1",
            "RIGHT_OPEN,   6,   6, 6",

            "FULLY_OPEN,   1,   1, 2",
            "FULLY_OPEN,   5,   5, 6",
    })
    public void createWithCoordinatesSystems(CoordinateSystem coordinateSystem, int pos, int start, int end) {
        Contig ctg5 = TestContig.of(5, 5);
        Breakend breakend = Breakend.of(ctg5, "id", Strand.POSITIVE, coordinateSystem, Position.of(pos));
        assertThat(breakend.start(), equalTo(start));
        assertThat(breakend.end(), equalTo(end));
        assertThat(breakend.length(), equalTo(0));
    }
}
ielis commented 3 years ago

This has been adressed