Closed ielis closed 3 years ago
Damn breakends! Damn fully-open coordinate systems! Damn them all!
I see some options. Currently the breakend is implicitly fully-open :
Keep the single Position
input and 'expand' it to a single empty base for FULLY_OPEN
case:
private DefaultBreakend(Contig contig, String id, Strand strand, CoordinateSystem coordinateSystem, Position position) {
super(contig, strand, coordinateSystem, position, coordinateSystem == CoordinateSystem.FULLY_OPEN ? position.shift(1) : position);
this.id = Objects.requireNonNull(id);
}
This is a bit odd as in some systems the length is 1 (fully open/closed), whereas in a half-open system the length is 0.
Add an end
to the constructor signature and then check that the length is zero/one
private DefaultBreakend(Contig contig, String id, Strand strand, CoordinateSystem coordinateSystem, Position position) {
super(contig, strand, coordinateSystem, position, coordinateSystem == CoordinateSystem.FULLY_OPEN ? position.shift(1) : position);
if (super.length() >= 0) {
throw new IllegalStateException("Breakend must have length zero!");
}
this.id = Objects.requireNonNull(id);
}
We had decided at one point to define a breakend as being exclusively a single base, but this was due to the limitation of not being able to handle zero-length intervals at the time and also that we were focussing on VCF representation. Although its a bit odd, I think keeping the constructor as requiring a single position as it is meant to represent a broken end. This position can be expanded into a region where fully-open/closed systems represent a whole base and the half-open/closed systems a slice 5' or 3' of the base, or that they are always a single base i.e. the position is the position (ordinal) on the contig.
But then these rules are somewhat arbitrary, perhaps they should only ever be a single base long as this is representable in all systems and will translate clearly between coordinate systems - changing the length depending on the type of CoordinateSystem has a pretty high wtf score.
I propose we keep the single position, and represent the breakend as either length 1 or 0 for all coordinate systems. Length 1 fits with the VCF view, but it might be easier to use length 0, in which case we need to double-check our current breakend tests. Coordinate systems which do not naturally represent a full/empty base will adjust the input to correct this. Will need to consider how Breakend.unresolved() represents things. Currently:
private UnresolvedBreakend() {
super(Contig.unknown(), Strand.POSITIVE, CoordinateSystem.FULLY_CLOSED, Position.of(1), Position.of(0));
}
So maybe this answers my ramblings - they should be length zero.
public class BreakendTest {
@ParameterizedTest
@CsvSource({
"FULLY_CLOSED, 1, 1, 0",
"FULLY_CLOSED, 5, 5, 4",
"LEFT_OPEN, 0, 0, 0",
"LEFT_OPEN, 5, 5, 5",
"RIGHT_OPEN, 1, 1, 1",
"RIGHT_OPEN, 6, 6, 6",
"FULLY_OPEN, 1, 1, 2",
"FULLY_OPEN, 5, 5, 6",
})
public void createWithCoordinatesSystems(CoordinateSystem coordinateSystem, int pos, int start, int end) {
Contig ctg5 = TestContig.of(5, 5);
Breakend breakend = Breakend.of(ctg5, "id", Strand.POSITIVE, coordinateSystem, Position.of(pos));
assertThat(breakend.start(), equalTo(start));
assertThat(breakend.end(), equalTo(end));
assertThat(breakend.length(), equalTo(0));
}
}
This has been adressed
@julesjacobsen I think we might need to look at the ways how we can create
Breakend
instance. In particular, how someone can use the code in a wrong way.At the moment, we can create
Breakend
using any coordinate system:The 4th case fails and in result it is not legal to create Breakend using a
FULLY_OPEN
coordinate system.I think we might need to revise the static constructor and allow to specify
start
andend
positions.