Closed julesjacobsen closed 1 year ago
Hi @julesjacobsen ,
one way of thinking about this is to model transcripts, exons, UTRs, etc. as entities that have a GenomicRegion
instead of being a GenomicRegion
. This approach has benefits. The intersections, overlaps, and distances between features that either have a region or are a region can be calculated. At the same times the features that have a region stay immutable because flipping to the opposite strand returns a new object.
No changes to the existing Svart classes would be necessary. However, I can imagine having an additional interface for features that have a GenomicRegion
.
That's the stance I took in my little gene & transcript library here.
See that both Gene and Transcript are Located. (Located
represents the concept of a feature that has a GenomicRegion
that I mentioned above).
Thanks for the comment @ielis, this sounds good.
If I might steal the Located
interface and name it GenomicLocation
like so:
interface GenomicLocation {
GenomicRegion location();
// plus the default methods using location()
}
Then split Stranded
out into another interface Transposable
// Stranded loses its generic typing and methods so something can have a strand, but not be transposable
interface Stranded {
Strand strand();
}
// generic types move to new Transposable interface
interface Transposable<T> extends Stranded {
T withStrand(Strand other);
default T toOppositeStrand() {
return withStrand(strand().opposite());
}
default T toPositiveStrand() {
return withStrand(Strand.POSITIVE);
}
default T toNegativeStrand() {
return withStrand(Strand.NEGATIVE);
}
}
Minor change to GenomicRegion
// was: public interface GenomicRegion extends Region<GenomicRegion>, Stranded<GenomicRegion> {
public interface GenomicRegion extends Region<GenomicRegion>, Transposable<GenomicRegion> {
}
Now users can pick composition or inheritance as they please.
// new user-defined compositional class
record TransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation, Transposable<TransposableFeature> {
@Override
public GenomicRegion location() {
return genomicRegion;
}
@Override
public Strand strand() {
return genomicRegion.strand();
}
@Override
public TransposableFeature withStrand(Strand other) {
if (strand() == other) {
return this;
}
return new TransposableFeature(genomicRegion.withStrand(other), id, Seq.reverseComplement(sequence));
}
}
GenomicRegion genomicRegion = GenomicRegion.of(TestContig.of(1, 1000), Strand.POSITIVE, Coordinates.oneBased(10, 200));
TransposableFeature feature = new TransposableFeature(genomicRegion, "1", "ATGC");
TransposableFeature featureTransposed = feature.toOppositeStrand();
System.out.println(feature);
System.out.println(featureTransposed);
// prints
// TransposableFeature[genomicRegion=GenomicRegion{contig=1, strand=+, coordinateSystem=ONE_BASED, start=10, end=200}, id=1, sequence=ATGC]
// TransposableFeature[genomicRegion=GenomicRegion{contig=1, strand=-, coordinateSystem=ONE_BASED, start=801, end=991}, id=1, sequence=GCAT]
One thing which would make life nicer for records here is naming GenomicLocation::location
to GenomicLocation::genomicRegion
because then the record would automatically implement the interface:
interface GenomicLocation {
GenomicRegion genomicRegion();
// plus the default methods using genomicRegion()
}
record TransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation, Transposable<TransposableFeature> {
// override for genomicRegion() method not needed when using standard Java naming conventions i.e. Type type
@Override
public Strand strand() {
return this.genomicRegion.strand();
}
@Override
public TransposableFeature withStrand(Strand other) {
// implementation...
}
}
I think these are great ideas and they should work OK.
The only thing that is not 100% clear to me is if, with all these changes, we actually need to keep Stranded
. I am not sure it provides much on its own. It seems to me that Strand
exists only in the context of a GenomicRegion
. I do not think that we will ever write a method like:
public <T extends Stranded> static void doSomething(T stranded) {
// ...
}
that will work for Stranded
items. I think we'll be writing methods for Region
(when start, end & coordinate system is all we need), or for GenomicRegion
(when we need contig & strand on top of the Region
attributes).
In result, we should consider deprecating/removing Stranded
and "inlining" its methods into GenomicRegion
.
Stranded
is needed for Transposable
to work and actually GenomicLocation
too. This allows this to happen:
interface Stranded {
Strand strand();
}
interface Transposable<T> extends Stranded {
T withStrand(Strand other);
default T toOppositeStrand() {
// Stranded.strand() used here
return withStrand(strand().opposite());
}
default T toPositiveStrand() {
return withStrand(Strand.POSITIVE);
}
default T toNegativeStrand() {
return withStrand(Strand.NEGATIVE);
}
}
interface GenomicLocation extends Stranded {
GenomicRegion genomicRegion();
// Stranded.strand() overridden here
default Strand strand() {
return genomicRegion().strand();
}
// plus the default methods using location()
}
record TransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation, Transposable<TransposableFeature> {
@Override
public TransposableFeature withStrand(Strand other) {
// Because GenomicLocation extends Stranded, the default GenomicLocation.strand() method is used here
// and needn't be manually implemented in every implementation of GenomicLocation
if (this.strand() == other) {
return this;
}
return new TransposableFeature(genomicRegion.withStrand(other), id, Seq.reverseComplement(sequence));
}
}
record NonTransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation {
// nothing special needed here
}
Yeah, you're right. These are indeed good points and it looks like we need Stranded
.
Currently the
Stranded
interface looks like this:It is inherited by
GenomicRegion
and then transitively by
GenomicVariant
,GenomicBreakendVariant
andGenomicBreakend
. The issue here is that implementers ofGenomicRegion
are forced to implement theT withStrand()
method, whether it makes sense or not. Variants can occur on either strand, whereas transcripts and their sub-features are fixed to one strand or the other.Investigate the effects of removing
extends Stranded<GenomicRegion>
and implementing this at lower levels of Transposable types.