exomiser / svart

API for use in representing and manipulating genomic variation
7 stars 1 forks source link

Add `Transposable` interface for user-defined transposability or alter `Stranded` implementers #68

Closed julesjacobsen closed 1 year ago

julesjacobsen commented 1 year ago

Currently the Stranded interface looks like this:

public interface Stranded<T> {

    Strand strand();

    T withStrand(Strand other);
// plus a few default methods 

It is inherited by GenomicRegion

public interface GenomicRegion extends Region<GenomicRegion>, Stranded<GenomicRegion> {
}

and then transitively by GenomicVariant, GenomicBreakendVariant and GenomicBreakend. The issue here is that implementers of GenomicRegion are forced to implement the T withStrand() method, whether it makes sense or not. Variants can occur on either strand, whereas transcripts and their sub-features are fixed to one strand or the other.

Investigate the effects of removing extends Stranded<GenomicRegion> and implementing this at lower levels of Transposable types.

ielis commented 1 year ago

Hi @julesjacobsen , one way of thinking about this is to model transcripts, exons, UTRs, etc. as entities that have a GenomicRegion instead of being a GenomicRegion. This approach has benefits. The intersections, overlaps, and distances between features that either have a region or are a region can be calculated. At the same times the features that have a region stay immutable because flipping to the opposite strand returns a new object.

No changes to the existing Svart classes would be necessary. However, I can imagine having an additional interface for features that have a GenomicRegion.

That's the stance I took in my little gene & transcript library here. See that both Gene and Transcript are Located. (Located represents the concept of a feature that has a GenomicRegion that I mentioned above).

julesjacobsen commented 1 year ago

Thanks for the comment @ielis, this sounds good.

If I might steal the Located interface and name it GenomicLocation like so:


interface GenomicLocation {
    GenomicRegion location();
    // plus the default methods using location()
}

Then split Stranded out into another interface Transposable

// Stranded loses its generic typing and methods so something can have a strand, but not be transposable
interface Stranded {
    Strand strand();
}

// generic types move to new Transposable interface 
interface Transposable<T> extends Stranded {

    T withStrand(Strand other);

    default T toOppositeStrand() {
        return withStrand(strand().opposite());
    }

    default T toPositiveStrand() {
        return withStrand(Strand.POSITIVE);
    }

    default T toNegativeStrand() {
        return withStrand(Strand.NEGATIVE);
    }
}

Minor change to GenomicRegion

// was:  public interface GenomicRegion extends Region<GenomicRegion>, Stranded<GenomicRegion> {
public interface GenomicRegion extends Region<GenomicRegion>, Transposable<GenomicRegion> {

}

Now users can pick composition or inheritance as they please.

// new user-defined compositional class
record TransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation, Transposable<TransposableFeature> {

    @Override
    public GenomicRegion location() {
        return genomicRegion;
    }

    @Override
    public Strand strand() {
        return genomicRegion.strand();
    }

    @Override
    public TransposableFeature withStrand(Strand other) {
        if (strand() == other) {
            return this;
        }
        return new TransposableFeature(genomicRegion.withStrand(other), id, Seq.reverseComplement(sequence));
    }
}
GenomicRegion genomicRegion = GenomicRegion.of(TestContig.of(1, 1000), Strand.POSITIVE, Coordinates.oneBased(10, 200));
TransposableFeature feature = new TransposableFeature(genomicRegion, "1", "ATGC");
TransposableFeature featureTransposed = feature.toOppositeStrand();
System.out.println(feature);
System.out.println(featureTransposed);

// prints
// TransposableFeature[genomicRegion=GenomicRegion{contig=1, strand=+, coordinateSystem=ONE_BASED, start=10, end=200}, id=1, sequence=ATGC]
// TransposableFeature[genomicRegion=GenomicRegion{contig=1, strand=-, coordinateSystem=ONE_BASED, start=801, end=991}, id=1, sequence=GCAT] 

One thing which would make life nicer for records here is naming GenomicLocation::location to GenomicLocation::genomicRegion because then the record would automatically implement the interface:

interface GenomicLocation {
    GenomicRegion genomicRegion();
    // plus the default methods using genomicRegion()
}

record TransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation, Transposable<TransposableFeature> {

    // override for genomicRegion() method not needed when using standard Java naming conventions i.e. Type type

    @Override
    public Strand strand() {
        return this.genomicRegion.strand();
    }

    @Override
    public TransposableFeature withStrand(Strand other) {
       // implementation...
    }
}
ielis commented 1 year ago

I think these are great ideas and they should work OK.

The only thing that is not 100% clear to me is if, with all these changes, we actually need to keep Stranded. I am not sure it provides much on its own. It seems to me that Strand exists only in the context of a GenomicRegion. I do not think that we will ever write a method like:

public <T extends Stranded> static void doSomething(T stranded) {
  // ...
}

that will work for Stranded items. I think we'll be writing methods for Region (when start, end & coordinate system is all we need), or for GenomicRegion (when we need contig & strand on top of the Region attributes).

In result, we should consider deprecating/removing Stranded and "inlining" its methods into GenomicRegion.

julesjacobsen commented 1 year ago

Stranded is needed for Transposable to work and actually GenomicLocation too. This allows this to happen:

interface Stranded {
    Strand strand();
}

interface Transposable<T> extends Stranded {

    T withStrand(Strand other);

    default T toOppositeStrand() {
        // Stranded.strand() used here
        return withStrand(strand().opposite());
    }

    default T toPositiveStrand() {
        return withStrand(Strand.POSITIVE);
    }

    default T toNegativeStrand() {
        return withStrand(Strand.NEGATIVE);
    }
}

interface GenomicLocation extends Stranded  {

    GenomicRegion genomicRegion();

    // Stranded.strand() overridden here
    default Strand strand() {
        return genomicRegion().strand();
    }
    // plus the default methods using location()
}

record TransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation, Transposable<TransposableFeature> {

    @Override
    public TransposableFeature withStrand(Strand other) {
        // Because GenomicLocation extends Stranded, the default GenomicLocation.strand() method is used here 
        //  and needn't be manually implemented in every implementation of GenomicLocation
        if (this.strand() == other) {
            return this;
        }
        return new TransposableFeature(genomicRegion.withStrand(other), id, Seq.reverseComplement(sequence));
    }
}

record NonTransposableFeature(GenomicRegion genomicRegion, String id, String sequence) implements GenomicLocation {
    // nothing special needed here
}
ielis commented 1 year ago

Yeah, you're right. These are indeed good points and it looks like we need Stranded.