blachlylab / dhtslib

D bindings and OOP wrappers for htslib
MIT License
7 stars 1 forks source link

GFF3 coordinate offsets don't make sense #112

Open jblachly opened 2 years ago

jblachly commented 2 years ago

When looking at this unit test: https://github.com/blachlylab/dhtslib/blob/ca928a73796e50eb2758fca4c8d330cd28835020/source/dhtslib/gff/package.d#L21-L47

I was confused that an offset of 2 would translate from 11869 -> 11870

Looking at the code its clear that this is a mistake:

    /// Genomic coordinate at offset into feature, taking strandedness into account
    @property coordinateAtOffset(long offset) const
    {
        // GFF3 features are 1-based coordinates
        assert(offset > 0);
        offset--;   // necessary in a 1-based closed coordinate system

        // for - strand count from end; for +, ., and ? strand count from start
        immutable auto begin = (this.strand == '-' ? this.end : this.start);

        // for - strand count backward; for +, ., and ? strand count forward
        immutable auto direction = (this.strand == '-' ? -1 : 1);

        return OB(begin + (direction * offset));
    }

Agree that GFF3 is 1-based, and agree that GFF3 features are 1-based coordinates, but since you are dealing with OFFSETS (deltas), zero is still a reasonable input value.

Function should be revised to take >= 0 values, and should not offset--. Tests will then have to be updated.

charlesgregory commented 2 years ago

Yeah all this code was from a while ago. When it was part of gff3d we used the offsets as 1-based (even though that doesn't really make sense). There are actually a couple more functions that are implemented incorrectly.

This should be OB(this.length - 1) since the end is inclusive in a one-based system.

@property relativeEnd() const { return OB(this.length); }

This should be this.coordinateAtOffset(0); since an offset should be zero-based.

@property coordinateAtBegin() const
{
    return this.coordinateAtOffset(1);
}

This should be this.length - 1 since the end is inclusive in a one-based system.

/// Genomic coordinate at end of feature, taking strandedness into account
@property coordinateAtEnd() const
{
    return this.coordinateAtOffset(this.length);
}
charlesgregory commented 2 years ago

This should be this.length - 1 since the end is inclusive in a one-based system.

This is incorrect should be :

/// Genomic coordinate at end of feature, taking strandedness into account
@property coordinateAtEnd() const
{
    return this.coordinateAtOffset(this.length - 1);
}
charlesgregory commented 2 years ago

related to #114 and #115