blachlylab / dhtslib

D bindings and OOP wrappers for htslib
MIT License

Typesafe coordinate systems #55

Closed: jblachly closed this issue 3 years ago

jblachly commented 3 years ago

With commit https://github.com/blachlylab/dhtslib/commit/6e93a29dc5e8d7ebcc4637301ca4749e00c84b3a I started work in progress on an experimental typesafe coordinate system. It builds on the ideas originally in dhtslib.faidx, which were implemented because of the unusual zero-based closed coordinates used by the HTSlib faidx functions.

The ideas I would like to explore include

* I am likely not using the term Typeclass strictly correctly in comp sci terms

** Coordinate systems are defined as the product of two choices: zero- or one-based, and half-open or closed.
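To make the "product" idea concrete, here is a minimal sketch, not the code from the commit above. The names Basis, End, CoordSystem, makeSystem, basisOf, endOf and Coordinates, as well as the bit layout of the enum, are assumptions chosen for illustration. Each coordinate system becomes a distinct type, so mixing two systems is a compile error rather than a silent off-by-one.

```d
// Hypothetical layout: bit 1 holds the basis, bit 0 holds the end type.
enum Basis : ubyte { zero = 0, one = 1 }
enum End : ubyte { open = 0, closed = 1 }
enum CoordSystem : ubyte { zbho = 0, zbc = 1, obho = 2, obc = 3 }

/// Compose a coordinate system from its two independent choices.
CoordSystem makeSystem(Basis b, End e) pure nothrow @nogc @safe
{
    return cast(CoordSystem)((b << 1) | e);
}

/// Recover the basis of a coordinate system.
Basis basisOf(CoordSystem cs) pure nothrow @nogc @safe
{
    return cast(Basis)(cs >> 1);
}

/// Recover the end type of a coordinate system.
End endOf(CoordSystem cs) pure nothrow @nogc @safe
{
    return cast(End)(cs & 1);
}

/// A coordinate pair tagged with its system at the type level.
struct Coordinates(CoordSystem cs)
{
    long start;
    long end;

    /// Number of positions covered, whatever the end convention is.
    long size() const pure nothrow @nogc @safe
    {
        static if (endOf(cs) == End.closed)
            return end - start + 1;
        else
            return end - start;
    }
}

// All four combinations are reachable from the two choices.
static assert(makeSystem(Basis.zero, End.open) == CoordSystem.zbho);
static assert(makeSystem(Basis.one, End.closed) == CoordSystem.obc);

unittest
{
    auto a = Coordinates!(CoordSystem.zbho)(0, 10); // [0, 10)
    auto b = Coordinates!(CoordSystem.obc)(1, 10);  // [1, 10]
    assert(a.size == 10);
    assert(b.size == 10);
    // Same region, but the two values have different types.
    static assert(!is(typeof(a) == typeof(b)));
}
```

The block has no main function; one way to run its unittest is `dmd -unittest -main -run` on the file.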

In my initial work, there is a disappointing amount of static if and repetition overall (especially when the conversions are invertible), but I don't know enough dlang type system/metaprogramming tricks to shorten it.

Would like some help to write unit tests, even if it seems silly and repetitious.

charlesgregory commented 3 years ago

I would also like to see compiler warnings when using direct integer coordinates within the library to indicate a developer may encounter off-by-one errors.
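One possible way to get a compiler message for raw integer coordinates is D's `deprecated` attribute on the integer overload. This is only a sketch of the idea: ZeroBasedHalfOpen and regionLength are hypothetical names, not part of dhtslib.

```d
/// Hypothetical typed coordinate pair: zero-based, half-open.
struct ZeroBasedHalfOpen
{
    long start;
    long end;
}

/// Preferred overload: the coordinate system is part of the type.
long regionLength(ZeroBasedHalfOpen c) pure nothrow @nogc @safe
{
    return c.end - c.start;
}

/// Raw-integer overload: still works, but each call site gets a message.
deprecated("raw integer coordinates: wrap them in a typed coordinate to avoid off-by-one errors")
long regionLength(long start, long end) pure nothrow @nogc @safe
{
    return regionLength(ZeroBasedHalfOpen(start, end));
}

unittest
{
    assert(regionLength(ZeroBasedHalfOpen(0, 10)) == 10);
    // regionLength(0, 10); // compiles, but emits a deprecation message
}
```

With the compiler's default settings a deprecation is reported as a message and does not stop compilation, so existing callers keep building while being nudged toward the typed overload.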

charlesgregory commented 3 years ago

> In my initial work, there is a disappointing amount of static if and repetition overall (especially when the conversions are invertible), but I don't know enough dlang type system/metaprogramming tricks to shorten it.

Since CoordSystem is a numeric enum, you could use the (a << 2) | b trick with a switch:

CoordSystem a, b;
int combined = (a << 2) | b;
static switch (combined)
{
case 0:
    // do whatever
case 1:
case 2:
case 3:
    // do the same thing for all of these
...
}

This approach would need to be well documented and unit tested, though, as it would be easier to make a mistake. You could also simplify it by using array indexing and cutting out the switch statement. Indexing into a predefined start array with your combined variable would yield +1, -1, or 0. You would do the same for an end array and then simply create your new Coordinates object. This would be purely cosmetic, as it is all done at compile time and yields no performance benefit (unless for some reason your CoordSystems aren't known at compile time).
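The lookup-table idea could look roughly like the sketch below. The enum layout (bit 1 is the basis, bit 0 is the end type), the Interval struct and the convert function are assumptions for illustration, not dhtslib's actual definitions. Under this layout the start offsets are +1, -1 or 0, while the end offsets can reach +2 or -2 when both the basis and the end type change in the same direction (zero-based closed to one-based half-open, and back).

```d
// Hypothetical layout: bit 1 is the basis (0 = zero-based, 1 = one-based),
// bit 0 is the end type (0 = half-open, 1 = closed).
enum CoordSystem : ubyte { zbho = 0, zbc = 1, obho = 2, obc = 3 }

struct Interval
{
    long start;
    long end;
}

// Both tables are indexed by (from << 2) | to.
// Rows: from zbho, zbc, obho, obc. Columns: to zbho, zbc, obho, obc.
immutable byte[16] startOffset = [
     0,  0,  1,  1,
     0,  0,  1,  1,
    -1, -1,  0,  0,
    -1, -1,  0,  0,
];

immutable byte[16] endOffset = [
     0, -1,  1,  0,
     1,  0,  2,  1,
    -1, -2,  0, -1,
     0, -1,  1,  0,
];

/// Convert an interval between two systems known at compile time.
Interval convert(CoordSystem from, CoordSystem to)(Interval iv)
{
    enum idx = (from << 2) | to; // computed at compile time
    return Interval(iv.start + startOffset[idx], iv.end + endOffset[idx]);
}

unittest
{
    import std.traits : EnumMembers;

    // [0, 10) zero-based half-open is [1, 10] one-based closed.
    assert(convert!(CoordSystem.zbho, CoordSystem.obc)(Interval(0, 10))
            == Interval(1, 10));

    // Every conversion should be undone by its inverse.
    auto iv = Interval(5, 15);
    static foreach (a; EnumMembers!CoordSystem)
    {
        static foreach (b; EnumMembers!CoordSystem)
        {
            assert(convert!(b, a)(convert!(a, b)(iv)) == iv);
        }
    }
}
```

The round-trip loop is one cheap way to cover all sixteen pairs without writing sixteen repetitive tests by hand.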

jblachly commented 3 years ago

@charlesgregory There is static foreach, but I am not aware of a compile-time static switch?

I also thought, given the symmetry, that a lookup table would be appropriate, but again, as you point out, it would really be for "beauty" reasons, because the ugly, repetitious code is functional as is.

jblachly commented 3 years ago

> I would also like to see compiler warnings when using direct integer coordinates within the library to indicate a developer may encounter off-by-one errors.

Not sure yet how I feel about this. Could be good, could be super annoying.

charlesgregory commented 3 years ago

> There is static foreach, but I am not aware of a compile-time static switch?

Ah, you are correct. I assumed it would exist, but apparently not.
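Although D has no static switch, an ordinary switch inside a function can still run at compile time through CTFE when its result is assigned to an enum. The sketch below shows that workaround for the start coordinate only. startShift, convertStart and the enum layout (bit 1 is the basis, bit 0 is the end type) are hypothetical and chosen for illustration.

```d
// Hypothetical layout: bit 1 is the basis (0 = zero-based, 1 = one-based),
// bit 0 is the end type (0 = half-open, 1 = closed).
enum CoordSystem : ubyte { zbho = 0, zbc = 1, obho = 2, obc = 3 }

/// Plain runtime-style function; usable at compile time via CTFE.
int startShift(CoordSystem from, CoordSystem to) pure nothrow @nogc @safe
{
    immutable combined = (from << 2) | to;
    switch (combined)
    {
        // zero-based -> one-based
        case 0b0010, 0b0011, 0b0110, 0b0111:
            return 1;
        // one-based -> zero-based
        case 0b1000, 0b1001, 0b1100, 0b1101:
            return -1;
        // same basis: the start does not move
        default:
            return 0;
    }
}

/// The enum initializer forces startShift to be evaluated at compile time.
long convertStart(CoordSystem from, CoordSystem to)(long start)
{
    enum shift = startShift(from, to);
    return start + shift;
}

static assert(startShift(CoordSystem.zbho, CoordSystem.obc) == 1);
static assert(startShift(CoordSystem.obc, CoordSystem.zbho) == -1);

unittest
{
    assert(convertStart!(CoordSystem.zbho, CoordSystem.obho)(0) == 1);
    assert(convertStart!(CoordSystem.obc, CoordSystem.zbc)(1) == 0);
    assert(convertStart!(CoordSystem.zbc, CoordSystem.zbho)(7) == 7);
}
```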

charlesgregory commented 3 years ago

Closed with merging of #71.