blachlylab / dhtslib

D bindings and OOP wrappers for htslib
MIT License
7 stars 1 forks source link

CoordinateSystem Implementation and Adoption #71

Closed charlesgregory closed 3 years ago

charlesgregory commented 3 years ago

Opening this as rolling pull request for suggestions and discussion as this is a significant change with lots of moving pieces and involves issues like #55, #68.

Below are the current changes:

Reworked Coordinates This is necessary as many types do not naturally report a full coordinate set but a singular coordinate. Now includes a singular Coordinate typed by the Based enum. Coordinates are now comprised of singular Coordinate types. Also cleaned up the to function considerably.

(Also dlang metaprogramming is black magic)

codecov-commenter commented 3 years ago

Codecov Report

Merging #71 (9d59a35) into develop (5babdfa) will increase coverage by 4.27%. The diff coverage is 97.14%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #71      +/-   ##
===========================================
+ Coverage    72.16%   76.44%   +4.27%     
===========================================
  Files           22       22              
  Lines         1872     1970      +98     
===========================================
+ Hits          1351     1506     +155     
+ Misses         521      464      -57     
Impacted Files Coverage Δ
source/dhtslib/sam/record.d 81.25% <0.00%> (-0.27%) :arrow_down:
source/dhtslib/tabix.d 0.00% <ø> (ø)
source/dhtslib/vcf.d 74.85% <83.33%> (-0.23%) :arrow_down:
source/dhtslib/coordinates.d 98.92% <100.00%> (+18.92%) :arrow_up:
source/dhtslib/faidx.d 88.88% <100.00%> (+88.88%) :arrow_up:
source/dhtslib/sam/cigar.d 89.43% <100.00%> (ø)
source/dhtslib/sam/reader.d 86.23% <100.00%> (+1.61%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5babdfa...9d59a35. Read the comment docs.

jblachly commented 3 years ago

I don't love the idea of "rolling pull request" because I don't ever know when to review it ?

charlesgregory commented 3 years ago

I see it as an opportunity to get comments on current changes without committing directly to master. But a lot of organizations don't call it a rolling pull request, its just a pull request.

charlesgregory commented 3 years ago

Also my idea of you reviewing code is right before we commit it. With something like this I would like your opinion during development rather than at the end. But if you would like I can just request your review each time I would like input?

jblachly commented 3 years ago

If you do it this way just @ me when required

charlesgregory commented 3 years ago

@jblachly How should we handle range-based lookup like faidx, tabix, and bam region query? We can easily create methods that utilize a Coordinates struct as input but what about string input methods? Should we assume 0-based closed for a query string like faidx?

charlesgregory commented 3 years ago

@jblachly should we require using the Coordinates struct on opIndex methods that previously used long? This could be tricky with the opDollar in SAMReader. For now I just have a work around, but I would rather enforce usage of Coordinates.

charlesgregory commented 3 years ago

@jblachly How should we handle range-based lookup like faidx, tabix, and bam region query? We can easily create methods that utilize a Coordinates struct as input but what about string input methods? Should we assume 0-based closed for a query string like faidx?

I think I will toy with the idea of removing string-based query methods and replacing them with the ability to create Coordinates from strings. This allows better "off-by-one" safety.

charlesgregory commented 3 years ago

@jblachly This is probably ready for review. I think I converted everything, sam, tabix, vcf.

jblachly commented 3 years ago

I think this is just waiting on unit tests. I can approve after that

charlesgregory commented 3 years ago

I think this is just waiting on unit tests. I can approve after that

@jblachly Which unittests?

jblachly commented 3 years ago

Maybe something relatedt o bam files but now I can't find the conversation / maybe all were marked as resolved?

I can approve this if you ensure we get good testingg

charlesgregory commented 3 years ago

At least for some bam files and faidx'd fasta files, we query using all coordinate system types and get identical results. We could use more testing but for now this is good enough and shows that it seems to be working as intended. As long as I wrote the unittests correctly everything should be verified.

https://github.com/blachlylab/dhtslib/blob/9d59a351958b9ab054c2eb2955cd1cc5ba7411c9/source/dhtslib/faidx.d#L204-L244

https://github.com/blachlylab/dhtslib/blob/9d59a351958b9ab054c2eb2955cd1cc5ba7411c9/source/dhtslib/sam/reader.d#L803-L851

charlesgregory commented 3 years ago

Just to have this automatically close. This pull request closes #55.