BioJulia / FASTX.jl

Parse and process FASTA and FASTQ formatted files of biological sequences.
https://biojulia.dev
MIT License
61 stars 20 forks source link

Add constant time fasta index seeking [NEED REVIEW] #31

Closed jakobnissen closed 3 years ago

jakobnissen commented 4 years ago

Addresses issue #29. This PR also includes #30 .

Changes:

codecov[bot] commented 4 years ago

Codecov Report

Merging #31 into master will increase coverage by 0.71%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #31      +/-   ##
==========================================
+ Coverage   86.88%   87.60%   +0.71%     
==========================================
  Files          13       13              
  Lines         572      605      +33     
==========================================
+ Hits          497      530      +33     
  Misses         75       75              
Flag Coverage Δ
#unittests 87.60% <100.00%> (+0.71%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/fasta/index.jl 100.00% <100.00%> (ø)
src/fasta/reader.jl 90.47% <100.00%> (+4.76%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d5b23c1...345d275. Read the comment docs.

TransGirlCodes commented 4 years ago

Is this to be merged? I already merged the other related PR.

CiaranOMara commented 4 years ago

The new indexing logic proposed here looks good to me. Programatically this PR is good to merge.

Though maintenance-wise there was, however, one thing that wasn't crucially obvious to me. I got caught out not realising that the spec allows differing line termination lengths/characters for different sequences in the same file. My temptation was to optimise out what I had perceived to be duplicate calculations of the number of bytes used for the line termination character. This optimisation, of course, would have been wrong. I'd like to see a comment that warns against such a temptation and a test that ensures indexing works with differing line termination characters (I didn't see such a test when I skimmed runtests.jl).

jakobnissen commented 4 years ago

Added tests and comments as requested now. If CI gives me its blessings, this can be merged.