Closed jakobnissen closed 3 years ago
Merging #31 into master will increase coverage by
0.71%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #31 +/- ##
==========================================
+ Coverage 86.88% 87.60% +0.71%
==========================================
Files 13 13
Lines 572 605 +33
==========================================
+ Hits 497 530 +33
Misses 75 75
Flag | Coverage Δ | |
---|---|---|
#unittests | 87.60% <100.00%> (+0.71%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
src/fasta/index.jl | 100.00% <100.00%> (ø) |
|
src/fasta/reader.jl | 90.47% <100.00%> (+4.76%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update d5b23c1...345d275. Read the comment docs.
Is this to be merged? I already merged the other related PR.
The new indexing logic proposed here looks good to me. Programatically this PR is good to merge.
Though maintenance-wise there was, however, one thing that wasn't crucially obvious to me. I got caught out not realising that the spec allows differing line termination lengths/characters for different sequences in the same file. My temptation was to optimise out what I had perceived to be duplicate calculations of the number of bytes used for the line termination character. This optimisation, of course, would have been wrong. I'd like to see a comment that warns against such a temptation and a test that ensures indexing works with differing line termination characters (I didn't see such a test when I skimmed runtests.jl
).
Added tests and comments as requested now. If CI gives me its blessings, this can be merged.
Addresses issue #29. This PR also includes #30 .
Changes:
FASTA.Index
now uses aDict
instead of a vector of names. This is slower to construct and more memory inefficient, but obviously has the benefit of constant-time lookup in the index. I'd argue that if you create an index, you do so specifically in order to have fast lookup, and you probably don't care about the few extra kilobytes of RAM used, or miliseconds when constructing the index.extract(r::Reader, A::Alphabet, s::AbstractString, u::UnitRange)
, which does the same assequence(A, r[s], u)
, but does not load the entire sequence into memory first.