althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

Reimplement `Sequence` storage without bitmaps #5

Closed althonos closed 3 years ago

althonos commented 3 years ago

The original Prodigal implementation stores the input sequences in bitmaps, which allows storing both strands in the usual indexing order and reduces overall memory consumption by 35% (using a 2-bit bitmap for the forward strand, a 2-bit bitmap for the reverse strand, and a 1-bit bitmap to mark unknown characters). However, this approach also has significant drawbacks: testing a bitmap takes more time than a direct array lookup, and the packed layout prevents a SIMD implementation.
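To make the trade-off concrete, here is a minimal Python sketch contrasting 2-bit packed storage with one byte per nucleotide. The encoding (`A=0, C=1, G=2, T=3`) and the helper names `pack_2bit`, `get_2bit` and `pack_bytes` are assumptions made for illustration; they are not taken from Prodigal or Pyrodigal.

```python
# Hypothetical 2-bit encoding, chosen for illustration only.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_2bit(seq: str) -> bytearray:
    """Pack a nucleotide string into a bitmap using 2 bits per nucleotide."""
    bits = bytearray((2 * len(seq) + 7) // 8)
    for i, nuc in enumerate(seq):
        bit = 2 * i
        # Byte index is bit >> 3, offset inside the byte is bit & 7.
        bits[bit >> 3] |= ENCODE[nuc] << (bit & 7)
    return bits

def get_2bit(bits: bytearray, i: int) -> int:
    """Read nucleotide i from the bitmap: a shift and a mask per access."""
    bit = 2 * i
    return (bits[bit >> 3] >> (bit & 7)) & 0b11

def pack_bytes(seq: str) -> bytes:
    """Store one nucleotide per byte: reading is a plain index lookup."""
    return bytes(ENCODE[nuc] for nuc in seq)

seq = "ACGTTGCA"
bitmap = pack_2bit(seq)   # 2 bytes for 8 nucleotides
digits = pack_bytes(seq)  # 8 bytes for 8 nucleotides
```

The bitmap is four times smaller here, but every read needs index arithmetic, a shift and a mask, whereas `digits[i]` is a single load. Contiguous bytes are also the layout that vector instructions can compare many at a time.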

This PR reimplements most of the functions in the Prodigal C code as Cython functions that work with a sequence stored in a contiguous `uint8_t` array instead of bitmaps, among other changes.
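As a rough sketch of what a function operating on such an array could look like, the pure-Python example below tests for an `ATG` codon on either strand using a single forward-strand byte array. The encoding, the strand convention (`1` for forward, `-1` for reverse) and the names `encode`, `base_at` and `is_atg` are hypothetical; the actual Cython code in this PR may be organised differently.

```python
# Hypothetical encoding where the complement of a base is 3 - code.
A, C, G, T = 0, 1, 2, 3

def encode(seq: str) -> bytes:
    """Encode a nucleotide string as one byte per nucleotide."""
    table = {"A": A, "C": C, "G": G, "T": T}
    return bytes(table[nuc] for nuc in seq)

def base_at(digits: bytes, i: int, strand: int) -> int:
    """Return the base at position i of the requested strand.

    The reverse strand is not stored: it is derived from the forward
    strand by reading from the end and complementing.
    """
    if strand == 1:
        return digits[i]
    return 3 - digits[len(digits) - 1 - i]

def is_atg(digits: bytes, i: int, strand: int = 1) -> bool:
    """Check whether an ATG codon starts at position i on the given strand."""
    if i < 0 or i + 2 >= len(digits):
        return False
    codon = tuple(base_at(digits, i + k, strand) for k in range(3))
    return codon == (A, T, G)

digits = encode("ATGCCCCAT")  # reverse complement is "ATGGGGCAT"
```

With this input, `is_atg(digits, 0, 1)` and `is_atg(digits, 0, -1)` both hold, since the forward strand starts with `ATG` and ends with `CAT`.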

codecov[bot] commented 3 years ago

Codecov Report

Merging #5 (c87d478) into master (f2d0654) will increase coverage by 0.14%. The diff coverage is 93.39%.


@@            Coverage Diff             @@
##           master       #5      +/-   ##
==========================================
+ Coverage   93.07%   93.21%   +0.14%     
==========================================
  Files           8        8              
  Lines         982     2108    +1126     
==========================================
+ Hits          914     1965    +1051     
- Misses         68      143      +75     
| Flag | Coverage Δ |
|---|---|
| CPython | 93.21% <93.39%> (+0.14%) :arrow_up: |
| Linux | 93.21% <93.39%> (+0.14%) :arrow_up: |
| OSX | 93.21% <93.39%> (+0.14%) :arrow_up: |
| PyPy | 98.07% <ø> (ø) |
| Windows | 99.73% <ø> (ø) |
| v3.5 | 93.14% <93.39%> (+0.22%) :arrow_up: |
| v3.6 | 93.19% <93.39%> (+0.16%) :arrow_up: |
| v3.7 | 93.19% <93.39%> (+0.16%) :arrow_up: |
| v3.8 | 93.21% <93.39%> (+0.14%) :arrow_up: |
| v3.9 | 93.21% <93.39%> (+0.14%) :arrow_up: |


| Impacted Files | Coverage Δ |
|---|---|
| pyrodigal/_pyrodigal.pyx | 91.82% <93.39%> (+2.79%) :arrow_up: |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f2d0654...c87d478.