Remove Biopython dependency

olgabot commented 4 years ago

This PR removes the Biopython dependency. A lot of time is spent converting between Python strings and Biopython Seq objects and back, which makes sencha translate take forever.

PR checklist

[ ] This comment contains a description of changes (with reason)
[ ] If you've fixed a bug or added code that should be tested, add tests!
[ ] Ensure the test suite passes with pytest . (command to run: pytest or make coverage if you want to see which lines don't have tests yet)
[ ] Make sure your code is linted and autoformatted using black (black . --check).
[ ] Documentation in usage.md is updated
[ ] README.md is updated

olgabot commented 4 years ago

Old way:

timeit.timeit(
   ...:         'TranslateSingleSeq.three_frame_translation(Seq("CGCTTGCTTAATACTGACATCAATAATATTAGGAAAATCGCAATATAACTGTAAATCCTGTTCTGTC"))',
   ...:     setup='from Bio.Seq import Seq\nfrom sencha.translate_single_seq import TranslateSingleSeq',
   ...:     number=int(1e6))
Out[13]: 0.6314636569999834

New way:

from sencha.constants_translate import STANDARD_CODON_TABLE_MAPPING
timeit.timeit(
   ...:         'TranslateSingleSeq.three_frame_translation(Seq("CGCTTGCTTAATACTGACATCAATAATATTAGGAAAATCGCAATATAACTGTAAATCCTGTTCTGTC"))',
   ...:     setup='from Bio.Seq import Seq\nfrom sencha.translate_single_seq import TranslateSingleSeq',
   ...:     number=int(1e6))
Out[18]: 0.5854893400000094

Hmm, only 8% faster??

0.5854893400000094/0.6314636569999834
Out[19]: 0.9271940411925002
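
To make the arithmetic above explicit, here is a small sketch that turns the two timings into both a "time saved" and a "times faster" figure. The variable names are only illustrative; the numbers are the two timeit results quoted above.

```python
# Timings copied from the two timeit runs above (seconds per 1e6 calls).
old_seconds = 0.6314636569999834
new_seconds = 0.5854893400000094

# Fraction of the old runtime that the new code still needs.
time_ratio = new_seconds / old_seconds

# Percentage of wall-clock time saved by the new code.
time_saved_pct = (1 - time_ratio) * 100

# Throughput speedup: how many times faster the new code is.
speedup = old_seconds / new_seconds

print(f"time ratio:  {time_ratio:.4f}")
print(f"time saved:  {time_saved_pct:.1f}%")
print(f"speedup:     {speedup:.3f}x")
```

Read this way, the new code takes roughly 7% less time, which is the same thing as being roughly 8% faster in throughput terms.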

olgabot commented 4 years ago

Now the reads just stay as pure Python strings! No Biopython backend necessary. The translation happens in translate_single_seq.py using the STANDARD_CODON_TABLE_MAPPING specified in constants_translate.py.
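
For readers who want to see what string-only translation can look like, here is a minimal sketch. It is not the code from translate_single_seq.py: CODON_TABLE, translate and three_frame_translation below are hypothetical stand-ins, the table is built from the standard genetic code rather than imported from constants_translate.py, and only the three forward frames are handled.

```python
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering.
# Hypothetical stand-in for the mapping kept in constants_translate.py.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA string codon by codon, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    # Codons not in the table (e.g. containing N) become "X"; this is an assumption.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def three_frame_translation(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq[frame:]) for frame in range(3)]


print(three_frame_translation("ATGGCCTAA"))
```

Everything stays a plain str from input to output, so there is no object construction or conversion per read, only dictionary lookups and slicing.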

czbiohub-sf / orpheum

Remove Biopython dependency #82