Closed: karlrl closed this 6 years ago
The flake8 problems are addressed as part of #166. I'm happy to cherry-pick that here if the other PR is rejected.
Sorry for the delayed response on this. While this could improve execution time for one aspect of the pipeline, the trimming component is not a dominant piece of the runtime. I'm concerned about adopting a complex dependency without stronger evidence of its need (and note trimming can be performed prior to deblur execution).
Using Pysam (which wraps htslib) speeds up the sequence trimming step by roughly 50x. This changeset switches from `skbio.read()` to `pysam.FastxFile()` and makes a few other changes to accommodate the `FastxFile()` requirement that the file be passed as a path (or stdin), not as a Python file-like object.
As an example of the speedup, processing a ~50k FASTA file took 65 s with `skbio.read()` and 1.4 s with `pysam.FastxFile()`.
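
For readers unfamiliar with the path-based API, here is a minimal sketch of what trimming with `pysam.FastxFile()` might look like. It is not the code from this changeset. The function name `trim_seqs`, the `open_fastx` parameter, and the choice to drop reads shorter than the trim length are all assumptions made for illustration.

```python
from typing import Callable, Iterator, Optional, Tuple


def trim_seqs(
    path: str,
    trim_len: int,
    open_fastx: Optional[Callable] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield (name, trimmed_sequence) for each record in a FASTA/FASTQ file.

    `path` must be a filesystem path, because pysam.FastxFile opens the
    file itself and does not accept a Python file-like object.
    `open_fastx` is a hypothetical hook so the reader can be swapped out
    (for example in tests); it defaults to pysam.FastxFile.
    """
    if open_fastx is None:
        # Imported lazily so this module loads even where pysam is absent.
        import pysam

        open_fastx = pysam.FastxFile

    with open_fastx(path) as records:
        for entry in records:
            # Assumption: reads shorter than trim_len are skipped.
            if len(entry.sequence) < trim_len:
                continue
            yield entry.name, entry.sequence[:trim_len]
```

A caller would pass a path, for example `for name, seq in trim_seqs("reads.fasta", 150): ...`, where `reads.fasta` is a placeholder file name. Code that previously handed an open file handle to `skbio.read()` would need to pass the path instead, which is the kind of accommodation described above.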