Open xapple opened 1 year ago
Hi @xapple,
I did not include GBParsy because I was not aware of this project, and since it's not on PyPI it's not exactly the most convenient, tools-included GenBank parser out there. Additionally, I tried to build from source from the GitHub repository you linked, but the code seems quite outdated (it still uses the `PyString_FromStringAndSize` C API, which was removed in Python 3)...
Yes, you are right: the code was written in 2008, sixteen years ago, and is probably not compatible with the current Python C API. It has also not been uploaded to PyPI or conda-forge.
Digging a bit deeper, I realized that the code in the GitHub repository is an export of the old Google Code repository and doesn't represent the latest version. The repository has v0.5.0, while the supplementary file of the publication itself includes v0.6.0 (2008-07-10) at:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2516526/bin/1471-2105-9-321-S1.tgz
I was looking for a fast way of processing large amounts of GenBank entries, and found your library. It definitely offers an improvement over `biopython`, but I'm wondering why you did not include GBParsy in the speed comparison. It is a parser written in pure C, and likely even faster than `gb-io`.
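
For anyone who wants to try such a comparison locally, here is a minimal timing harness, offered only as a sketch. The names `benchmark` and `count_records_naive` are hypothetical and not part of `gb-io`, `biopython`, or GBParsy. The stand-in parser only counts `LOCUS` header lines, so a real comparison would replace it with thin wrappers around each library's own entry point.

```python
import time
from typing import Callable, Dict


def count_records_naive(path: str) -> int:
    """Hypothetical stand-in parser: counts records by their LOCUS header lines."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.startswith("LOCUS"))


def benchmark(
    parsers: Dict[str, Callable[[str], object]],
    path: str,
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time (in seconds) of each parser over `repeats` runs."""
    results: Dict[str, float] = {}
    for name, parse in parsers.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            parse(path)  # each callable takes a file path and parses the whole file
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)  # the minimum is least affected by system noise
    return results
```

Calling `benchmark({"naive": count_records_naive}, path)` on the same GenBank file for every parser gives roughly comparable numbers. Taking the minimum over several runs is a design choice to reduce the influence of disk caching and background load.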