Closed andersgs closed 3 years ago
Hi Anders!
Thank you for notifying us of this issue. Also, thank you for taking an in-depth look into the driver of the problem. I will consult with the other major developer and we will resolve this issue in ~1-2 days.
Thank you again for pointing this out to us!
All the best,
Jacob L. Steenwyk
Hi Anders,
Thank you again for notifying us of this issue and taking the time to write us a well-thought-out comment. We determined that the issue stems from using Biopython v1.79 instead of ClipKIT's pinned version, v1.76.
We will begin working on the next release of ClipKIT, which will use Biopython v1.79. Given that the developer team has their hands full (work, vacation, etc.) and I want to make sure nothing else breaks, I anticipate the next release of ClipKIT will come out in about a week. Sorry if this causes any inconvenience for you.
Again, thank you for your insight, Anders. Please feel free to message us with any other comments, concerns, or suggestions you may have (no matter how big or small!). We hope that ClipKIT continues to fulfill your research needs.
All the best,
Jacob L. Steenwyk
Hi Anders,
Our team was able to update the codebase to work with Biopython v1.79. This change is implemented in ClipKIT version 1.1.5. Biopython v1.79 is not available on the Anaconda cloud. At this time, ClipKIT v1.1.5 is available via GitHub and PyPI.
Thank you again for using ClipKIT! You may be interested in our sister toolkit, PhyKIT (https://jlsteenwyk.com/PhyKIT/), a broadly applicable toolkit for analyzing and processing multiple sequence alignments and phylogenies.
All the best,
Jacob
Nice work @JLSteenwyk. Thank you!
@JLSteenwyk in case you are curious, this is where I am using clipkit: https://github.com/MDU-PHL/kovid-trees-nf/
I usually use gotree for tree manipulation and goalign for alignment manipulation: https://github.com/evolbioinfo/{gotree,goalign}
Will check phykit out...
Thank you @andersgs! It is really humbling that you decided to use ClipKIT in kovid-trees-nf. Thank you for your support!
Hello.
I am having the following issue, and below I post how I fixed it. Thank you for the great tool.
The issue occurred with Python 3.8, ClipKIT 1.1.3, Biopython 1.79, and NumPy 1.20.3, using the following command line:
This has worked well in the past, but today the output from a FASTA alignment (requesting a FASTA output alignment) contains numerical data:
I have tracked the issue down to this line of code:
https://github.com/JLSteenwyk/ClipKIT/blob/cccc8bfcb6fd70e3fe0fa033c317ea015fc07b49/clipkit/modes.py#L83
Essentially, that is returning an `int` because the code is indexing a `bytes` object (https://docs.python.org/3/library/stdtypes.html#bytes-objects). I am unsure if there has been some change to Biopython or NumPy here that might be causing the issue.
Changing the line to:
Makes things work, but at a significant cost to speed.
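For anyone who wants to see the underlying behaviour in isolation, it can be reproduced in plain Python without ClipKIT or Biopython. This is only a sketch: the `seq` value is a made-up stand-in for sequence data stored as `bytes`, not the actual object ClipKIT handles.

```python
# Hypothetical stand-in for sequence data stored as bytes.
seq = b"ACGT"

# Indexing a bytes object returns the integer value of the byte.
by_index = seq[1]

# Slicing a bytes object returns a bytes object of length 1.
by_slice = seq[1:2]

print(type(by_index).__name__, by_index)  # int 67 (ASCII code for "C")
print(type(by_slice).__name__, by_slice)  # bytes b'C'
```

Writing the integer form into an output alignment is consistent with the numerical data described above.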
So, I tried a different approach that works at a good speed but requires the following changes:
Use `np.zeros` rather than `np.empty`, with `dtype=bytes`, to store the state of locations to keep and discard: https://github.com/JLSteenwyk/ClipKIT/blob/a43975dbf89dc09168e1e19c8c58d5f10d88909e/clipkit/helpers.py#L109-L114
So, it looks like this:
It has to be zeros so that we get an array of `b''`, which will be crucial below when joining things (if you use `np.empty`, you get random bytes that don't decode back to strings and can't be joined).
https://github.com/JLSteenwyk/ClipKIT/blob/a43975dbf89dc09168e1e19c8c58d5f10d88909e/clipkit/modes.py#L83
https://github.com/JLSteenwyk/ClipKIT/blob/a43975dbf89dc09168e1e19c8c58d5f10d88909e/clipkit/modes.py#L96
This ensures you get a `bytes` object of length 1 back rather than an `int`.
https://github.com/JLSteenwyk/ClipKIT/blob/a43975dbf89dc09168e1e19c8c58d5f10d88909e/clipkit/helpers.py#L133-L136
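Put together, the idea behind these changes can be sketched roughly as follows. This is not ClipKIT's actual code: the names `seqs`, `sites_to_keep` and `keep`, and the toy alignment, are made up for illustration, and I am assuming NumPy is installed.

```python
import numpy as np

# Hypothetical toy alignment stored as bytes, and the columns to retain.
seqs = [b"ACGT", b"A-GT"]
sites_to_keep = [0, 2, 3]

n_taxa, n_sites = len(seqs), len(seqs[0])

# np.zeros with dtype=bytes gives an array whose cells all start as b"".
keep = np.zeros((n_taxa, n_sites), dtype=bytes)

for row, seq in enumerate(seqs):
    for col in sites_to_keep:
        # Slice (not index) so a length-1 bytes object is stored, not an int.
        keep[row, col] = seq[col:col + 1]

# Cells that were never filled are still b"", so they vanish when joined.
trimmed = [b"".join(keep[row].tolist()).decode() for row in range(n_taxa)]
print(trimmed)  # ['AGT', 'AGT']
```

The point of the zero-initialised array is that discarded columns need no special handling at join time, which is why `np.empty` does not work here.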
Thanks again.