Closed chtsai0105 closed 1 year ago
Hi Cheng-Hung,
Thank you for your careful reading of ClipKIT. Attentive users like yourself help ensure ClipKIT performs well (and, in some cases, even better!).
Regarding this, I will refer to the original manuscript, Steenwyk et al. (2020), PLOS Biology. Specifically, we note in the pseudocode for the execution clipkit <MSA> -g 0.95 that ClipKIT will do the following:
FOR site in alignment:
>IF site has <95% gaps
>>keep the site
Thus, sites whose gap fraction is greater than or equal to the specified gaps threshold should be removed.
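The rule in the pseudocode can be sketched in a few lines of plain Python. This is only an illustration of the comparison, not ClipKIT's actual implementation: the function name `trim_gappy_sites`, the toy alignment, and the choice to count only "-" as a gap are all assumptions made for the example.

```python
def trim_gappy_sites(sequences, threshold):
    """Keep only the sites whose gap fraction is strictly below `threshold`.

    Hypothetical helper for illustration; only '-' is counted as a gap.
    """
    n_seqs = len(sequences)
    kept_columns = []
    for column in zip(*sequences):  # iterate over alignment sites
        gap_fraction = column.count("-") / n_seqs
        if gap_fraction < threshold:  # the manuscript's "keep" condition
            kept_columns.append(column)
    if not kept_columns:
        return ["" for _ in sequences]
    # Transpose the kept sites back into one string per sequence.
    return ["".join(row) for row in zip(*kept_columns)]


# Toy alignment (made up): per-site gap fractions are 0.0, 0.5, 0.75, 0.0.
alignment = ["AC-T", "A--T", "AG-T", "A-GT"]
trimmed_half = trim_gappy_sites(alignment, 0.5)
trimmed_default = trim_gappy_sites(alignment, 0.95)
```

With a threshold of 0.5, the site at exactly 0.5 is removed along with the 0.75 site, because only sites strictly below the threshold are kept.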
The code in commit 7c0dee8 was erroneous. This error stemmed from incorrectly implementing the NumPy acceleration. (Note, 7c0dee8 does implement the NumPy acceleration.)
As you mentioned in your comment, the error was later fixed, including in version 2.1.1.
Thank you again for your careful review of our code. We appreciate your insights.
best,
Jacob
Thank you! I'll update my test file to reflect the changes.
Bug Summary
Not actually a bug, but I found that the trimming behavior differs from the previous version.
In the previous version, the gappy and smart_gap modes trimmed sites whose gappyness was above the threshold. In the latest version, they trim sites whose gappyness is equal to or above the threshold. Here is the code I found from before the NumPy acceleration was implemented: https://github.com/JLSteenwyk/ClipKIT/blob/7c0dee803c8fbf33edd1f018a56527b1da7ac268/clipkit/msa.py#L63-L64
However, in the latest version: https://github.com/JLSteenwyk/ClipKIT/blob/f4cb92a2c2248e21e3ecf161c1a1cd08152d4b16/clipkit/msa.py#L158-L159
I just want to confirm that you expect this behavioral change, since it will produce slightly different results from the previous version.
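The difference between the two behaviors comes down to a strict versus an inclusive comparison. The sketch below contrasts them on made-up gappyness values; `sites_to_trim` and its `inclusive` flag are hypothetical names for this example and do not appear in ClipKIT.

```python
def sites_to_trim(gappyness, threshold, inclusive):
    """Return the indices of sites that would be trimmed.

    inclusive=False mimics the older behavior (gappyness > threshold);
    inclusive=True mimics the newer behavior (gappyness >= threshold).
    """
    if inclusive:
        return [i for i, g in enumerate(gappyness) if g >= threshold]
    return [i for i, g in enumerate(gappyness) if g > threshold]


# Made-up per-site gap fractions; two sites sit exactly on the threshold.
gappyness = [0.0, 0.5, 0.25, 0.5]
old_behavior = sites_to_trim(gappyness, 0.5, inclusive=False)
new_behavior = sites_to_trim(gappyness, 0.5, inclusive=True)
```

Only sites that land exactly on the threshold are affected, which is why the two versions usually differ by just a few sites.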
Steps to Reproduce
The previous code does not trim any of the sites and gives:
The current code trims the sites where gappyness == 0.5 and gives:
Technical Details