chtsai0105 closed this issue 1 year ago
Hi Cheng-Hung,
Thanks for this suggestion! At first glance, it looks like a great enhancement!
I will take a close look early next week.
Thanks again!
best,
Jacob
Hi Cheng-Hung,
The latest version of ClipKIT takes inspiration from your ticket and is now faster.
Thank you for your suggestion!
best,
Jacob
Hi - I'd suggest using a vectorized NumPy implementation instead of a for loop for trimming.
Below is the example code I tried. I slightly modified the original function ClipKIT uses for the trimming process, but the concept should be the same. Note that I'm using Biopython MultipleSeqAlignment objects as input. I noticed you already convert them to NumPy arrays, so I guess you can simply skip the conversion step. Also, the trimD definition might differ from the original one, since I just use it to store the indices of the columns that should be trimmed.
I tested it with the Jupyter %%timeit magic, and here are the results:
67.1 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
And here is the rewrite with NumPy methods.
And there is a significant speedup, roughly 100x:
616 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Since I'm only using the gappy mode for trimming, I'm not sure whether this implementation is applicable to the other trimming modes.
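
To illustrate the idea, here is a minimal sketch of loop-based versus vectorized gappy trimming. It is not the exact code timed above and not ClipKIT's actual API: the function names, the `GAP_CHARS` set, and the `>=` comparison against the threshold are assumptions made for illustration. The alignment is assumed to already be a 2D NumPy array of single characters (rows are sequences, columns are sites).

```python
import numpy as np

# Assumed gap characters for illustration only; the real set may differ.
GAP_CHARS = ("-", "?")


def columns_to_trim_loop(aln, gaps_threshold=0.9, gap_chars=GAP_CHARS):
    """Loop version: visit every column and count its gap characters."""
    n_seqs, n_cols = aln.shape
    trim = []
    for col in range(n_cols):
        n_gaps = sum(1 for ch in aln[:, col] if ch in gap_chars)
        if n_gaps / n_seqs >= gaps_threshold:
            trim.append(col)
    return np.array(trim, dtype=int)


def columns_to_trim_numpy(aln, gaps_threshold=0.9, gap_chars=GAP_CHARS):
    """Vectorized version: one boolean mask, one reduction over rows."""
    is_gap = np.isin(aln, list(gap_chars))   # boolean array, same shape as aln
    gap_fraction = is_gap.mean(axis=0)       # fraction of gaps per column
    return np.flatnonzero(gap_fraction >= gaps_threshold)


# Toy alignment: 3 sequences, 4 sites.
aln = np.array([list("A-C-"), list("A-G-"), list("AT--")])

trim_idx = columns_to_trim_numpy(aln, gaps_threshold=0.5)
trimmed = np.delete(aln, trim_idx, axis=1)   # drop the gappy columns
print(trim_idx)
print(["".join(row) for row in trimmed])
```

Both functions return the indices of the columns to drop, so the vectorized one can be checked against the loop on real alignments before swapping it in. Modes that need per-column statistics beyond the gap fraction would need their own masks.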