JLSteenwyk / ClipKIT

a multiple sequence alignment-trimming algorithm for accurate phylogenomic inference
https://jlsteenwyk.com/ClipKIT/
MIT License

2023-12-05: Difference between parameters `smart-gap`, `gappy` and `kpic-gappy` #40

Closed sanyalab closed 10 months ago

sanyalab commented 10 months ago

Hello,

I would like to know how the smart-gap, gappy, and kpic-gappy trimming modes differ, both in the output MSA and in the HMMs that are subsequently built from it.

Thanks Abhijit

JLSteenwyk commented 10 months ago

Hi Abhijit,

We generally recommend using the smart-gap mode. Additional information about the various modes is available here: ClipKIT user docs, modes.
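As a rough illustration of how the modes differ, here is a minimal Python sketch. It is not ClipKIT's actual implementation, and everything in it is an assumption made for illustration: the `trim` function, its helper names, the use of `-` as the only gap character, and the "greater than or equal to" comparison against the threshold. In this sketch, gappy drops columns whose gap fraction reaches a fixed threshold (0.9 here), and kpic-gappy additionally keeps only parsimony-informative and constant columns. smart-gap is not reproduced; it chooses the gap threshold from the alignment itself instead of using a fixed value.

```python
from collections import Counter

GAP = "-"  # assumption: only '-' is treated as a gap in this sketch


def gap_fraction(column):
    """Fraction of sequences that have a gap at this column."""
    return sum(ch == GAP for ch in column) / len(column)


def residue_counts(column):
    """Count non-gap characters in a column, ignoring case."""
    return Counter(ch.upper() for ch in column if ch != GAP)


def is_parsimony_informative(column):
    """At least two different characters, each seen at least twice."""
    return sum(1 for n in residue_counts(column).values() if n >= 2) >= 2


def is_constant(column):
    """Exactly one non-gap character, seen at least twice (assumed definition)."""
    counts = residue_counts(column)
    return len(counts) == 1 and next(iter(counts.values())) >= 2


def trim(alignment, mode="gappy", gap_threshold=0.9):
    """Trim columns of an alignment given as {name: aligned_sequence}.

    mode="gappy": drop columns with gap fraction >= gap_threshold.
    mode="kpic-gappy": as above, and also drop columns that are neither
    parsimony-informative nor constant.
    """
    if mode not in ("gappy", "kpic-gappy"):
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    names = list(alignment)
    columns = list(zip(*(alignment[name] for name in names)))
    kept = []
    for col in columns:
        if gap_fraction(col) >= gap_threshold:
            continue
        if mode == "kpic-gappy" and not (
            is_parsimony_informative(col) or is_constant(col)
        ):
            continue
        kept.append(col)
    return {
        name: "".join(col[i] for col in kept) for i, name in enumerate(names)
    }


if __name__ == "__main__":
    toy = {"s1": "AC-AT", "s2": "AC-AT", "s3": "AT-AA", "s4": "AT-GA"}
    print(trim(toy, mode="gappy"))
    print(trim(toy, mode="kpic-gappy"))
```

On the toy alignment, the all-gap third column is dropped in both modes, while the fourth column (three `A`s and a single `G`) survives gappy but not kpic-gappy, because a character seen only once makes the site neither constant nor parsimony-informative.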

What do you plan on doing downstream with the HMMs? Typically, MSAs used to build HMMs for sequence similarity searches are not trimmed, because trimming discards information. I have developed another tool that may interest you, orthofisher, which helps streamline the process of HMM searches.

best,

Jacob

sanyalab commented 10 months ago

Hi Jacob,

Yes! The purpose of the MSA is to build HMMs in order to survey translated transcriptomes. Thank you for the hint. I had doubts about trimming beforehand, because it essentially changes the context of the MSA. I will try the orthofisher tool, but just out of curiosity, is it very different from, let's say, running jackhmmer or psiblast, collecting candidates, building a high-quality MSA from them, creating an HMM, and then surveying?

thanks Abhijit

JLSteenwyk commented 10 months ago

Hi Abhijit,

I recommend not trimming, but I always encourage a literature review.

orthofisher is similar to jackhmmer, but it writes its output files in a user-friendly format and includes copy number estimation.

best,

Jacob