JLSteenwyk / ClipKIT

a multiple sequence alignment-trimming algorithm for accurate phylogenomic inference
https://jlsteenwyk.com/ClipKIT/
MIT License

Should I make a consensus sequence before or after using ClipKIT? #29

Closed Rohit-Satyam closed 1 year ago

Rohit-Satyam commented 1 year ago

Dear ClipKIT developer,

I am interested in using ClipKIT before running IQ-TREE, but I also need to generate consensus sequences for primer design. Should I use the ClipKIT-trimmed alignment for consensus sequence generation, or the untrimmed one?

Also, your autodetection identifies my nucleotide sequences as protein sequences:

Determining smart-gap threshold...

-------------
| Arguments |
-------------
Input file: ../05_postOutlierDetection/halign3.aln (format: fasta)
Output file: ../05_postOutlierDetection/halign3.aln.clipkit (format: fasta)
Sequence type: Protein
Gaps threshold: 0.9986
Trimming mode: kpic-smart-gap
Create complementary output: False
Create log file: True

------------------------
| Processing Alignment |
------------------------

JLSteenwyk commented 1 year ago

Hi Rohit,

Thank you for using ClipKIT!

I would not recommend trimming the alignment before consensus sequence generation.
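The reply does not spell out its reasoning. One likely consideration, stated here as an assumption, is that trimming deletes alignment columns, so a consensus built from a trimmed alignment no longer corresponds to a contiguous stretch of sequence, which primer design needs. The sketch below shows a simple majority-rule consensus computed from an untrimmed alignment. The function name `majority_consensus` and the toy alignment are hypothetical, and this is not how ClipKIT or any particular consensus tool works.

```python
from collections import Counter


def majority_consensus(aligned_sequences, gap="-"):
    """Return a majority-rule consensus of equal-length aligned sequences.

    Columns where the gap character is the most common state are skipped,
    so the result contains residues only. Ties are resolved in favor of
    the state seen first in the column (an arbitrary choice for this sketch).
    """
    if not aligned_sequences:
        return ""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    # zip(*...) walks the alignment column by column.
    for column in zip(*aligned_sequences):
        counts = Counter(ch.upper() for ch in column)
        state, _ = counts.most_common(1)[0]
        if state != gap:
            consensus.append(state)
    return "".join(consensus)


# Toy untrimmed alignment (hypothetical data, not from the attached file).
alignment = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(majority_consensus(alignment))
```

Because the consensus is taken over every column of the untrimmed alignment, adjacent consensus residues come from adjacent alignment columns.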

Also, can you please share your file here so I can investigate the autodetect error? Alternatively, please feel free to email me.

best,

Jacob

Rohit-Satyam commented 1 year ago

Thanks for your input @JLSteenwyk. Here is the file. I used Halign3 for aligning the sequences. halign3.zip

JLSteenwyk commented 1 year ago

Hi Rohit,

ClipKIT detects amino acids because your sequences contain many non-canonical characters (IUPAC ambiguity codes). The unique characters in your alignment are: -, A, C, G, K, M, N, R, S, T, W, Y.

Although these are all acceptable nucleotide characters, in my experience it is relatively uncommon to see all of them in a single alignment file. If I am mistaken, please let me know. I therefore recommend explicitly specifying that your alignment contains nucleotide sequences rather than relying on autodetection.
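To check an alignment for this situation before trimming, the sketch below collects the unique characters in a FASTA alignment and tests whether they all fall within the IUPAC nucleotide alphabet plus the gap character. It is only an illustration: the helper names are hypothetical, the toy alignment is made up to contain the same twelve characters listed above, and none of this reflects ClipKIT's internal autodetection logic.

```python
from io import StringIO

# IUPAC nucleotide codes, including ambiguity codes such as K, M, R, S, W, Y.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
GAP_CHARACTERS = set("-")


def read_fasta_sequences(handle):
    """Parse FASTA text from a file-like object into {name: sequence}."""
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def unique_characters(sequences):
    """Return the set of all characters used across the sequences."""
    return set().union(*sequences.values())


def is_valid_nucleotide_alphabet(characters):
    """True if every character is an IUPAC nucleotide code or a gap."""
    return characters <= (IUPAC_NUCLEOTIDES | GAP_CHARACTERS)


# Toy alignment (hypothetical) using the characters reported in this thread.
toy = StringIO(">seq1\nACGT-KM\n>seq2\nNRSWYAC\n")
chars = unique_characters(read_fasta_sequences(toy))
print(sorted(chars))
print(is_valid_nucleotide_alphabet(chars))
```

If the check passes but autodetection still reports protein, the sequence type can be set explicitly instead. The ClipKIT documentation describes an option for this, and `clipkit --help` shows the exact flag for the installed version.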

best,

Jacob