PatrickKueck / FASconCAT-G

FASconCAT-G offers a wide range of options for editing and concatenating multiple nucleotide, amino acid, and structure sequence alignment files for phylogenetic and population genetic purposes. The main options include sequence renaming, file format conversion, sequence translation, consensus generation of predefined sequence blocks, and RY coding as well as site exclusion in nucleotide sequences. The processing options implemented in FASconCAT-G can be invoked in any combination and performed in a single run. FASconCAT-G can also read and handle different file formats (FASTA, CLUSTAL, and PHYLIP) in a single run.

Missing sequences replaced with X rather than - #4

Closed ERRMoody closed 3 years ago

ERRMoody commented 3 years ago

Hi, is there a way to set this program to infill missing sequences with gap characters rather than X? X implies an amino acid is present but unknown (specifically not a deletion for protein data), but if the gene has been lost then X would be incorrect, as I understand it.

Thanks, Ed

PatrickKueck commented 3 years ago

Hi Ed,

Thank you for your recommendation. I implemented a new option (-g) in FCC-G that lets you use gap coding to fill in all absent gene-sequence data during the concatenation process. The new version, FASconCAT-G_v1.05.pl, and the updated manual can be downloaded from GitHub:

https://github.com/PatrickKueck/FASconCAT-G

I tested the new version on a smaller set of gene files, so please let me know if you get any unexpected results. Normally, the script should work as expected.

Best,

Pat

Dr. Patrick Kück
Algorithmic Development
Zoological Research Museum Alexander Koenig (ZFMK)
Adenauerallee 160, 53113 Bonn, Germany
www.zfmk.de

Sent: Friday, 19 March 2021 at 10:24 From: "Edmund R. R. Moody" @.> To: "PatrickKueck/FASconCAT-G" @.> Cc: "Subscribed" @.***> Subject: [PatrickKueck/FASconCAT-G] Missing sequences replaced with X rather than - (#4)

 

Hi, is there a way to set this program to infill missing sequences with gap characters rather than X? X implies an amino acid is present but missing, but if the gene has been lost then it's incorrect, as I understand it.

Thanks, Ed


ERRMoody commented 3 years ago

Great, thanks!
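
For readers who want to see what the gap-versus-X choice discussed in this thread means in practice, here is a minimal Python sketch of concatenation with a configurable fill character. It is not FASconCAT-G's code (the tool itself is a Perl script). The function name `concatenate`, the `fill` parameter, and the toy taxa and sequences are all hypothetical and only illustrate the idea of padding a taxon that is absent from one gene alignment.

```python
def concatenate(alignments, fill="-"):
    """Concatenate per-gene alignments, padding absent taxa with `fill`.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    This is an illustrative sketch, not FASconCAT-G's implementation.
    """
    # Collect every taxon seen in any gene, preserving first-seen order.
    taxa = []
    for aln in alignments:
        for name in aln:
            if name not in taxa:
                taxa.append(name)

    supermatrix = {name: "" for name in taxa}
    for aln in alignments:
        # All sequences in one alignment must share the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        # A taxon missing from this gene gets a block of fill characters.
        for name in taxa:
            supermatrix[name] += aln.get(name, fill * width)
    return supermatrix


# Hypothetical toy data: taxonB lacks gene 2, taxonC lacks gene 1.
gene1 = {"taxonA": "MKV", "taxonB": "MRV"}
gene2 = {"taxonA": "LLQ", "taxonC": "LIQ"}

for name, seq in concatenate([gene1, gene2]).items():
    print(name, seq)
# taxonA MKVLLQ
# taxonB MRV---
# taxonC ---LIQ
```

Calling `concatenate([gene1, gene2], fill="X")` instead pads with `X`, which corresponds to the earlier behaviour the issue describes: the missing block is then read as unknown residues rather than as absent data.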