caitiecollins / treeWAS

treeWAS: A Phylogenetic Tree-Based Tool for Genome-Wide Association Studies in Microbes

Base insertion and deletion #41

Closed ywx1 closed 4 years ago

ywx1 commented 4 years ago

Hi! Can treeWAS account for insertions and deletions? The code gives a sequence-related error when my input sequences are of different lengths. Thank you! Best regards, ywx1

xavierdidelot commented 4 years ago

Yes, your input can include indels, but they have to be identified first by aligning the genomes.
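As a quick sanity check before going further, the sketch below tests whether a FASTA file is actually an alignment, i.e. whether every sequence has the same length once gaps (`-`) are included. It is plain Python and not part of treeWAS; the helper names `read_fasta` and `is_aligned` and the toy sequences are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header line: start a new record
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())  # sequence may span lines
    return {n: "".join(parts) for n, parts in records.items()}


def is_aligned(records):
    """True if all sequences have the same length (gaps included)."""
    return len({len(seq) for seq in records.values()}) <= 1


# Hypothetical toy inputs: the second one has a gap marking a deletion.
unaligned = read_fasta(">a\nACGT\n>b\nACT\n")
aligned = read_fasta(">a\nACGT\n>b\nAC-T\n")

print(is_aligned(unaligned))  # sequences of length 4 and 3
print(is_aligned(aligned))    # both sequences have length 4
```

If the check fails, the sequences still need to go through a multiple sequence aligner before any SNPs or indels can be called from them.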

ywx1 commented 4 years ago

Hi! Thanks for the reply. I have aligned them, but am unsure of what to do next, because if I put the fasta alignment into the code I get an error. Would it be possible to provide some direction regarding this? Sorry about this - I'm really new to bioinformatics! Thank you! Best regards, ywx1

xavierdidelot commented 4 years ago

You will need to provide, as the genetic data input, a matrix containing both SNPs and indels. See the page below for some information on how to prepare this matrix, although it does not treat the case of indels specifically. If you are new to this, it might be best to ignore indels to start with and work only with SNPs. https://github.com/caitiecollins/treeWAS/wiki/2.-Data-&-Data-Cleaning
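For illustration only, here is a rough Python sketch of how an alignment could be turned into a 0/1 matrix covering both SNPs and indels. It is not the treeWAS workflow and not the method described on the wiki page. It assumes the target is a binary matrix with samples in rows and variants in columns, and it simply treats the gap character `-` as one more state at each alignment position. The function name `variant_matrix`, the `position.state` column labels, and the toy sequences are all hypothetical.

```python
def variant_matrix(records):
    """Build 0/1 columns from aligned sequences given as {name: sequence}.

    Returns (names, columns), where columns maps a "position.state" label
    to a list of 0/1 values, one per sample, in the order of `names`.
    Assumption: the gap character "-" is treated as an ordinary state,
    so a deletion shows up as a "-" column. Ambiguous bases such as "N"
    are not handled specially here.
    """
    names = list(records)
    seqs = [records[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; align them first")

    columns = {}
    for pos in range(length):
        states = sorted({s[pos] for s in seqs})
        if len(states) < 2:
            continue  # invariant site: carries no signal, skip it
        for state in states:
            label = f"{pos + 1}.{state}"  # 1-based alignment position
            columns[label] = [1 if s[pos] == state else 0 for s in seqs]
    return names, columns


# Hypothetical toy alignment: one SNP at position 2, one deletion at 3.
toy = {"s1": "ACGT", "s2": "AC-T", "s3": "ATGT"}
names, columns = variant_matrix(toy)
for label, values in columns.items():
    print(label, values)
# 2.C [1, 1, 0]
# 2.T [0, 0, 1]
# 3.- [0, 1, 0]
# 3.G [1, 0, 1]
```

Two caveats about this sketch: at a site with only two states the two columns are exact complements, so one of them is redundant and could be dropped; and an indel spanning several bases appears as several perfectly correlated gap columns rather than as a single event.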