caitiecollins / treeWAS

treeWAS: A Phylogenetic Tree-Based Tool for Genome-Wide Association Studies in Microbes

Testing burden score binary variants association #61

Closed: DorothyTamYiLing closed this issue 2 years ago

DorothyTamYiLing commented 2 years ago

Hi Caitie,

First of all, thanks for writing treeWAS; it has been really useful in my work.

I am trying to run an amino acid burden score GWAS (for each codon, retention of the reference amino acid is coded as 0 and a change to any alternative amino acid is coded as 1), as well as a gene burden score GWAS (for each gene, an isolate is coded as 1 if it carries at least one gene-inactivating allele and 0 otherwise). The genotype is still binary, so I am wondering whether it would be appropriate to use the three association tests implemented in treeWAS. The tree I used was built from core-genome SNPs.
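The two coding schemes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input structures (`aa_calls`, `inactivating`) and the function names `aa_burden` and `gene_burden` are hypothetical, every isolate is assumed to have a complete call at every codon (a gap or missing call would be coded as 1 by this rule), and turning the result into the binary genotype matrix passed to treeWAS is not shown.

```python
from typing import Dict, List, Set

def aa_burden(aa_calls: Dict[str, List[str]], reference: List[str]) -> Dict[str, List[int]]:
    """Per codon: 0 if the isolate keeps the reference amino acid, 1 for any alternative."""
    coded = {}
    for isolate, calls in aa_calls.items():
        # zip() would silently truncate, so check lengths explicitly.
        if len(calls) != len(reference):
            raise ValueError(f"{isolate}: expected {len(reference)} codons, got {len(calls)}")
        coded[isolate] = [0 if aa == ref else 1 for aa, ref in zip(calls, reference)]
    return coded

def gene_burden(inactivating: Dict[str, Set[str]], genes: List[str]) -> Dict[str, List[int]]:
    """Per gene: 1 if the isolate carries at least one inactivating allele, else 0."""
    return {
        isolate: [1 if gene in hits else 0 for gene in genes]
        for isolate, hits in inactivating.items()
    }

# Hypothetical example data: three isolates, three codons, two genes.
reference = ["M", "K", "L"]
calls = {"iso1": ["M", "K", "L"], "iso2": ["M", "R", "L"], "iso3": ["I", "Q", "L"]}
print(aa_burden(calls, reference))
# {'iso1': [0, 0, 0], 'iso2': [0, 1, 0], 'iso3': [1, 1, 0]}
print(gene_burden({"iso1": set(), "iso2": {"geneA"}, "iso3": {"geneA", "geneB"}}, ["geneA", "geneB"]))
# {'iso1': [0, 0], 'iso2': [1, 0], 'iso3': [1, 1]}
```

Codons where every isolate keeps the reference (the third codon here) carry no variation and contribute nothing to an association test.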

Thanks, Dorothy

caitiecollins commented 2 years ago

Hi Dorothy,

Thanks, I’m glad to hear that.

As long as you code the genetic data as a binary variable in the way you describe, I would expect both of these analyses to work in treeWAS.

At worst, you could essentially make the association test “one-sided”, i.e., able to account for trait associations with AA=1 but unable to detect the inverse associations with AA=not_1 (if the trait varies between the amino acids grouped together as AA=not_1). This should not give you any false positives, but it could reduce your power to detect genuine associations. (This would be true for GWAS in general and is not specific to treeWAS.)

For example, if phen=1 whenever AA=1 and phen=0 when AA=2, but phen=1 again when AA=3, you would struggle to detect the genuine difference in the trait between AA=1 and AA=2 (and therefore between AA=1 and AA=not_1) if AA=3 makes up any considerable proportion of AA=not_1.

On the other hand, if several variants make up AA=not_1 and most of them lead to phen=0, while phen=1 when AA=1, treeWAS should work well without losing power, and all three treeWAS scores should behave as expected.
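To make the power argument concrete, the sketch below collapses a multi-allelic codon into AA=1 versus AA=not_1 and measures how well the binary genotype separates the phenotype. It uses a plain risk difference, P(phen=1 | AA=1) - P(phen=1 | AA=not_1), which ignores the phylogeny and is not one of the three treeWAS scores; the allele counts and the helper names (`make_isolates`, `collapse`, `risk_difference`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

def make_isolates(groups: List[Tuple[int, int, int]]) -> Tuple[List[int], List[int]]:
    """Expand hypothetical (allele, phenotype, count) triples into per-isolate lists."""
    alleles: List[int] = []
    phen: List[int] = []
    for allele, p, n in groups:
        alleles += [allele] * n
        phen += [p] * n
    return alleles, phen

def collapse(alleles: List[int], focal: int = 1) -> List[int]:
    """Binary coding: 1 for the focal allele (AA=1), 0 for every other allele (AA=not_1)."""
    return [1 if a == focal else 0 for a in alleles]

def risk_difference(genotype: List[int], phen: List[int]) -> float:
    """P(phen=1 | genotype=1) - P(phen=1 | genotype=0); a simple, non-phylogenetic effect size."""
    carriers = [p for g, p in zip(genotype, phen) if g == 1]
    others = [p for g, p in zip(genotype, phen) if g == 0]
    return sum(carriers) / len(carriers) - sum(others) / len(others)

# First scenario: AA=3 shares AA=1's phenotype, diluting the AA=not_1 group.
for n_aa3 in (0, 10, 30):
    alleles, phen = make_isolates([(1, 1, 10), (2, 0, 10), (3, 1, n_aa3)])
    print(n_aa3, risk_difference(collapse(alleles), phen))
# 0 1.0
# 10 0.5
# 30 0.25

# Second scenario: most AA=not_1 variants (AA=2, AA=4) lead to phen=0.
alleles, phen = make_isolates([(1, 1, 10), (2, 0, 10), (4, 0, 10), (3, 1, 2)])
print(round(risk_difference(collapse(alleles), phen), 3))
# 0.909
```

In the first scenario the separation shrinks from 1.0 to 0.25 as AA=3 grows, even though AA=1 and AA=2 remain perfectly distinguished; in the second it stays close to 1 because nearly all of AA=not_1 shares AA=2's phenotype.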

I would expect your active/inactive gene-based GWAS design to work as well, as it is conceptually similar to the gene presence/absence GWAS analyses in which treeWAS has performed well.

Good luck! Let me know how it goes.

DorothyTamYiLing commented 2 years ago

Thanks Caitie! I will have a think about the "one-sided" GWAS and let you know if I have any problem. Much appreciated, Dorothy

caitiecollins commented 2 years ago

No problem! I should note, so that no one reading this gets confused, that I am only using the term "one-sided" to describe why you could lose some statistical power in a GWAS analysis like this. I do not mean that this would be a formally one-sided association test, nor that the GWAS procedure needs to change in any way.