CAST-genomics / haptools

Ancestry and haplotype aware simulation of genotypes and phenotypes for complex trait analysis
https://haptools.readthedocs.io
MIT License

support for a TR-based Haplotype class #164

Closed (aryarm closed 1 year ago)

aryarm commented 1 year ago

We can extend the hap file format with a new line type for tandem repeats. It should have essentially the same fields as a haplotype line.

Then we can create a new class, similar to the Haplotype class, for reading tandem repeats, and extend it within sim_phenotype.py to add a beta extra field. We would also still need to alter the Haplotypes class to support the new line type.
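A rough sketch of that idea, in plain Python rather than the actual haptools classes. The class names `Repeat` and `RepeatWithBeta`, the `parse_line` helper, and the tab-separated `type, chrom, start, end, ID` field order are all assumptions made for illustration, not the real API or the final file format:

```python
from dataclasses import dataclass


@dataclass
class Haplotype:
    """Stand-in for a haplotype record (hypothetical, not the haptools class)."""
    chrom: str
    start: int
    end: int
    id: str


@dataclass
class Repeat(Haplotype):
    """Hypothetical tandem-repeat record with the same fields as a haplotype."""


@dataclass
class RepeatWithBeta(Repeat):
    """Repeat extended with a 'beta' extra field (the effect size)."""
    beta: float = 0.0


def parse_line(line: str):
    """Parse one tab-separated line; the layout here is an assumption."""
    fields = line.rstrip("\n").split("\t")
    line_type, chrom, start, end, id_ = fields[:5]
    extras = fields[5:]
    if line_type == "H":
        return Haplotype(chrom, int(start), int(end), id_)
    if line_type == "R":
        # the first extra field, if present, is treated as beta
        beta = float(extras[0]) if extras else 0.0
        return RepeatWithBeta(chrom, int(start), int(end), id_, beta)
    raise ValueError(f"unsupported line type: {line_type!r}")


record = parse_line("R\t1\t100\t120\tSTR1\t0.25")
```

The point of subclassing is that the repeat record reuses the haplotype fields unchanged, so only the line-type dispatch and the extra field are new.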

We can also add an optional --repeats argument to the simphenotype command to specify a VCF file containing tandem repeats.
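For illustration, an optional flag like this could look roughly as follows. This uses the standard-library `argparse` as a stand-in; the real command's parsing library, its positional arguments, and the file names below are assumptions, and only the `--repeats` flag itself comes from the proposal:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # 'simphenotype-sketch' and both positionals are placeholders
    parser = argparse.ArgumentParser(prog="simphenotype-sketch")
    parser.add_argument("genotypes", type=Path, help="variant genotypes file")
    parser.add_argument("haplotypes", type=Path, help="hap file with effect sizes")
    parser.add_argument(
        "--repeats",
        type=Path,
        default=None,  # optional: tandem repeats are skipped when omitted
        help="VCF file containing tandem repeats",
    )
    return parser


args = build_parser().parse_args(["snps.vcf", "effects.hap", "--repeats", "trs.vcf"])
```

Defaulting to `None` keeps existing invocations working unchanged, since the repeats are only loaded when the flag is given.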

Then, after we load the tandem repeats internally, we can merge the genotypes from the GenotypesTR class with those from the Genotypes class to create a new Genotypes object, which we can then pass to the PhenoSimulator class.
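A minimal sketch of what the merge step might involve, using a toy container instead of the real Genotypes and GenotypesTR classes. `SimpleGenotypes` and `merge` are hypothetical names, and the per-sample list layout is an assumption chosen to keep the example dependency-free:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleGenotypes:
    """Toy container: data[i][j] is the value for sample i at variant j."""
    samples: List[str]
    variant_ids: List[str]
    data: List[List[int]]


def merge(snps: SimpleGenotypes, repeats: SimpleGenotypes) -> SimpleGenotypes:
    """Concatenate two genotype sets along the variant axis."""
    if snps.samples != repeats.samples:
        raise ValueError("both inputs must have the same samples in the same order")
    overlap = set(snps.variant_ids) & set(repeats.variant_ids)
    if overlap:
        raise ValueError(f"duplicate variant IDs: {sorted(overlap)}")
    return SimpleGenotypes(
        samples=list(snps.samples),
        variant_ids=snps.variant_ids + repeats.variant_ids,
        # append each sample's repeat values after its SNP values
        data=[a + b for a, b in zip(snps.data, repeats.data)],
    )


snps = SimpleGenotypes(["s1", "s2"], ["rs1", "rs2"], [[0, 1], [2, 0]])
trs = SimpleGenotypes(["s1", "s2"], ["STR1"], [[12], [15]])
merged = merge(snps, trs)
```

The sample-order check matters because a column-wise concatenation silently mixes up individuals if the two files list samples differently.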

aryarm commented 1 year ago

Update (after https://github.com/CAST-genomics/haptools/commit/6b3f6e7f5d0fd808087c257ef10cd990f21356ab): see https://github.com/CAST-genomics/haptools/pull/208#issuecomment-1508822504. We refactored the Haplotypes class to make it easier to add new line types in the future. We should also make sure to document that an H line can never have the same ID as an R line, but an H (or R) line can have the same ID as a V line.
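The ID rule could be checked along these lines. This is only a sketch: `find_id_conflicts` is a hypothetical helper, and it takes already-parsed `(line_type, id)` pairs rather than reading a real hap file. Duplicate IDs within a single line type are not covered here, since the rule above does not address them:

```python
from typing import List, Sequence, Tuple


def find_id_conflicts(lines: Sequence[Tuple[str, str]]) -> List[str]:
    """Return the IDs that are used by both an H line and an R line."""
    h_ids = {id_ for line_type, id_ in lines if line_type == "H"}
    r_ids = {id_ for line_type, id_ in lines if line_type == "R"}
    # V lines are ignored on purpose: they may share an ID with an H or R line
    return sorted(h_ids & r_ids)


conflicts = find_id_conflicts(
    [("H", "a"), ("R", "a"), ("V", "a"), ("H", "b"), ("V", "b"), ("R", "c")]
)
```

Here only `a` is reported, because it appears on both an H and an R line, while `b` reused on a V line is allowed.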

aryarm commented 1 year ago

Resolved in #209 🥳