ersilia-os / pharmacogx-embeddings

Pharmacogenomics knowledge graph embeddings and related analyses
GNU General Public License v3.0

Map variants to haplotypes in PharmGKB #8

Closed by GemmaTuron 1 year ago

GemmaTuron commented 1 year ago

PharmGKB often mixes variants and haplotypes in the downloaded files (see, for example, clinicalAnnotations). We need to map each haplotype (for example, CYP2B6*1) to the variants it contains. We will automatically download all the available allele tables per gene, curate them, and add them to the existing files.
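A minimal sketch of what "mapping a haplotype to the variants it contains" could look like, assuming a simplified wide allele table: the first column holds the haplotype name, the remaining columns are variant names, and the first data row is the reference haplotype (e.g. *1). This layout, the function name `deconvolute_allele_table`, and the toy values are hypothetical and not PharmGKB's actual file format. A variant is counted as part of a haplotype when that haplotype's cell is non-empty and differs from the reference allele.

```python
import csv
import io


def deconvolute_allele_table(text):
    """Return (haplotype, variant) pairs from a wide allele table.

    Hypothetical layout: header row = haplotype column + variant names;
    first data row = reference haplotype; later rows = other haplotypes.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, reference, haplotypes = rows[0], rows[1], rows[2:]
    variants = header[1:]
    pairs = []
    for row in haplotypes:
        name, alleles = row[0], row[1:]
        for variant, ref, allele in zip(variants, reference[1:], alleles):
            # An empty cell is assumed to mean "same as the reference allele".
            if allele and allele != ref:
                pairs.append((name, variant))
    return pairs


# Toy table with hypothetical variants v1..v3; *1 is the reference haplotype.
table = "haplotype,v1,v2,v3\n*1,G,A,T\n*2,,G,\n*3,T,G,\n"
print(deconvolute_allele_table(table))
# [('*2', 'v2'), ('*3', 'v1'), ('*3', 'v2')]
```

The long (haplotype, variant) format is easy to join against other PharmGKB files, which is why it is used here instead of keeping the wide table.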

GemmaTuron commented 1 year ago

This commit has created tables for each gene that contain the haplotype information and the different variants in each haplotype.

miquelduranfrigola commented 1 year ago

Hello @GemmaTuron, this is great stuff, thanks. As discussed, it would be great to deconvolute haplotypes into variants (see related issue #15).

GemmaTuron commented 1 year ago

This is already done in the mentioned commit.

GemmaTuron commented 1 year ago

I have improved the haplotype-to-variant deconvolution to maximize correspondence between variant names, so that we can obtain the variant ID (vid) for every variant; these IDs let us query PharmGKB easily through its API. This is still a work in progress: some variant names, such as the NAT1 9bp deletion, do not match the names used in the API and need manual curation (commit).
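A sketch of the name-matching step, assuming a hypothetical dictionary `known_ids` that maps variant names to variant IDs (in practice these would come from PharmGKB; the IDs below are made up). The normalization rules (trimming, collapsing whitespace, lower-casing) are illustrative and not necessarily those used in the commit. Names that still fail to match are returned separately so they can be curated by hand, as with the NAT1 9bp deletion.

```python
import re


def normalize_name(name):
    """Trim, collapse internal whitespace and lower-case a variant name."""
    return re.sub(r"\s+", " ", name.strip()).lower()


def match_variant_ids(names, known_ids):
    """Split names into ({name: vid} for matches, [names needing curation])."""
    lookup = {normalize_name(k): v for k, v in known_ids.items()}
    matched, unmatched = {}, []
    for name in names:
        vid = lookup.get(normalize_name(name))
        if vid is None:
            unmatched.append(name)  # no ID found: flag for manual curation
        else:
            matched[name] = vid
    return matched, unmatched


# Hypothetical example: one name matches after normalization, one does not.
print(match_variant_ids([" RS123 ", "9bp deletion"], {"rs123": "VID1"}))
# ({' RS123 ': 'VID1'}, ['9bp deletion'])
```

Lower-casing is convenient for rsIDs but could merge names that differ only in case, so the normalization should be checked against the real name list.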

GemmaTuron commented 1 year ago

I have finished the deconvolution. All variants inside haplotypes for which a variant ID is available are now associated in the file hap_var_complete.csv (changes in this commit).
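A sketch of how a file like hap_var_complete.csv could be assembled: take the (haplotype, variant) pairs from the deconvolution and keep only those whose variant has a known ID. The column names `haplotype`, `variant`, `vid` and the function name are assumptions for illustration; the real file's columns may differ.

```python
import csv
import io


def build_hap_var_table(pairs, variant_ids):
    """Return CSV text with one row per (haplotype, variant) that has a vid.

    Column names are hypothetical; pairs without an ID are dropped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["haplotype", "variant", "vid"])
    for haplotype, variant in pairs:
        vid = variant_ids.get(variant)
        if vid is not None:
            writer.writerow([haplotype, variant, vid])
    return out.getvalue()


# Hypothetical input: v1 has no ID, so its pair is left out.
print(build_hap_var_table([("*2", "v2"), ("*3", "v1")], {"v2": "VID2"}), end="")
# haplotype,variant,vid
# *2,v2,VID2
```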