ersilia-os / pharmacogx-embeddings

Pharmacogenomics knowledge graph embeddings and related analyses
GNU General Public License v3.0

Check deduplication of gene - variant in the variant annotation files #13

Closed GemmaTuron closed 1 year ago

GemmaTuron commented 1 year ago

The variant annotation files did not take into account that a gene could end up associated with the wrong variant because of the deduplication rationale followed (see drug_labels.py for the best way to avoid this). Check that all gene-variant associations are correct.

GemmaTuron commented 1 year ago

The drug_labels.csv file contained one cell with genes and one cell with the associated variants/haplotypes, but not every gene is associated with every variant. For example, the genes cell "CYP2C19; CYP2C9; HLA-B" with the variants cell "CYP2C9*2; CYP2C9*3; HLA-B*15:02:01" would be:

CYP2C19, (no associated variant)
CYP2C9, CYP2C9*2
CYP2C9, CYP2C9*3
HLA-B, HLA-B*15:02:01
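To make the pairing rule concrete, here is a minimal sketch in Python. It is not the code in drug_labels.py; the function name `pair_genes_with_variants` and the matching rule (a star-allele haplotype belongs to the gene whose name precedes the `*`) are assumptions made for illustration. Variants that do not carry a gene prefix, such as rsIDs, would need a different lookup and are simply left unmatched here.

```python
def pair_genes_with_variants(genes_cell, variants_cell, sep=";"):
    """Pair each gene with the haplotypes that carry its name as a prefix.

    Hypothetical helper: assumes star-allele naming (GENE*allele).
    Genes with no matching haplotype are kept with an empty variant.
    """
    genes = [g.strip() for g in genes_cell.split(sep) if g.strip()]
    variants = [v.strip() for v in variants_cell.split(sep) if v.strip()]

    pairs = []
    for gene in genes:
        # Appending "*" stops CYP2C9 from matching CYP2C19 haplotypes and vice versa.
        matched = [v for v in variants if v.startswith(gene + "*")]
        if matched:
            pairs.extend((gene, v) for v in matched)
        else:
            pairs.append((gene, ""))
    return pairs


example_pairs = pair_genes_with_variants(
    "CYP2C19; CYP2C9; HLA-B",
    "CYP2C9*2; CYP2C9*3; HLA-B*15:02:01",
)
for gene, variant in example_pairs:
    print(f"{gene}, {variant}")
```

Matching on the gene name plus `*` rather than on the bare gene name is the part that matters: it avoids the cross product of genes and variants that produced the wrong associations.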

On the other hand, in the clinical_annotations and variant_annotation files, when a cell contains several genes, they are all associated with the reported variant. For example, rs1800497,ANKK1;DRD2, becomes:

rs1800497,ANKK1,
rs1800497,DRD2,
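For these files the expansion is a plain one-to-many split, which could be sketched as follows. The helper name `explode_genes` is hypothetical, and the `;` separator is taken from the example above; the real files may contain other columns that this sketch ignores.

```python
def explode_genes(variant, genes_cell, sep=";"):
    """Return one (variant, gene) row per gene listed in the cell.

    Hypothetical helper: every gene in the cell is assumed to be
    associated with the reported variant.
    """
    rows = [(variant, g.strip()) for g in genes_cell.split(sep) if g.strip()]
    # Drop exact duplicate rows while keeping the original order.
    return list(dict.fromkeys(rows))


rows = explode_genes("rs1800497", "ANKK1;DRD2")
for variant, gene in rows:
    print(f"{variant},{gene}")
```

Deduplicating on the full (variant, gene) pair, rather than on the gene or the variant alone, keeps both rows for a variant shared by two genes.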