EBIvariation / opentargets-pharmgkb

Pipeline to provide evidence strings for Open Targets from PharmGKB
Apache License 2.0

Handle multiple alt alleles #18

Closed apriltuesday closed 11 months ago

apriltuesday commented 1 year ago

This is about how to appropriately handle rsIDs with more than one alternate allele. See #5 for some context.

PharmGKB provides variant-level clinical annotations, with additional phenotype descriptions provided per genotype (for example, this annotation for rs10420097). Open Targets, on the other hand, uses VCF-style chr_pos_ref_alt variant identifiers, which are allele-specific.

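Currently the code explodes records by genotype but generates only one variant identifier per rsID (using the lexicographically first alternate allele), so every genotype record for that rsID ends up with the same allele-specific identifier. That doesn't seem appropriate when the rsID has more than one alternate allele.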
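To make the mismatch concrete, here is a minimal sketch contrasting the behaviour described above with generating one identifier per alternate allele. The function names and the coordinates and alleles in the example are made up for illustration. They are not taken from the pipeline or from any real rsID.

```python
def variant_id_first_alt(chrom, pos, ref, alts):
    """Behaviour as described above: a single chr_pos_ref_alt ID,
    built from the lexicographically first alternate allele."""
    first_alt = sorted(alts)[0]
    return f"{chrom}_{pos}_{ref}_{first_alt}"


def variant_ids_all_alts(chrom, pos, ref, alts):
    """Alternative: one chr_pos_ref_alt ID per alternate allele."""
    return [f"{chrom}_{pos}_{ref}_{alt}" for alt in sorted(alts)]


# Hypothetical multi-allelic site (placeholder values, not real data)
chrom, pos, ref, alts = "19", 1000, "A", ["T", "G"]

single_id = variant_id_first_alt(chrom, pos, ref, alts)
all_ids = variant_ids_all_alts(chrom, pos, ref, alts)
```

With the first function, a record for a genotype carrying T would still be labelled with the G identifier. The second keeps the alleles distinct, but leaves open how to attach them to a genotype.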

We should discuss with Open Targets how they'd like the information presented and what is feasible to generate from the data, and update the implementation accordingly.

apriltuesday commented 1 year ago

Proposal from 29 Aug: replace variantID with genotypeID:

apriltuesday commented 1 year ago

We forgot to talk about VEP, which of course predicts consequences for variants not genotypes... I guess we might need lists of consequences associated with a genotype? Might need to think about this more.
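One possible shape for "lists of consequences associated with a genotype", as a rough sketch only: look up the per-variant consequences for each non-reference allele in the genotype and merge them. This assumes single-nucleotide alleles and genotypes written as a string of alleles (e.g. "AG"). The function name, the lookup table, and the coordinates are hypothetical, and the consequence terms are placeholders rather than real VEP output.

```python
def genotype_consequences(chrom, pos, ref, genotype, consequences_by_variant):
    """Collect consequences for every non-reference allele in a genotype.

    consequences_by_variant maps chr_pos_ref_alt IDs to lists of
    consequence terms (assumed to come from a per-variant VEP run).
    Order of first appearance is preserved and duplicates are dropped.
    """
    merged = []
    for allele in genotype:
        if allele == ref:
            continue  # the reference allele has no variant consequence
        variant_id = f"{chrom}_{pos}_{ref}_{allele}"
        for term in consequences_by_variant.get(variant_id, []):
            if term not in merged:
                merged.append(term)
    return merged


# Placeholder lookup table, not real predictions
vep_lookup = {
    "19_1000_A_G": ["missense_variant"],
    "19_1000_A_T": ["stop_gained"],
}

hom_ref = genotype_consequences("19", 1000, "A", "AA", vep_lookup)
het = genotype_consequences("19", 1000, "A", "AG", vep_lookup)
compound = genotype_consequences("19", 1000, "A", "GT", vep_lookup)
```

Indels and other genotype notations would need a proper parser instead of iterating over characters, so this is only a starting point for discussion.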