EBIvariation / opentargets-pharmgkb

Pipeline to provide evidence strings for Open Targets from PharmGKB
Apache License 2.0

EVA-3269: Adding publications, phenotype mappings, functional consequences #11

Closed: apriltuesday closed this 1 year ago

apriltuesday commented 1 year ago

Closes #4, closes #6, closes #7

Example evidence string from tests:

{
  "datasourceId": "pharmgkb",
  "datasourceVersion": "2023-03-23",
  "datatypeId": "clinical_annotation",
  "studyId": "1449309937",
  "evidenceLevel": "1A",
  "literature": [
    "27857962",
    "11389482"
  ],
  "variantId": "19_38499645_GGAGGAG_GGAG",
  "variantRsId": "rs121918596",
  "targetFromSourceId": "ENSG00000196218",
  "variantFunctionalConsequenceId": "inframe_deletion",
  "variantOverlappingGeneId": "ENSG00000196218",
  "genotype": "del/GAG",
  "genotypeAnnotationText": "Patients with the rs121918596 del/GAG genotype may develop malignant hyperthermia when treated with volatile anesthetics (desflurane, enflurane, halothane, isoflurane, methoxyflurane, sevoflurane) and/or succinylcholine as compared to patients with the GAG/GAG genotype. Other genetic or clinical factors may also influence the risk for malignant hyperthermia.",
  "drugText": "enflurane",
  "drugId": "http://purl.obolibrary.org/obo/CHEBI_4792",
  "pgxCategory": "Toxicity",
  "phenotypeText": "Malignant Hyperthermia",
  "phenotypeFromSourceId": "http://www.orpha.net/ORDO/Orphanet_423"
}
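The `variantId` in the example appears to follow a `CHROM_POS_REF_ALT` layout: chromosome 19, position 38499645, reference `GGAGGAG`, alternate `GGAG`. That is a 3-base `GAG` deletion, which matches the `del/GAG` genotype and the `inframe_deletion` consequence. Below is a minimal Python sketch of splitting the ID back into its parts. It assumes that layout holds for every evidence string. `parse_variant_id` and `VariantCoordinates` are hypothetical helpers, not part of the pipeline.

```python
import json
from typing import NamedTuple


class VariantCoordinates(NamedTuple):
    """Hypothetical container for the parts of a CHROM_POS_REF_ALT identifier."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> VariantCoordinates:
    """Split an identifier such as '19_38499645_GGAGGAG_GGAG' into its four parts.

    Assumes the layout is CHROM_POS_REF_ALT with no underscores inside any part.
    """
    parts = variant_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"Expected CHROM_POS_REF_ALT, got {variant_id!r}")
    chrom, pos, ref, alt = parts
    return VariantCoordinates(chrom, int(pos), ref, alt)


# Trimmed copy of the evidence string above, keeping only the relevant fields.
evidence = json.loads(
    '{"variantId": "19_38499645_GGAGGAG_GGAG", '
    '"variantFunctionalConsequenceId": "inframe_deletion"}'
)
coords = parse_variant_id(evidence["variantId"])
print(coords)  # VariantCoordinates(chrom='19', pos=38499645, ref='GGAGGAG', alt='GGAG')

# Net length change: negative for deletions, positive for insertions.
# A multiple of 3 is consistent with an in-frame change.
print(len(coords.alt) - len(coords.ref))  # -3
```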

Note that targetFromSourceId is mapped from the gene provided by PharmGKB, whereas variantOverlappingGeneId is what we get from VEP based on the variant definition.

apriltuesday commented 1 year ago

cc @ireneisdoomed @DSuveges @tskir for the example PharmGKB evidence string above. If anything should be changed, just let me know (in particular the field names: I've been reusing ones from ClinVar and making up some of my own with impunity).

apriltuesday commented 1 year ago

@M-casado

"How does having targetFromSourceId be the one mapped by PharmGKB, and variantOverlappingGeneId the one from VEP, affect the way we would interpret targetFromSourceId in evidence strings from ClinVar?"

This is an excellent point and maybe something to put on the agenda for a subsequent meeting. I don't know if we'll be able to change the ClinVar field names, but if not, maybe we should name them differently here so they're consistent in the way you describe. (We should also check how well these two gene IDs are aligned across PharmGKB; it's always possible that we don't actually need both of them.)
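One way to check how well the two gene IDs line up is to count how often `targetFromSourceId` (from PharmGKB) equals `variantOverlappingGeneId` (from VEP) over a batch of evidence strings. Below is a minimal Python sketch. It assumes the evidence strings are available as dicts, for example loaded from a JSON Lines file with one object per line. The `gene_id_agreement` and `read_jsonl` functions and that file layout are hypothetical. Records that lack either field are counted separately rather than treated as mismatches.

```python
import json
from collections import Counter
from typing import Iterable, List


def gene_id_agreement(evidence_strings: Iterable[dict]) -> Counter:
    """Tally how often the PharmGKB-derived and VEP-derived gene IDs agree.

    Categories:
      match    - both fields present and equal
      mismatch - both fields present but different
      missing  - at least one of the two fields absent or empty
    """
    counts = Counter()
    for evidence in evidence_strings:
        source_gene = evidence.get("targetFromSourceId")
        vep_gene = evidence.get("variantOverlappingGeneId")
        if not source_gene or not vep_gene:
            counts["missing"] += 1
        elif source_gene == vep_gene:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def read_jsonl(path: str) -> List[dict]:
    """Load one JSON object per non-blank line (hypothetical file layout)."""
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Toy records: the first mirrors the example evidence string above;
# GENE_X and GENE_Y are invented placeholders that exercise the other categories.
toy = [
    {"targetFromSourceId": "ENSG00000196218", "variantOverlappingGeneId": "ENSG00000196218"},
    {"targetFromSourceId": "GENE_X", "variantOverlappingGeneId": "GENE_Y"},
    {"targetFromSourceId": "GENE_X"},
]
print(gene_id_agreement(toy))
```

On real output this would be called as `gene_id_agreement(read_jsonl("evidence.jsonl"))`, where the file name is a placeholder. A high `mismatch` count would suggest keeping both fields. A near-total `match` would support the idea that only one is needed.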

M-casado commented 1 year ago

@apriltuesday

Agreed. I think keeping track of where each field comes from and giving the fields distinct names would stop a future maintainer from pulling their hair out if this ever gets curated.

apriltuesday commented 1 year ago

I've linked to this in the meeting minutes. I'll merge this PR and open a follow-up one for any naming changes we decide on.