andrewsu opened this issue 7 years ago.
@andrawaag @andrewsu @sebotic @lschriml I sketched out a single example to use to discuss a few things. This is only one example, and other entries will raise other issues. There's no way to link to a single entry, so go to https://www.cancergenomeinterpreter.org/biomarkers and type "BRCA1 deletion" in the "Biomarker" box. The data from the TSV file download is below:
Column | Value |
---|---|
Alteration | BRCA1:del |
Alteration type | CNA |
Assay type | |
Association | Responsive |
Biomarker | BRCA1 deletion |
Curator | RDientsmann |
Drug | [] |
Drug family | [PARP inhibitor] |
Drug full name | PARP inhibitors |
Drug status | |
Evidence level | Pre-clinical |
Gene | BRCA1 |
Metastatic Tumor Type | |
Primary Tumor acronym | OV |
Source | PMID:22392482 |
Targeting | |
individual_mutation | |
transcript | |
gene | |
strand | |
region | |
info | |
cDNA | |
gDNA | |
Primary Tumor type | Ovary |
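Since there is no per-entry link, the same row can be pulled out of the TSV download programmatically instead. The sketch below assumes the file is tab-separated with the column names shown above, and that list-valued cells such as Drug and Drug family are bracketed, comma-separated strings; SAMPLE_TSV, rows_for_biomarker and parse_bracket_list are illustrative names, not part of CGI.

```python
import csv
import io

# Header plus the BRCA1 row above, trimmed to a few columns for brevity.
SAMPLE_TSV = (
    "Alteration\tAlteration type\tAssay type\tAssociation\tBiomarker\t"
    "Drug\tDrug family\tEvidence level\tSource\tPrimary Tumor type\n"
    "BRCA1:del\tCNA\t\tResponsive\tBRCA1 deletion\t"
    "[]\t[PARP inhibitor]\tPre-clinical\tPMID:22392482\tOvary\n"
)


def rows_for_biomarker(tsv_text, biomarker):
    """Return the rows whose "Biomarker" column equals the given string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row.get("Biomarker", "").strip() == biomarker]


def parse_bracket_list(cell):
    """Turn '[PARP inhibitor]' into ['PARP inhibitor'] and '[]' into [].

    Assumes (hypothetically) that multiple items are comma-separated.
    """
    inner = cell.strip().strip("[]").strip()
    return [item.strip() for item in inner.split(",") if item.strip()]


rows = rows_for_biomarker(SAMPLE_TSV, "BRCA1 deletion")
row = rows[0]
print(parse_bracket_list(row["Drug"]), parse_bracket_list(row["Drug family"]))
```

Running it prints an empty drug list next to `['PARP inhibitor']`: the row names a drug family but no drug.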
1) Sometimes a drug family is specified rather than a drug, so we need to be able to normalize all of these drug families. There are 116 unique drug family strings. I think most can be matched to ChEBI by hand (example: PARP inhibitors). We also need to handle dual/multiple inhibitors (e.g. PI3K & MEK inhibitors).
2) I'm not sure how to handle the evidence. CGI gives an "Evidence level", which is one of: Pre-clinical, Early trials, Case report, FDA guidelines, European LeukemiaNet guidelines, NCCN guidelines, Late trials, CPIC guidelines, Clinical trial. In CIViC this is more detailed, because the evidence level describes the specific claim made by each evidence item. Here the level describes the source itself, so I think it makes sense to put it on the item that "stated in" points to.
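Before matching the drug-family strings to ChEBI by hand, it may help to break combination and dual-inhibitor strings into single classes. This sketch assumes " + " separates co-administered families (as in "BRAF inhibitor + HSP90 inhibitors", which comes up later in this thread) and that "X & Y inhibitors" names two classes; whether a dual inhibitor should instead stay one class is a modeling decision the code does not make. split_drug_family is a hypothetical helper.

```python
import re

# "X & Y inhibitor(s)": two target classes sharing one suffix.
DUAL_PATTERN = re.compile(r"(.+?)\s*&\s*(.+?)\s+(inhibitors?)", re.IGNORECASE)


def split_drug_family(family):
    """Split a CGI drug-family string into one string per inhibitor class.

    'BRAF inhibitor + HSP90 inhibitors' -> ['BRAF inhibitor', 'HSP90 inhibitors']
    'PI3K & MEK inhibitors'             -> ['PI3K inhibitors', 'MEK inhibitors']
    """
    components = []
    for part in family.split("+"):
        part = part.strip()
        match = DUAL_PATTERN.fullmatch(part)
        if match:
            first, second, suffix = match.groups()
            components += [f"{first} {suffix}", f"{second} {suffix}"]
        elif part:
            components.append(part)
    return components


print(split_drug_family("PI3K & MEK inhibitors"))
```

Each resulting string can then be looked up once, which keeps the manual ChEBI matching to single classes.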
Thoughts?
Had a look at the therapies. Some are combination therapies in which only one of the compounds actually targets the mutated oncogene, but only an expert would know which one. It may be possible to look this up in the original reference.
Strangely, they often list a drug family, while the reference names only very specific compounds.
Example: row 255 has no drug listed and the drug family "BRAF inhibitor + HSP90 inhibitors", but the reference (PMID:22351686) specifically names vemurafenib as the BRAF inhibitor and XL888 as the HSP90 inhibitor.
Yes, extracting this manually from the referenced publication might work in many cases.
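If compounds are extracted by hand, one way to keep that curation separate from the raw CGI data is a lookup keyed on the (drug family, source) pair, falling back to the family string when a row has not been curated yet. CURATED_COMPOUNDS and resolve_component are hypothetical; the two entries are the ones read from PMID:22351686 in the row 255 example.

```python
# Hypothetical curation table: (drug family, source) -> compound named in that reference.
# The entries below come from the row 255 example (PMID:22351686).
CURATED_COMPOUNDS = {
    ("BRAF inhibitor", "PMID:22351686"): "vemurafenib",
    ("HSP90 inhibitors", "PMID:22351686"): "XL888",
}


def resolve_component(family, source):
    """Return (value, kind): a curated compound if one exists, else the family string."""
    compound = CURATED_COMPOUNDS.get((family, source))
    if compound is not None:
        return compound, "compound"
    return family, "drug family"


row_255 = ["BRAF inhibitor", "HSP90 inhibitors"]
print([resolve_component(f, "PMID:22351686") for f in row_255])
```

Keying on the source as well as the family matters because the same family string may resolve to different compounds in different papers.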
Notes:
Alteration Type
Here I'm looking at normalizing the CGI "Alteration type". Since many Sequence Ontology (SO) concepts are already in Wikidata and @andrawaag has done mappings from CIViC, I started by mapping these types to SO. Where I couldn't find an SO term, I've listed an NCI Thesaurus (NCIt) mapping. But I think we want to stick with SO, so unless anyone finds better mappings, I'll ask the SO maintainers.
There are 5 listed: MUT, CNA, FUS, EXPR, BIA.
CGI | Description | Ontology term | Term label | Term definition | Notes |
---|---|---|---|---|---|
MUT | ? | ? | | | Are these specifically missense variants? |
CNA | deletion or amplification (copy number alteration?) | ? | ? | | Could be deletion or amplification. https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/407 |
FUS | fusion | http://purl.obolibrary.org/obo/SO_0001882 | feature_fusion | A sequence variant, caused by an alteration of the genomic sequence, where a deletion fuses genomic features. | |
EXPR | overexpression or underexpression | http://purl.obolibrary.org/obo/SO_0001540 | level_of_transcript_variant | A sequence variant which alters the level of a transcript. | |
BIA | biallelic inactivation | http://purl.obolibrary.org/obo/NCIT_C129829 | Biallelic Mutation | A mutation that occurs on both alleles of a single gene. | |
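Once the table settles, it can be kept as a plain lookup so that a bot can handle the types that are still open without failing. This is a minimal sketch; ALTERATION_TYPE_TERMS, term_for and is_sequence_ontology are illustrative names, and is_sequence_ontology simply tests the OBO IRI prefix as a way to flag the non-SO fallback.

```python
OBO = "http://purl.obolibrary.org/obo/"

# CGI alteration type -> ontology term IRI, copied from the table above.
# None means no term has been chosen yet.
ALTERATION_TYPE_TERMS = {
    "MUT": None,                   # open question: only missense variants?
    "CNA": None,                   # deletion or amplification; SO-Ontologies issue 407
    "FUS": OBO + "SO_0001882",     # feature_fusion
    "EXPR": OBO + "SO_0001540",    # level_of_transcript_variant
    "BIA": OBO + "NCIT_C129829",   # Biallelic Mutation (NCIT term, no SO match)
}


def term_for(alteration_type):
    """Return the term IRI for a CGI alteration type, or None if it is unmapped."""
    return ALTERATION_TYPE_TERMS.get(alteration_type.strip().upper())


def is_sequence_ontology(iri):
    """True if the IRI is an SO term rather than a fallback from another ontology."""
    return iri is not None and iri.startswith(OBO + "SO_")


unmapped = sorted(code for code, iri in ALTERATION_TYPE_TERMS.items() if iri is None)
print(unmapped)
```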
Primary Tumor Type
Mapping of "Primary Tumor type" to the Disease Ontology (DO), again because DO is already in Wikidata, and to stay consistent with how CIViC variants are represented there. For the remaining tumor types, see https://github.com/DiseaseOntology/HumanDiseaseOntology/issues/374
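The split between tumor types with a DO match and the rest (those tracked in the DO issue) can be recomputed as new terms land upstream. A minimal sketch, assuming the manual mapping is held as a dict from CGI "Primary Tumor type" strings to DOIDs; partition_tumor_types is a hypothetical name, and both the DOID value and "Example tumor type" are placeholders, not real data.

```python
def partition_tumor_types(tumor_types, do_mapping):
    """Split primary tumor types into a {name: DOID} dict and a sorted list without a DOID."""
    mapped, unmapped = {}, []
    for name in sorted(set(tumor_types)):
        doid = do_mapping.get(name)
        if doid:
            mapped[name] = doid
        else:
            unmapped.append(name)
    return mapped, unmapped


# Hypothetical inputs: the DOID value is a placeholder, not a real mapping.
do_mapping = {"Ovary": "DOID:PLACEHOLDER"}
mapped, unmapped = partition_tumor_types(["Ovary", "Ovary", "Example tumor type"], do_mapping)
print(mapped, unmapped)
```

The unmapped list is what would be carried over to the DO issue.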
Drugs
Mapped all drugs to Wikidata QIDs
The following are also listed as drugs, but should be treated as either a drug family or a treatment: lhrh analogues or antagonist, bcl2 inhibitor, chk1/2 inhibitor, anthracyclines, platinum agent, hsp90 inhibitor, chemotherapy, mek inhibitor.
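When the drug column is processed, these entries can be diverted before the QID lookup. A sketch under stated assumptions: classify_drug_entry is a hypothetical helper, the lookup is assumed to be keyed by lower-cased names, and "Q_PLACEHOLDER" stands in for a real QID.

```python
# Entries CGI lists as drugs that are really drug families or treatments (list above).
NOT_DRUGS = {
    "lhrh analogues or antagonist", "bcl2 inhibitor", "chk1/2 inhibitor",
    "anthracyclines", "platinum agent", "hsp90 inhibitor", "chemotherapy",
    "mek inhibitor",
}


def classify_drug_entry(name, qid_lookup):
    """Return ('family_or_treatment', name), ('drug', qid) or ('unmapped', name).

    qid_lookup maps lower-cased drug names to Wikidata QIDs.
    """
    key = name.strip().lower()
    if key in NOT_DRUGS:
        return ("family_or_treatment", name)
    qid = qid_lookup.get(key)
    return ("drug", qid) if qid else ("unmapped", name)


# Hypothetical lookup; "Q_PLACEHOLDER" stands in for a real QID.
lookup = {"vemurafenib": "Q_PLACEHOLDER"}
for entry in ["Chemotherapy", "Vemurafenib", "XL888"]:
    print(entry, classify_drug_entry(entry, lookup))
```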
Drug Combinations
For the drug combinations where both components are actual drugs (not drug families): Wikidata has drug combination items such as https://www.wikidata.org/wiki/Q3836750, where the drugs are "fixed dose combination drugs", meaning they are combined into one product. However, I think all of these are really two different drugs given together as one treatment. Should we distinguish between these? @andrawaag
There are 39 of these, and there are currently none in Wikidata. Some may be present in other databases, but I haven't checked yet.
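Representing each combination as its own record makes both distinctions explicit: whether every component is a resolved drug, and separately whether it is a single fixed-dose product. A sketch, assuming components are joined with "+" as in the drug-family strings; CombinationTreatment, classify_combination and known_drugs are hypothetical, and defaulting fixed_dose_product to False encodes the suspicion above rather than a checked fact.

```python
from dataclasses import dataclass


@dataclass
class CombinationTreatment:
    components: list[str]
    all_drugs: bool
    # Whether this is one fixed-dose product (like Q3836750) rather than separate
    # drugs given together; unknown from CGI alone, so it defaults to False.
    fixed_dose_product: bool = False


def classify_combination(cell, is_drug):
    """Split a combination on '+' and record whether every component is a drug."""
    components = [c.strip() for c in cell.split("+") if c.strip()]
    return CombinationTreatment(components, bool(components) and all(map(is_drug, components)))


known_drugs = {"vemurafenib", "XL888"}  # hypothetical set of resolved drugs
print(classify_combination("vemurafenib + XL888", known_drugs.__contains__))
print(classify_combination("vemurafenib + chemotherapy", known_drugs.__contains__))
```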
There are others where one or both components are a drug family or a treatment (chemotherapy). Not sure how to treat these:
Evidence Levels and Sources
These evidence levels have journal articles (specified by PMID) or clinical trials (specified by NCT number) as their sources.
Guidelines
For the guideline evidence levels, some sources have PMIDs listed, but most just list "FDA" or "NCCN".
FDA guidelines: PMID:24670165, PMID:24327273, PMID:27283860, PMID:22417203, PMID:19726763, PMID:19726761, PMID:20065189, PMID:22025146, FDA https://www.fda.gov/drugs/scienceresearch/researchareas/pharmacogenetics/ucm083378.htm
NCCN guidelines: PMID:21562040, PMID:26287849, PMID:20921461, PMID:24024839;PMID:20619739;PMID:23325582, NCCN, FDA https://www.nccn.org/professionals/physician_gls/f_guidelines.asp
European LeukemiaNet guidelines: PMID:21562040 (only this one)
CPIC guidelines: PMID:23988873 (only this one)
NCCN/CAP guidelines: NCCN (only one item)
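Because one source cell can hold several PMIDs (the semicolon-joined NCCN entry above) or just a body name such as "FDA" or "NCCN", splitting each cell into typed references is a reasonable first step before creating items. A minimal sketch, assuming ";" is the only separator inside a cell; parse_sources is a hypothetical helper, and anything that is not a simple PREFIX:ID token is kept as a name.

```python
import re

# A prefixed identifier such as "PMID:24024839"; URLs do not match because of "//".
PREFIXED_ID = re.compile(r"([A-Za-z]+):(\w+)")


def parse_sources(cell):
    """Split a CGI Source cell into (kind, value) pairs.

    'PMID:24024839;PMID:20619739' -> [('PMID', '24024839'), ('PMID', '20619739')]
    'NCCN'                        -> [('name', 'NCCN')]
    """
    refs = []
    for token in cell.split(";"):
        token = token.strip()
        if not token:
            continue
        match = PREFIXED_ID.fullmatch(token)
        refs.append((match.group(1).upper(), match.group(2)) if match else ("name", token))
    return refs


print(parse_sources("PMID:24024839;PMID:20619739;PMID:23325582"))
print(parse_sources("FDA"))
```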
Plan of action
Wait until we have worked out the evidence levels in CIViC before tackling the non-guideline items. I will create items for the FDA and NCCN guidelines and use them as the determination method.
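The plan can be expressed as a small routing function over the evidence levels. This is a sketch under stated assumptions: the set of guideline levels is taken from this thread, plan_for and its action labels are hypothetical, and ELN, CPIC and NCCN/CAP rows are left as "guideline_without_item" because the plan only names FDA and NCCN.

```python
# Guideline evidence levels seen in the CGI data (listed above).
GUIDELINE_LEVELS = {
    "FDA guidelines", "NCCN guidelines", "NCCN/CAP guidelines",
    "European LeukemiaNet guidelines", "CPIC guidelines",
}
# Guideline bodies for which items will be created and used as the determination method.
PLANNED_GUIDELINE_ITEMS = {"FDA", "NCCN"}


def plan_for(evidence_level):
    """Return (action, guideline body) for an evidence level under the plan of action."""
    if evidence_level not in GUIDELINE_LEVELS:
        return ("defer", None)  # non-guideline level: wait for the CIViC evidence-level work
    body = evidence_level.removesuffix(" guidelines")
    if body in PLANNED_GUIDELINE_ITEMS:
        return ("use_guideline_item", body)
    return ("guideline_without_item", body)


for level in ["Pre-clinical", "FDA guidelines", "NCCN/CAP guidelines"]:
    print(level, plan_for(level))
```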
See https://www.cancergenomeinterpreter.org/biomarkers, a CC0-licensed database. Perhaps this could be aligned with the CIViC bot? cc @andrawaag
Reference: http://www.biorxiv.org/content/early/2017/05/20/140475