SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

Load variant annotation data from Cancer Genome Interpreter #34

Open andrewsu opened 7 years ago

andrewsu commented 7 years ago

See, a CC0 licensed database. Perhaps to be aligned with the CIVIC bot? cc @andrawaag Reference:

stuppie commented 7 years ago

@andrawaag @andrewsu @sebotic @lschriml I sketched out a single example to use to discuss a few things. This is only one example, and there will be other issues with other entries. There's no way to link to a single entry, so go here: and type "BRCA1 deletion" in the "Biomarker" box. The data from the TSV file download is below:

Alteration: BRCA1:del
Alteration type: CNA
Assay type: (empty)
Association: Responsive
Biomarker: BRCA1 deletion
Curator: RDientsmann
Drug: []
Drug family: [PARP inhibitor]
Drug full name: PARP inhibitors
Drug status: (empty)
Evidence level: Pre-clinical
Gene: BRCA1
Metastatic Tumor Type: (empty)
Primary Tumor acronym: OV
Source: PMID:22392482
Targeting, individual_mutation, transcript, gene, strand, region, info, cDNA, gDNA: (empty)
Primary Tumor type: Ovary

1) Sometimes a drug family is specified instead of a drug, so we need to be able to normalize all of these drug families. There are 116 unique drug family strings. I think most can be matched up to ChEBI by hand (example: PARP inhibitors). We also need to be able to handle dual/multiple inhibitors (e.g. PI3K & MEK inhibitors).

2) I'm not sure how to handle the evidence. CGI gives an "Evidence level", which may be one of the following: Pre-clinical, Early trials, Case report, FDA guidelines, European LeukemiaNet guidelines, NCCN guidelines, Late trials, CPIC guidelines, Clinical trial. In CIViC this is a little more detailed, because the evidence level there is specifically about the claim being made. In CGI it is information about the source itself, so I think it makes sense to put it on the item referenced by "stated in".
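A rough sketch of how the drug family strings from point 1 could be split into single families before matching them to ChEBI by hand. The function name and the rule for sharing a trailing "inhibitors" across the parts are my assumptions, based only on the two examples in this thread ("PI3K & MEK inhibitors" and "BRAF inhibitor + HSP90 inhibitors"); the other strings in the 116 may need more rules.

```python
import re


def split_drug_families(raw: str) -> list[str]:
    """Split a CGI 'Drug family' string into normalized single-family names.

    Assumption: families are joined by '&' or '+', and a trailing
    'inhibitor(s)' on the last part applies to the earlier parts too.
    """
    # Drop the surrounding brackets used in the TSV, e.g. "[PARP inhibitor]".
    text = raw.strip().strip("[]").strip()
    parts = [p.strip().lower() for p in re.split(r"[&+]", text) if p.strip()]
    # Use the singular form so "PARP inhibitors" and "PARP inhibitor" agree.
    parts = [re.sub(r"\binhibitors$", "inhibitor", p) for p in parts]
    # "PI3K & MEK inhibitors" -> give "pi3k" the shared "inhibitor" suffix.
    if parts and parts[-1].endswith(" inhibitor"):
        parts = [p if p.endswith(" inhibitor") else f"{p} inhibitor" for p in parts]
    return parts


if __name__ == "__main__":
    for example in ("[PARP inhibitor]",
                    "PI3K & MEK inhibitors ",
                    "BRAF inhibitor + HSP90 inhibitors"):
        print(repr(example), "->", split_drug_families(example))
```

Each returned name would still need a hand-curated lookup to a ChEBI term; this only reduces the number of distinct strings to curate.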



sebotic commented 7 years ago

Had a look at the therapies. There are combo therapies in there where just one of the compounds actually targets the mutated oncogene, but it's only obvious to the expert which one it is. Maybe it can be looked up in the original ref.

Strangely, they often use drug family, but the reference only states very specific compounds.

stuppie commented 7 years ago

> Strangely, they often use drug family, but the reference only states very specific compounds.

Example: Row 255 has no drug listed and the drug family "BRAF inhibitor + HSP90 inhibitors", but the reference (PMID:22351686) specifically lists XL888 as the HSP90 inhibitor and vemurafenib as the BRAF inhibitor.

sebotic commented 7 years ago

Yes, extracting it manually from the referenced publication might work in many cases.

stuppie commented 7 years ago

Alteration Type

Here I'm looking at normalization of the Alteration type in CGI. Since we already have many Sequence Ontology (SO) concepts in Wikidata and @andrawaag has done mappings from CIViC, I started by mapping them to SO. Where I couldn't find anything, I've listed an NCI Metathesaurus mapping. But I think we want to stick with SO, so unless anyone else finds better mappings, I'll ask the SO people.

There are 5 listed: MUT, CNA, FUS, EXPR, BIA.

| CGI | Description | SO | SO Label | Term description | Notes |
|---|---|---|---|---|---|
| MUT | | ? | | | Are these specifically Missense Variants? |
| CNA | deletion or amplification (copy number alteration?) | ? | ? | | Could be deletion or amplification. |
| FUS | fusion | | feature_fusion | A sequence variant, caused by an alteration of the genomic sequence, where a deletion fuses genomic features. | |
| EXPR | overexpression or underexpression | | level_of_transcript_variant | A sequence variant which alters the level of a transcript. | |
| BIA | biallelic inactivation | | Biallelic Mutation | A mutation that occurs on both alleles of a single gene. | |
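To keep track of which alteration types still need a decision, the mapping so far could be held in a small lookup like the one below. This is only a sketch: the `TermMapping` type and the `needs_followup` helper are hypothetical names, term identifiers are left out on purpose, and I am assuming that "Biallelic Mutation" is the NCI Metathesaurus fallback mentioned above rather than an SO term.

```python
from typing import NamedTuple, Optional


class TermMapping(NamedTuple):
    vocabulary: str  # "SO" or "NCI Metathesaurus"
    label: str       # term label as listed in the mapping so far


# None means no candidate term has been chosen yet.
CGI_ALTERATION_TYPES: dict[str, Optional[TermMapping]] = {
    "MUT": None,
    "CNA": None,
    "FUS": TermMapping("SO", "feature_fusion"),
    "EXPR": TermMapping("SO", "level_of_transcript_variant"),
    # Assumption: this label comes from NCI Metathesaurus, not SO.
    "BIA": TermMapping("NCI Metathesaurus", "Biallelic Mutation"),
}


def needs_followup(mappings: dict[str, Optional[TermMapping]]) -> list[str]:
    """Return CGI codes with no mapping yet, or with a non-SO mapping."""
    return sorted(
        code for code, term in mappings.items()
        if term is None or term.vocabulary != "SO"
    )


if __name__ == "__main__":
    print("Ask the SO people about:", needs_followup(CGI_ALTERATION_TYPES))
```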
stuppie commented 7 years ago

Primary Tumor Type

Mapping of "Primary Tumor type" to DO (Disease Ontology), again because we have DO in Wikidata and to stay consistent with how CIViC variants are represented in Wikidata. For the rest:

Chronic myeloid leukemia
Lung adenocarcinoma
Gastrointestinal stromal
Thyroid carcinoma
Non-small cell lung
Cutaneous melanoma
Acute promyelocytic leukemia
Systemic mastocytosis
Colorectal adenocarcinoma
Erdheim-Chester histiocytosis
Myelodisplasic proliferative syndrome
Acute myeloid leukemia
Any cancer type
Lagerhans cell histiocytosis
Myelodisplasic syndrome
Giant cell astrocytoma
Renal angiomyolipoma
Gastroesophageal junction adenocarcinoma
Hyper eosinophilic advanced snydrome
Acute lymphoblastic leukemia
Basal cell carcinoma
Eosinophilic chronic leukemia
stuppie commented 7 years ago


Mapped all drugs to Wikidata QIDs

Drug Name QID PubChem CID
4ohtestosterone Q4637157 160615
abemaciclib Q27074101 46220502
abiraterone Q321431 132971
ado-trastuzumab emtansine Q3997863
afatinib Q4688818 10184653
ag-120 Q18881245
ag-221 Q27077182 89683805
alectinib Q21099132 49806720
amiloride Q419995 16231
anastrozole Q419143 2187
ar42 Q27276699 6918848
arn-509 Q21098975 24872560
arsenic trioxide Q7739 261004
atezolizumab Q20707748
axitinib Q4830631 6450551
azd5363 Q27074756 25227436
azd6738 Q27896182 121596701
belinostat Q4882925 6918638
bendamustine Q425745 65628
bevacizumab Q413299 135329020
bicalutamide Q1988832 2375
bortezomib Q419319 387447
bosutinib Q894611 5328940
byl719 Q27074391 56649450
cabozantinib Q795057 25102847
capecitabine Q420207 60953
cediranib Q5057052 9933475
ceritinib Q21011233 57379345
cetuximab Q420296
cisplatin Q412415
cobimetinib Q15708292 16222096
crenolanib Q5184160 10366136
crizotinib Q5186964 11626560
dabrafenib Q3011604 44462760
dacomitinib Q17130597 11511120
dasatinib Q419940 3062316
daunorubicin Q411659 30323
decitabine Q1181878 451668
docetaxel Q420436 148124
dovitinib Q27077102 9886808
doxorubicin Q18936 31703
entrectinib Q25323953 25141092
entrictinib Q25323953 25141092
enzalutamide Q1996756 15951529
enzastaurin Q5381479 176167
epz-005687 Q27077206 60160561
epz-5676 Q27088395
epz-6438 Q27088941 66558664
erlotinib Q418369 176870
everolimus Q421052 6442177
exemestane Q418819 60198
flourouracil Q238512 3385
flutamide Q418669 3397
fluvestrant Q5508491 104741
foretinib Q5469311 42642645
foretinib  Q5469311 42642645
fulvestrant Q5508491 104741
gdc-0810 Q27272746 56941241
gefitinib Q417824 123631
gemcitabine Q414143 60750
gsk2141795 Q27089099 51042438
hm61713 Q27088175 54758501
ibrutinib Q5984881 24821094
ilorasertib Q27265085 46207586
imatinib Q177094 5291
irinotecan Q412197 60838
lapatinib Q420323 208908
lee011 Q27088552 44631912
lenalidomide Q425681 216326
lestaurtinib Q6531771 126565
letrozole Q194974 3902
liposomal doxorubicin Q29004943
lucitanib Q27082198 25031915
mercaptopurine Q418529 667490
midostaurin Q6842945 9829523
mitomycin c Q19856779 5746
mk-1775 Q27074716 24856436
mk2206 Q25100065 24964624
mytomycin c Q19856779 5746
neratinib Q6995920 9915743
nilotinib Q412327 644241
nimotuzumab Q3877039
nintedanib Q15149723 9809715
nivolumab Q7041828
octreotide Q27088142 383414
olaparib Q7083106 23725625
onalespib Q27287092 11955716
orterone Q6581305 9883029
osimertinib Q21506464 71496458
paclitaxel Q423762 36314
palbociclib Q15269707 5330286
panitumumab Q417775
pazopanib Q7157043 10113978
pd-0332991 Q15269707 5330286
pembrolizumab Q13896859
pertuzumab Q1998021
phenformin Q753100 8249
plx3397 Q27088306 25151352
plx4720 Q27088418 24180719
pomalidomide Q7227206 134780
ponatinib Q198728 24826799
pramlintide Q2062094 70691388
quizartinib Q7272714 24889392
regorafenib Q3891664 11167602
rociletinib Q27088606 57335384
rucaparib Q7376558 9931954
ruxolitinib Q7383611 25126798
selumetinib Q7448840 10127622
sirolimus Q32089 5284616
sorafenib Q421136 216239
sunitinib Q417542 5329102
tamoxifen Q412178 2733526
tegafur Q413370 5386
temozolomide Q425088 5394
tensirolimus Q7699074 6918289
thioguanine Q27895847 91669166
tipifarnib Q7808830 159324
trametinib Q7833138 11707110
trastuzumab Q412616
tretinoin Q29417 444795
u0126 Q7863562 5354033
vandetanib Q7914515 3081361
veliparib Q7919041 11960529
vemurafenib Q423111 42611257
venetoclax Q23671272 49846579
verapamil Q410291 2520
vinblastine Q282629 241903
vismodegib Q2070286 24776445
volasertib Q7939986
vorinostat Q905901 5311

These are also listed as drugs, but should be either a drug family or a treatment: lhrh analogues or antagonist, bcl2 inhibitor, chk1/2 inhibitor, anthracyclines, platinum agent, hsp90 inhibitor, chemotherapy, mek inhibitor
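Several CGI drug strings in the table above point at the same QID (spelling variants and code names). A minimal sketch for surfacing those groups, assuming rows are whitespace-separated as shown, with an optional PubChem CID after the QID. `ROWS` is just an excerpt of the table, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Excerpt of the table above; names may contain spaces, the CID may be missing.
ROWS = """\
entrectinib Q25323953 25141092
entrictinib Q25323953 25141092
fluvestrant Q5508491 104741
fulvestrant Q5508491 104741
mitomycin c Q19856779 5746
mytomycin c Q19856779 5746
palbociclib Q15269707 5330286
pd-0332991 Q15269707 5330286
nivolumab Q7041828
"""


def parse_row(line: str) -> tuple[str, str, "str | None"]:
    """Split one row into (drug name, QID, PubChem CID or None)."""
    tokens = line.split()
    # The QID is the first token that looks like Q<digits>.
    idx = next(i for i, tok in enumerate(tokens) if re.fullmatch(r"Q\d+", tok))
    name = " ".join(tokens[:idx])
    cid = tokens[idx + 1] if idx + 1 < len(tokens) else None
    return name, tokens[idx], cid


def shared_qids(text: str) -> dict[str, list[str]]:
    """Group drug names by QID and keep only QIDs used by more than one name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            name, qid, _cid = parse_row(line)
            groups[qid].append(name)
    return {qid: names for qid, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    for qid, names in shared_qids(ROWS).items():
        print(qid, names)
```

The groups are worth a quick manual look, since a shared QID can mean either a misspelling in CGI or a legitimate synonym.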

stuppie commented 7 years ago

Drug Combinations

The drug combinations listed where both are actual drugs (and not drug families):

We have drug combination items like this:, where the drugs are "fixed dose combination drugs", meaning the drugs are combined into one product. However, I think all of these are really two different drugs which are given as one treatment. I think we should distinguish between these? @andrawaag

There are 39 of these, and there are currently none in Wikidata. Some may be present in other databases, but I haven't checked yet.

There are others where one or both is a drug family or treatment (chemotherapy). Not sure how to treat these:

stuppie commented 7 years ago

Evidence Levels and Sources

These evidence levels have journal articles (specified by PMID) or clinical trials (specified by NCT ID) as the sources.

Guidelines

For these evidence levels, some have PMIDs listed, most just have "FDA" or "NCCN" listed.

FDA guidelines: PMID:24670165, PMID:24327273, PMID:27283860, PMID:22417203, PMID:19726763, PMID:19726761, PMID:20065189, PMID:22025146, FDA

NCCN guidelines: PMID:21562040, PMID:26287849, PMID:20921461, PMID:24024839;PMID:20619739;PMID:23325582, NCCN, FDA

European LeukemiaNet guidelines: PMID:21562040 (only this one)

CPIC guidelines: PMID:23988873 (only this one)

NCCN/CAP guidelines: NCCN (only one item)
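Since the source values mix PMIDs (sometimes several joined with ";") and bare names like "FDA" or "NCCN", a small parser could separate the two before creating references. This is a sketch under the assumption that ";" and "," are the only separators; `parse_sources` is a hypothetical helper name.

```python
import re


def parse_sources(raw: str) -> tuple[list[str], list[str]]:
    """Split a source string into (PMIDs, other named sources).

    Assumption: entries are separated by ';' or ',' and PMIDs are
    written as 'PMID:<digits>'.
    """
    pmids: list[str] = []
    named: list[str] = []
    for token in re.split(r"[;,]", raw):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"PMID:(\d+)", token)
        if match:
            pmids.append(match.group(1))
        else:
            # Anything that is not a PMID (e.g. "FDA", "NCCN") is kept as-is.
            named.append(token)
    return pmids, named


if __name__ == "__main__":
    print(parse_sources("PMID:24024839;PMID:20619739;PMID:23325582"))
    print(parse_sources("NCCN"))
```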

Plan of action

Wait until we work out the evidence levels in CIViC before tackling the non-guideline items. Will create items for FDA and NCCN guidelines, and use these as the determination method.