NCI-Thesaurus / thesaurus-obo-edition

OBO Library edition of NCIt

Ideas for exemplar queries #11

Open mbrush opened 7 years ago

mbrush commented 7 years ago

For coming up with queries, it may help to use the cmap here that shows all paths through the data. Below are a few examples of queries we can see should be supported, along with the path the query would take through top-level types in NCIt. Queries get more complex as you go down the list.

What genes are targeted by drugs used to treat cancer x? Query path: Disease or Disorder -> Chemotherapy Regimen or Agent Combination -> Drug, Food, Chemical or Biomedical Material -> Gene

What pathways are targeted by drugs used to treat cancer x? Query path: Disease or Disorder -> Gene -> Biochemical Pathway

What signaling pathways are affected in cancer x? Query path: Disease or Disorder -> Molecular Abnormality -> Gene -> Biochemical Pathway

What drugs target the highest number of pathways? Query path: Drug, Food, Chemical or Biomedical Material -> Gene -> COUNT(Biochemical Pathway)

What other drugs target genes in the same signaling pathway as drug X? Query path: Drug, Food, Chemical or Biomedical Material -> Gene Product -> Gene -> Biochemical Pathway -> Gene -> Gene Product -> Drug, Food, Chemical or Biomedical Material

A possible drug re-purposing query (based on WD example here): Find cancers that may be treated by drugs used to treat non-cancers (based on the fact that one or more of the genes known to be targeted by the drug are associated with the cancer). Query path: Disease or Disorder -> Drug, Food, Chemical or Biomedical Material -> Gene Product -> Gene -> Drug, Food, Chemical or Biomedical Material -> Disease or Disorder

A drug re-purposing query extension: Can go further and constrain results to include only matches based on shared genes annotated to apoptosis or cell proliferation GO terms. Query path: Disease or Disorder -> Drug, Food, Chemical or Biomedical Material -> Gene Product -> Gene (filter by GO annotation) -> Drug, Food, Chemical or Biomedical Material -> Disease or Disorder
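As a rough illustration of how one of these paths might translate into SPARQL, here is a sketch for the "highest number of pathways" question. It has not been run against the endpoint. The property IRIs (NCIT_R146, NCIT_R54, NCIT_R130), the two FROM graphs, and the idea that pathway membership is asserted on subclasses (alleles) of the gene class are all assumptions about how the OBO edition is modeled, and the LIMIT of 20 is arbitrary.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX GeneIsElementInPathway: <http://purl.obolibrary.org/obo/NCIT_R130>
# Count distinct pathways per drug, then rank drugs by that count.
SELECT ?chemical ?chemical_label (COUNT(DISTINCT ?pathway) AS ?pathway_count)
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
  ?chemical rdfs:label ?chemical_label .
  ?protein GeneProductIsEncodedByGene: ?gene .
  # Assumption: pathways are attached to the gene itself or to its subclasses.
  ?allele rdfs:subClassOf* ?gene .
  ?allele GeneIsElementInPathway: ?pathway .
}
GROUP BY ?chemical ?chemical_label
ORDER BY DESC(?pathway_count)
LIMIT 20
```

COUNT(DISTINCT ?pathway) is used rather than COUNT(?pathway) so that a pathway reached through several genes of the same drug is only counted once.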

krobasky commented 7 years ago

I wrote a few queries, but they don't return anything -- is that OK, or am I doing it wrong?

For example, here's the first query: http://yasgui.org/short/rJrg9w_qb

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
SELECT DISTINCT ?disease_labels ?gene_labels 
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph-redundant.ttl>
WHERE { 
  ?diseases rdfs:subClassOf* DiseaseClass: .
  ?diseases rdfs:label ?disease_labels .
  ?therapies RegimenHasAcceptedUseForDisease: ?diseases .
  ?proteins ChemicalOrDrugIsMetabolizedByEnzyme: ?therapies .
  ?genes GeneProductIsEncodedByGene: ?proteins .
  ?genes rdfs:label ?gene_labels .
}

mbrush commented 7 years ago

Hi @krobasky. Try the query below (implemented here).

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT:R123>
SELECT DISTINCT ?disease_label ?gene_label 
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE { 

?therapy RegimenHasAcceptedUseForDisease: ?disease .
?disease rdfs:label ?disease_label .
?disease rdfs:subClassOf* DiseaseClass: .
  OPTIONAL {?therapy ChemotherapyRegimenHasComponent: ?chemical .}
  OPTIONAL {?chemical ChemicalOrDrugIsMetabolizedByEnzyme: ?protein .}
  OPTIONAL {?protein  GeneProductIsEncodedByGene: ?gene .
  ?gene rdfs:label ?gene_label . } 
}

A few things to note:

  1. Your query was missing the ChemotherapyRegimenHasComponent: edge from ?therapy to ?chemical.

  2. You need OPTIONAL clauses when there may be no match for an intermediate pattern - otherwise a row is dropped from the results as soon as one of its patterns has no match (e.g. not all ?chemicals in NCIt are linked to a ?protein via ChemicalOrDrugIsMetabolizedByEnzyme:).

  3. Stylistically, it is more common to use singular variables (?disease, not ?diseases) - but it is a matter of preference I suppose.

  4. I might recommend querying the http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl instead of http://purl.obolibrary.org/obo/ncit/ncit-property-graph-redundant.ttl. (faster performance, simpler and clearer results for demo purposes)

  5. Finally, always consider the plain-language question a query is answering (and whether it is biologically interesting). Here, it would seem to be something like "For a given disease, what genes play a role in metabolizing the drugs used to treat it?" I'm not a clinician or clinical researcher, but to me this doesn't sound like a query someone is likely to ask. (It may indeed be, but I can't quite see it.) Anyway, my point is to make sure that the question the query answers is a relevant/interesting one.

@balhoff Does this advice fit with your plan/view of the example queries?

krobasky commented 7 years ago

This is SUPER helpful, thank you @mbrush !!

I'm a SPARQL rookie, and I wrote that query to answer your first question, at the top of this thread, namely:

What genes are targeted by drugs used to treat cancer x? Query path: Disease or Disorder -> Chemotherapy Regimen or Agent Combination -> Drug, Food, Chemical or Biomedical Material -> Gene

I think your question, as you originally asked, is interesting to an oncology researcher, but perhaps my query didn't implement it as intended. Thanks for any feedback!

mbrush commented 7 years ago

Hi @krobasky .

First, I noticed a typo in my query above - in the line PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT:R123>, the colon in NCIT:R123 should be an underscore (NCIT_R123). Fixing this actually gives many more results.

Second, I take back my recommendation to use OPTIONAL clauses in this case. Here we want to get only results that completely traverse the described path from disease -> therapy -> chemical -> protein -> gene. If we use OPTIONAL clauses as above, we get back results that traverse only part of this path. Removing the OPTIONAL clauses gives the query below, which returns 264 results that should all be valid.

http://yasgui.org/short/Bk8Hw9t9Z


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?gene_label 
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE { 

?therapy RegimenHasAcceptedUseForDisease: ?disease .
?disease rdfs:label ?disease_label .
?disease rdfs:subClassOf* DiseaseClass: .
?therapy ChemotherapyRegimenHasComponent: ?chemical .
?chemical ChemicalOrDrugIsMetabolizedByEnzyme: ?protein .
?protein  GeneProductIsEncodedByGene: ?gene .
 ?gene rdfs:label ?gene_label .
}

Make sure you understand why this is - it teaches an important lesson for SPARQLing that will help you avoid getting erroneous results in the future. (e.g. here we get >100,000 results if we include the OPTIONAL clauses, vs 264 without them - and 264 is the correct number.)
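One way to see the difference for yourself is to ask only for the partial paths. The sketch below (not run against the endpoint) keeps a single OPTIONAL and then filters for rows where it failed to bind, i.e. the disease/chemical pairs that an OPTIONAL version returns but the all-required version drops. The ?chemical_label variable is added here only for readability.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?chemical_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?therapy RegimenHasAcceptedUseForDisease: ?disease .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?disease rdfs:label ?disease_label .
  ?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
  # OPTIONAL keeps the row even when no enzyme is found ...
  OPTIONAL { ?chemical ChemicalOrDrugIsMetabolizedByEnzyme: ?protein . }
  # ... and this filter keeps only those rows, where ?protein stayed unbound.
  FILTER(!BOUND(?protein))
}
```

Every row this returns is a path that stops at the chemical, which is exactly the kind of result we do not want when the question is about genes.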

mbrush commented 7 years ago

Third, note that if you also return the ?chemical (here via ?chemical_label in the SELECT clause), you get back more results (358 instead of 264) - because there may be more than one path from disease to gene, and when you ask the query to return the chemical in this path, you generate more 'distinct' results. Make sure you understand why this is as well - it illustrates another important lesson for writing SPARQL queries.

http://yasgui.org/short/B1fxFqY9W

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label  ?chemical_label ?gene_label 
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE { 

?therapy RegimenHasAcceptedUseForDisease: ?disease .
?disease rdfs:label ?disease_label .
?disease rdfs:subClassOf* DiseaseClass: .
?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
?chemical ChemicalOrDrugIsMetabolizedByEnzyme: ?protein .
?protein  GeneProductIsEncodedByGene: ?gene .
 ?gene rdfs:label ?gene_label .
}

mbrush commented 7 years ago

Finally, to match the first suggested query above, you would want to use the Chemical_Or_Drug_Affects_Gene_Product relation between the drug and gene product, which describes the therapeutic target of the drug (rather than Chemical_Or_Drug_Is_Metabolized_By_Enzyme, which describes the enzyme that metabolizes the drug). I think this is a much more interesting question to an oncologist.

http://yasgui.org/short/HJ-iFcF5b


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label  ?chemical_label ?protein_label 
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE { 

?therapy RegimenHasAcceptedUseForDisease: ?disease .
?disease rdfs:label ?disease_label .
?disease rdfs:subClassOf* DiseaseClass: .
?therapy ChemotherapyRegimenHasComponent: ?chemical .
?chemical rdfs:label ?chemical_label .
?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
?protein rdfs:label ?protein_label .

}

Returns 80 results.
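The original question asked for genes for a particular "cancer x", while the query above stops at the gene product and covers all diseases. A possible extension is sketched below; it is untested, it assumes GeneProductIsEncodedByGene: links the affected protein to a labeled gene class, and the string "melanoma" is only a placeholder standing in for whichever disease you care about.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?chemical_label ?gene_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?therapy RegimenHasAcceptedUseForDisease: ?disease .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?disease rdfs:label ?disease_label .
  # Placeholder for "cancer x": keep only diseases whose label mentions it.
  FILTER(CONTAINS(LCASE(STR(?disease_label)), "melanoma"))
  ?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
  ?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
  # One more hop: from the targeted protein to the gene that encodes it.
  ?protein GeneProductIsEncodedByGene: ?gene .
  ?gene rdfs:label ?gene_label .
}
```

Matching on a label substring is convenient for exploring; for a stable query it is probably better to pin ?disease to a specific NCIt class IRI instead.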

mbrush commented 7 years ago

To drive home the point about adding additional variables to the SELECT clause - the query below is identical to the one above but asks for the ?therapy (as ?therapy_label) to be returned as well. This gives 179 results instead of 80 - because there are cases where the same chemical is a component of many therapies (so asking to return the therapy as well results in extra 'distinct' rows that now get returned).

http://yasgui.org/short/Sy5shcFqZ

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?therapy_label ?chemical_label ?protein_label 
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE { 

?therapy RegimenHasAcceptedUseForDisease: ?disease .
?disease rdfs:label ?disease_label .
?disease rdfs:subClassOf* DiseaseClass: .
?therapy ChemotherapyRegimenHasComponent: ?chemical .
?therapy rdfs:label ?therapy_label .
?chemical rdfs:label ?chemical_label .
?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
?protein rdfs:label ?protein_label .

}
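If you want to see where the extra rows come from, one option is to count the therapies per row instead of listing them. This is a sketch only (not run against the endpoint); ?therapy_count is a name introduced here for illustration.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
# One row per disease/chemical/protein combination, with the number of
# distinct therapies that connect the disease to the chemical.
SELECT ?disease_label ?chemical_label ?protein_label
       (COUNT(DISTINCT ?therapy) AS ?therapy_count)
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?therapy RegimenHasAcceptedUseForDisease: ?disease .
  ?disease rdfs:label ?disease_label .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
  ?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
  ?protein rdfs:label ?protein_label .
}
GROUP BY ?disease_label ?chemical_label ?protein_label
ORDER BY DESC(?therapy_count)
```

Rows with a ?therapy_count above 1 are the ones that fan out into several 'distinct' rows once ?therapy_label is added to the SELECT clause.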

Sorry if this is superfluous - these are just nice examples for teaching a couple key lessons I have learned in my (limited) time doing sparql queries, that would behoove anyone starting out to deeply understand.

balhoff commented 7 years ago

Thanks @mbrush!

decorons commented 7 years ago

How about a query like All the Diseases that 'may have molecular abnormality' RB1 gene inactivation? That shows a long list in the regular NCIt browser.
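A very tentative sketch of such a query is below. It is untested and rests on several assumptions: that the property and the abnormality class can be found by their rdfs:label, that the property label contains "may have molecular abnormality" (with spaces or underscores), that the class is labeled exactly "RB1 Gene Inactivation" up to case, and that the property graph holds a direct disease-to-abnormality triple. Looking the IRIs up in the NCIt browser and using them directly would be more robust and faster.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
SELECT DISTINCT ?disease_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  # Find the relation by label, treating underscores as spaces (assumed label form).
  ?property rdfs:label ?property_label .
  FILTER(CONTAINS(REPLACE(LCASE(STR(?property_label)), "_", " "),
                  "may have molecular abnormality"))
  # Find the abnormality class by label (assumed label text).
  ?abnormality rdfs:label ?abnormality_label .
  FILTER(LCASE(STR(?abnormality_label)) = "rb1 gene inactivation")
  # Diseases linked to that abnormality through that relation.
  ?disease ?property ?abnormality .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?disease rdfs:label ?disease_label .
}
```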

krobasky commented 7 years ago

Are there any other genes that might be of interest? Say, TP53, KRAS,…? Would a query to list all gene-disease connections be useful, do you think? Or perhaps both?

From: decorons [mailto:notifications@github.com] Sent: Monday, September 25, 2017 1:57 PM To: NCI-Thesaurus/thesaurus-obo-edition thesaurus-obo-edition@noreply.github.com Cc: krobasky krobasky@gmail.com; Mention mention@noreply.github.com Subject: Re: [NCI-Thesaurus/thesaurus-obo-edition] Ideas for exemplar queries (#11)

decorons commented 7 years ago

I think you'd get a very large list back if you query all the gene-disease connections, but yes. It might be an interesting question for some researchers. Especially if you could also query the 'may have molecular abnormality' relation by the class of genes involved (e.g. all the ones that are Transcription Regulation Genes). Gilberto might have other thoughts. It gets my brain a little twisted. My initial suggestion was for 'RB1 gene inactivation', not 'RB1 gene'. The child concepts of these categories are genes. So it would take a couple of queries to traverse…

Gene (C16612)

Transcription Regulation Gene (C54362)

Fusion Gene (C28510)

Apoptosis Regulation Gene (C20462)

Replication Initiation Gene (C20370)

Telomere Maintenance Gene (C20338)

...

From: krobasky notifications@github.com Reply-To: NCI-Thesaurus/thesaurus-obo-edition reply@reply.github.com Date: Monday, September 25, 2017 at 2:05 PM To: NCI-Thesaurus/thesaurus-obo-edition thesaurus-obo-edition@noreply.github.com Cc: NIH Mailserver decorons@mail.nih.gov, Comment comment@noreply.github.com Subject: Re: [NCI-Thesaurus/thesaurus-obo-edition] Ideas for exemplar queries (#11)

krobasky commented 7 years ago

@mbrush thanks so much for taking time to explain the finer points.

Another question, regarding this query:

http://yasgui.org/short/HJ-iFcF5b

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?chemical_label ?protein_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?therapy RegimenHasAcceptedUseForDisease: ?disease .
  ?disease rdfs:label ?disease_label .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
  ?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
  ?protein rdfs:label ?protein_label .
}

Returns 80 results.

How would I write it if I wanted to include results from:

?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .

OR

?chemical ChemicalOrDrugIsMetabolizedByEnzyme: ?protein .

? I tried using OPTIONAL (see the following query) and I expected to get more results but still only get 80, I guess, because "AffectsGeneProduct" is the limiting step. However, if I also make "AffectsGeneProduct" optional, I guess I'll get things that have no gene target at all. Thoughts?

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX GeneProductIsEncodedByGene: <http://purl.obolibrary.org/obo/NCIT_R54>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?chemical_label ?protein_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?therapy RegimenHasAcceptedUseForDisease: ?disease .
  ?disease rdfs:label ?disease_label .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
  ?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
  OPTIONAL {?chemical ChemicalOrDrugIsMetabolizedByEnzyme: ?protein .}
  ?protein rdfs:label ?protein_label .
}

balhoff commented 7 years ago

@krobasky for that you have a few options. For alternatives encompassing multiple triple patterns, you would need to use the UNION keyword. But for a single alternative property, you can write it like this (note the pipe |):

?chemical ChemicalOrDrugAffectsGeneProduct:|ChemicalOrDrugIsMetabolizedByEnzyme: ?protein .

Another way to do it is to use a variable for the property and provide a VALUES block which states the possible values for that variable.
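For reference, here is roughly what the VALUES and UNION variants could look like, reusing the prefixes from the queries earlier in this thread. Both are sketches that have not been run against the endpoint; ?relation and ?relation_kind are variable names made up for the example. First, the VALUES form:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?chemical_label ?relation ?protein_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  # The property is a variable, restricted to the two relations of interest.
  VALUES ?relation { ChemicalOrDrugAffectsGeneProduct: ChemicalOrDrugIsMetabolizedByEnzyme: }
  ?therapy RegimenHasAcceptedUseForDisease: ?disease .
  ?disease rdfs:label ?disease_label .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
  ?chemical ?relation ?protein .
  ?protein rdfs:label ?protein_label .
}
```

Unlike the pipe form, this also lets you return ?relation, so each row shows which of the two properties produced it. The UNION form is more verbose, but each branch can hold as many triple patterns as needed:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX RegimenHasAcceptedUseForDisease: <http://purl.obolibrary.org/obo/NCIT_R172>
PREFIX ChemicalOrDrugAffectsGeneProduct: <http://purl.obolibrary.org/obo/NCIT_R146>
PREFIX ChemicalOrDrugIsMetabolizedByEnzyme: <http://purl.obolibrary.org/obo/NCIT_R122>
PREFIX ChemotherapyRegimenHasComponent: <http://purl.obolibrary.org/obo/NCIT_R123>
SELECT DISTINCT ?disease_label ?chemical_label ?relation_kind ?protein_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?therapy RegimenHasAcceptedUseForDisease: ?disease .
  ?disease rdfs:label ?disease_label .
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?therapy ChemotherapyRegimenHasComponent: ?chemical .
  ?chemical rdfs:label ?chemical_label .
  # Either branch may bind ?protein; BIND tags which branch matched.
  {
    ?chemical ChemicalOrDrugAffectsGeneProduct: ?protein .
    BIND("affects" AS ?relation_kind)
  }
  UNION
  {
    ?chemical ChemicalOrDrugIsMetabolizedByEnzyme: ?protein .
    BIND("metabolized by" AS ?relation_kind)
  }
  ?protein rdfs:label ?protein_label .
}
```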

krobasky commented 7 years ago

Sorry to keep "querying" you ... pun intended...

Have you any tips on how to get results on pathway information? Or do I just have another syntax error? http://yasgui.org/short/r1wL_Hdj-

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX DiseaseMappedToGene: <http://purl.obolibrary.org/obo/NCIT_R176>
PREFIX GeneIsElementInPathway: <http://purl.obolibrary.org/obo/NCIT_R130>
SELECT DISTINCT ?disease_label ?pathway_label
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE {
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?disease rdfs:label ?disease_label .
  ?disease DiseaseMappedToGene: ?gene .
  ?gene GeneIsElementInPathway: ?pathway .
  ?pathway rdfs:label ?pathway_label .
}

mbrush commented 7 years ago

@krobasky It can help to have the NCIt open in Protege and explore the usage of the properties in your query. If you do this you will see that DiseaseMappedToGene connects a Disease to a named Gene class (e.g. 'BRCA2 gene'). But GeneIsElementInPathway connects an allele of a gene (e.g. 'BRCA2 wt allele') to a pathway. You can see in the NCIt that wt allele classes are children of the parent gene class. So adding a pattern traversing this subclass edge gets things to work. See http://yasgui.org/short/B1IGo8Oo-.

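PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX DiseaseClass: <http://purl.obolibrary.org/obo/NCIT_C2991>
PREFIX DiseaseMappedToGene: <http://purl.obolibrary.org/obo/NCIT_R176>
PREFIX GeneIsElementInPathway: <http://purl.obolibrary.org/obo/NCIT_R130>
SELECT DISTINCT ?disease_label ?pathway_label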
FROM <http://purl.obolibrary.org/obo/ncit.owl>
FROM <http://purl.obolibrary.org/obo/ncit/ncit-property-graph.ttl>
WHERE { 
  ?disease rdfs:subClassOf* DiseaseClass: .
  ?disease rdfs:label ?disease_label .
  ?disease DiseaseMappedToGene: ?gene .
  ?allele rdfs:subClassOf* ?gene .
  ?allele GeneIsElementInPathway: ?pathway .
  ?pathway rdfs:label ?pathway_label .