Open vanaukenk opened 1 year ago
"Internal" endpoint including dev models: http://rdf-internal.berkeleybop.io/
Here is a query that gives 440 edges. It takes about 40 seconds (so it almost times out with our current settings):
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX bds: <http://www.bigdata.com/rdf/search#>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>
PREFIX BP: <http://purl.obolibrary.org/obo/GO_0008150>
PREFIX MF: <http://purl.obolibrary.org/obo/GO_0003674>
PREFIX CC: <http://purl.obolibrary.org/obo/GO_0005575>
SELECT DISTINCT ?cam (STR(?mf_label) AS ?mf_name) (STR(?rel_label) AS ?rel_name) (STR(?bp_label) AS ?bp_name)
WHERE {
?bp rdf:type BP: .
?mf rdf:type MF: .
?rel a owl:ObjectProperty .
# This pattern selects asserted GO-CAM graphs
?cam <http://geneontology.org/lego/modelstate> ?state .
GRAPH ?cam {
?mf ?rel ?bp .
}
GRAPH ?cam {
?mf rdf:type ?asserted_mf_type .
FILTER(?asserted_mf_type != MF:)
FILTER(?asserted_mf_type != owl:NamedIndividual)
}
GRAPH ?cam {
?bp rdf:type ?asserted_bp_type .
FILTER(?asserted_bp_type != owl:NamedIndividual)
}
?asserted_mf_type rdfs:subClassOf MF: .
?asserted_bp_type rdfs:subClassOf BP: .
?asserted_mf_type rdfs:label ?mf_label .
?asserted_bp_type rdfs:label ?bp_label .
?rel rdfs:label ?rel_label .
FILTER(?rel != part_of:)
FILTER(isIRI(?mf))
FILTER(isIRI(?bp))
}
#LIMIT 1000
Thank you so much @balhoff This should keep us busy for a while :-)
@balhoff I can't find some of these models in Noctua, for example http://model.geneontology.org/5966411600000233 and
http://model.geneontology.org/598826eb00000261
How come?
@pgaudet that's a very good question! I checked the first one to see if it was marked "delete", but it doesn't seem to be.
ok. Weird
@pgaudet that model is here: https://github.com/geneontology/noctua-models/blob/master/models/5966411600000233.ttl
It's marked "delete" there, so it isn't loaded into Noctua. But that change happened 7 months ago: https://github.com/geneontology/noctua-models/commit/c1415f553f8767d9d35a3ea90d26384dfecba4a2#diff-87d08b64703ef6e9e9b656c1d6ad4d34e7aa4997ab3f2d6a4232d330ab256c9d
@kltm could the triplestore be very out of date?
@balhoff The best way to check would be to look for newer models or annotation dates. If this is the "internal" endpoint, I wonder whether "deleted" models are actually filtered out?
@kltm that model says "development" in the endpoint: https://api.triplydb.com/s/RuwQG_fGT
Sounds like this will have to be re-run. Next time, @balhoff, can you also query the model status? (Many of the other ones I checked were dev.)
Thank you!
@kltm I think the "internal" endpoint has a content problem. Compare these queries:
@balhoff Yes, at the very least, "prod" should be a subset of "internal".
internal:
-rw-r--r-- 1 ubuntu ubuntu 36464689152 Apr 3 20:49 blazegraph.jnl
production:
-rw-r--r-- 1 ubuntu ubuntu 36464689152 Apr 3 21:05 blazegraph.jnl
So at least we know they are getting updated. IIRC, blazegraph grows journals in fixed-size chunks, so the identical size may not indicate anything.
I've restarted the internal server and nothing seems amiss otherwise. No change.
This would maybe point to an issue in the construction of the "internal" blazegraph journal?
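Since the identical journal sizes noted above may not mean anything, one way to tell whether the two files really differ is to compare checksums. The following is a minimal sketch only: the function names are hypothetical, and it assumes both journals are readable from one machine and are not being written to while they are hashed. Hashing a file of this size (about 36 GB) will take a while.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def journals_identical(path_a, path_b):
    """True only if both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False
    return file_sha256(a) == file_sha256(b)
```

If `journals_identical` returned True for the internal and production journals, that would suggest the production step changed nothing. A False result only says the bytes differ, not that the content is correct.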
@balhoff @dustine32 Looking through the Jenkinsfile, it seems that the journals are a product of the Mega-make:
target/blazegraph.jnl: $(BGJAR) target/rdf target/noctua-models
du -sh target
du -sh target/*
free -h
ls -AlF /tmp
JAVA_OPTS=-Xmx$(BGMEM) blazegraph-runner --journal=target/blazegraph.jnl --properties=conf/blazegraph.properties load --use-ontology-graph $(LOAD_TARGETS)
du -sh target
du -sh target/*
free -h
ls -AlF /tmp
JAVA_OPTS=-Xmx$(BGMEM) blazegraph-runner --journal=target/blazegraph.jnl --properties=conf/blazegraph.properties reason --source-graphs-query=$(CAM_GRAPH_QUERY) --ontology=$(GO_GRAPHSTORE_URI) --append-graph-name="_inferred"
du -sh target
du -sh target/*
free -h
target/blazegraph-internal.jnl: target/blazegraph.jnl
cp $< $@
JAVA_OPTS=-Xmx$(BGMEM) blazegraph-runner --journal=$@ update sparql/insert/insert_noctua_metadata.sparql
JAVA_OPTS=-Xmx$(BGMEM) blazegraph-runner --journal=$@ update sparql/insert/insert_ontology_metadata.sparql
JAVA_OPTS=-Xmx$(BGMEM) blazegraph-runner --journal=$@ update sparql/insert/insert_reflexive_subclass_closure.sparql
target/blazegraph-production.jnl: target/blazegraph-internal.jnl
cp $< $@
JAVA_OPTS=-Xmx$(BGMEM) blazegraph-runner --journal=$@ update sparql/delete/delete_non_production.sparql
It looks like there is a "source" blazegraph journal, which is copied to build the "internal" journal, which is in turn copied to build the "prod" journal. So "prod" should be "internal" minus the non-production models. That is not consistent with what we're seeing.
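Given that build order, every model in "prod" should also be in "internal", and that expectation can be checked directly. The sketch below is an assumption-laden illustration, not the project's tooling: the function names are hypothetical, the exact SPARQL path on each server is not given in this thread and has to be supplied by the caller, and it assumes the endpoints accept a form-encoded POST and return SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request

# Same pattern the queries in this thread use to select GO-CAM graphs.
MODEL_QUERY = """
SELECT DISTINCT ?cam WHERE {
  ?cam <http://geneontology.org/lego/modelstate> ?state .
}
"""


def fetch_model_iris(endpoint_url, timeout=120):
    """Run MODEL_QUERY against a SPARQL endpoint and return the set of ?cam IRIs."""
    data = urllib.parse.urlencode({"query": MODEL_QUERY}).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,  # assumption: caller passes the full SPARQL endpoint URL
        data=data,     # supplying data makes this a form-encoded POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return {row["cam"]["value"] for row in payload["results"]["bindings"]}


def missing_from_internal(prod_models, internal_models):
    """Models present in prod but absent from internal; expected to be empty."""
    return sorted(set(prod_models) - set(internal_models))
```

A non-empty result from `missing_from_internal(fetch_model_iris(prod_url), fetch_model_iris(internal_url))` would confirm that the two endpoints are not serving journals related in the way the Makefile describes.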
While this is suuuper worrying, it's not really public right now, so we're probably not going to get to the bottom of this before the meeting. After the meeting, @kltm and @dustine32 will manually run the process in the Makefile on a very, very small noctua-models set and see if we can reproduce the problem. We're guessing some kind of build issue at this point because of the tmpfs and use of docker--no dirty workspaces.
Google doc with query results: https://docs.google.com/spreadsheets/d/1JS6IwCoQjpwsON1OUc0bVpb2D2jwGEj846LpXRp40eU/edit#gid=0
Query that includes model state and title (added as a new sheet to Google spreadsheet above):
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX cd: <http://citydata.wu.ac.at/ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX bds: <http://www.bigdata.com/rdf/search#>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>
PREFIX BP: <http://purl.obolibrary.org/obo/GO_0008150>
PREFIX MF: <http://purl.obolibrary.org/obo/GO_0003674>
PREFIX CC: <http://purl.obolibrary.org/obo/GO_0005575>
SELECT DISTINCT ?cam (STR(?mf_label) AS ?mf_name) (STR(?rel_label) AS ?rel_name) (STR(?bp_label) AS ?bp_name) ?state ?title
WHERE {
?bp rdf:type BP: .
?mf rdf:type MF: .
?rel a owl:ObjectProperty .
# This pattern selects asserted GO-CAM graphs
?cam <http://geneontology.org/lego/modelstate> ?state .
?cam dc:title ?title .
GRAPH ?cam {
?mf ?rel ?bp .
}
GRAPH ?cam {
?mf rdf:type ?asserted_mf_type .
FILTER(?asserted_mf_type != MF:)
FILTER(?asserted_mf_type != owl:NamedIndividual)
}
GRAPH ?cam {
?bp rdf:type ?asserted_bp_type .
FILTER(?asserted_bp_type != owl:NamedIndividual)
}
?asserted_mf_type rdfs:subClassOf MF: .
?asserted_bp_type rdfs:subClassOf BP: .
?asserted_mf_type rdfs:label ?mf_label .
?asserted_bp_type rdfs:label ?bp_label .
?rel rdfs:label ?rel_label .
FILTER(?rel != part_of:)
FILTER(isIRI(?mf))
FILTER(isIRI(?bp))
}
#LIMIT 1000
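With `?state` in the output, a quick tally shows how many of the returned rows come from development versus production models. This is a small sketch with a hypothetical function name, assuming the results were saved in the standard SPARQL JSON results layout; the sample rows are made up for illustration and are not real query output.

```python
from collections import Counter


def count_rows_by_state(sparql_json):
    """Count result rows per ?state value in a SPARQL JSON results document."""
    counts = Counter()
    for row in sparql_json["results"]["bindings"]:
        # A variable that is unbound in a row is simply absent from that row.
        state = row.get("state", {}).get("value", "(unbound)")
        counts[state] += 1
    return counts


# Made-up sample in the SPARQL JSON results layout, for illustration only.
sample = {
    "head": {"vars": ["cam", "state"]},
    "results": {
        "bindings": [
            {"cam": {"type": "uri", "value": "http://example.org/model/1"},
             "state": {"type": "literal", "value": "development"}},
            {"cam": {"type": "uri", "value": "http://example.org/model/2"},
             "state": {"type": "literal", "value": "production"}},
            {"cam": {"type": "uri", "value": "http://example.org/model/3"},
             "state": {"type": "literal", "value": "development"}},
        ]
    },
}

print(count_rows_by_state(sample))
```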
@balhoff Once this release is finalized (today or tomorrow), @pgaudet was wondering if you could re-run to get the latest version on prod.
@vanaukenk, @balhoff figured out it was the DNS and I believe we have it corrected now.
We are working on annotation documentation for MF-to-BP relations and would like to assess the extent to which relations, other than 'part of', have been used to link MFs to BPs in Noctua.
We would like to exclude the whole genome import models, since we know that these models used 'causally upstream of or within' (or perhaps a child) and we're not concerned with those right now.
Here are two possible queries we can think of to try to find use of these relations (i.e. not 'part of') in other models:
1. Using the existing definition of a GO-CAM (three successive MFs linked by two causal relations), check whether any of those models also include an MF linked to a BP by a 'causally upstream of or within' relation (or a child, i.e. positive or negative effect).
2. Look for any model in which a non-root MF is linked to a BP by a 'causally upstream of or within' relation (or a child, i.e. positive or negative effect).
@balhoff @ukemi @pgaudet @vanaukenk