Open kallimathios opened 3 months ago
Thanks @kallimathios for the detailed ticket! When I investigated this issue late last week, I found that at least part of the cause of the different numbers when loading the same graph comes down to the presence of ordered rdf:List structures, which are used for ordering triples in the Resource Templates. As a short synopsis, rdf:Lists are implemented as a series of blank nodes that, together with the rdf:first and rdf:rest predicates, form an ordered list.
These rdf:List intermediary blank nodes are not being skolemized correctly with deterministic URLs. Instead, each time the same RDF resource is loaded, the blank-node identifiers are randomly generated by the Python rdflib library and show up as new triples. You can replicate this by loading the URL of a single resource template in the Graph Explorer (this example uses the PCC template https://api.development.sinopia.io/resource/pcc:bf2:Serial:Work).
Doing an initial load in graph explorer results in the following statistics:
We can then run a couple of queries to see how many triples contain rdf:first and rdf:rest:
Now, if we click the Build button again for the same resource we see the number of triples increased to 477 from 422:
Re-run the SPARQL queries to see how many triples contain rdf:first and rdf:rest:
Taking a closer look at the list of subjects and objects for rdf:first, you can see that the actual blank nodes (e.g. https://api.development.sinopia.io/resource/pcc:bf2:Serial:Work#b43) have duplicate subjects for the same rdf:first predicates.
I think a short-term fix is to just create a new graph every time the Build is clicked instead of trying to load the resource into the same graph. However, we will still need to address this problem as part of ticket 2.
Got it - this is super helpful. I will rebuild the graph each time. Also a needed reminder about the functionality to load and investigate a single resource. Thanks so much, @jermnelson !
I receive different numbers for the total number of triples when utilizing the graph summary feature of the graph explorer. I receive different results when I rebuild the graph and duplicate my actions without any changes to the group or environment, and I also receive different results when I restart the environment with a hard refresh. Additionally, results seem to vary when I navigate between groups within an environment. The below examples cover these scenarios.
The following example comes from running the summary in the Development environment for the "All" group. I received two different results without navigating to another group or restarting the environment:
I then tried restarting the environment, and received another different set of results:
While I did not get a screenshot, at one point the system returned 643,169 triples for the "All" group in Development.
I restarted the environment with a hard refresh and generated a summary for the "All" group in the Stage environment, with the following triples returned:
I then navigated to the next group, California State University, and built the graph within the Stage environment. When I went back to the "All" group, I received these numbers:
I restarted the environment with a hard refresh and tried to generate a graph summary in the Production environment for the "All" group. I received the following two results without navigating to any other groups or restarting the environment.