Closed metazool closed 3 years ago
https://github.com/BritishGeologicalSurvey/stratigraph/runs/1536862159?check_suite_focus=true - you can finally see the failing test on the SPARQL queries here (the queries are in stratigraph/store.py).
Note that the query will need to be amended slightly so that the objects of the upper and lower relationships are generalised to their parent formation units if they are members. That is a separate issue from the failing test though, which I will continue to look into.
I could understand it if we were missing data, because we're only querying for subjects which have both upper and lower boundary relations, and there will be cases where there is only one. It might make more sense to collect all the subjects in the given era, optionally filtered by Formation type, regardless of whether they have any upper/lower links, and then optionally filter out the detached ones when we construct the networkx graph. The query in the data-loading script alongside the integration test effectively does this (e.g. "give me all triples for everything in the Jurassic, no matter what it is").
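To make the "collect everything, filter detached units later" idea concrete, here is a rough sketch in plain Python. It is not the code in stratigraph/store.py: the function name `build_adjacency`, the `drop_detached` flag and the unit names are all made up for illustration, and it uses a dict of sets as a stand-in for the networkx graph.

```python
def build_adjacency(subjects, links, drop_detached=False):
    """Build an undirected adjacency mapping from rock units and their links.

    subjects: every unit returned for the era, linked or not.
    links: (unit, other_unit) pairs from upper/lower boundary relations.
    drop_detached: if True, leave out units that have no links at all.
    """
    # Start with every subject as a node, so unlinked units are not lost.
    graph = {subject: set() for subject in subjects}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)
    if drop_detached:
        # Keep only nodes that have at least one neighbour.
        graph = {node: nbrs for node, nbrs in graph.items() if nbrs}
    return graph


# Hypothetical sample data: UnitC has no upper/lower relation.
units = ["UnitA", "UnitB", "UnitC"]
relations = [("UnitA", "UnitB")]

full = build_adjacency(units, relations)
connected = build_adjacency(units, relations, drop_detached=True)
```

With this shape the query never has to decide which units are interesting; the detached ones are dropped (or kept) at graph construction time.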
test_store.py has invalid SPARQL syntax - will push a fix for that shortly
I am happy to merge this if you are @rachelheaven and @kerberpolis
The dotfile output from the API based on this returns a large collection of nodes but no edges. There's quite a lot in here already, though (and I would also like to do a small overhaul, collecting up all the data.bgs.ac.uk namespace references to keep them all in stratigraph/ns.py).
I added a SPARQL query against a Fuseki store to return linked rock units corresponding to geological age, optionally filtered to only include those that have Formation rank
This includes an integration test which depends on having the current BGS Lexicon Linked Data, and our Jurassic sample, both loaded into a local Fuseki database named stratigraph. This test also runs in CI, collecting the Jurassic Lexicon data from data.bgs.ac.uk and adding the .ttl file of text mined relations in this project. HOWEVER the query is clearly off, as the test shows it is returning fewer subjects when filtering harder, and I worry I'm misunderstanding the data. Any feedback before we go any further (and especially improvements to the queries!) would be appreciated.
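As a starting point for discussion, here is a minimal sketch of how the query could be assembled with the Formation filter as an optional extra triple pattern and the upper/lower relations wrapped in OPTIONAL. Everything under the `ex:` prefix (`ex:hasAge`, `ex:hasRank`, `ex:Formation`, `ex:upper`, `ex:lower`) is a placeholder, not the real BGS Lexicon vocabulary, and `build_query` is a hypothetical helper rather than anything in stratigraph/store.py.

```python
def build_query(era_uri, formations_only=False):
    """Return a SPARQL SELECT for rock units of a given age.

    All ex: terms are placeholders for the real predicates and classes.
    """
    # The rank filter is one extra triple pattern, added only on request.
    rank_filter = "?unit ex:hasRank ex:Formation ." if formations_only else ""
    return f"""
PREFIX ex: <http://example.org/hypothetical#>
SELECT ?unit ?upper ?lower WHERE {{
  ?unit ex:hasAge <{era_uri}> .
  {rank_filter}
  OPTIONAL {{ ?unit ex:upper ?upper }}
  OPTIONAL {{ ?unit ex:lower ?lower }}
}}
"""


# Hypothetical era URI, for illustration only.
unfiltered = build_query("http://example.org/Jurassic")
filtered = build_query("http://example.org/Jurassic", formations_only=True)
```

Because the upper and lower patterns are OPTIONAL, units with only one of the two relations (or neither) would still come back, with the missing variable left unbound.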