Open jonc125 opened 4 years ago
Helen was working on this. She said:
So I've rebased the PMR work against master and checked that what's there still works. It puts a git_remote_url field onto the entity and adds a form field - it'll auto import versions from the repo. That's not PMR-specific.
There's also a browse interface that talks to the PMR API to browse models. The URL for that is /import/pmr/ and the bottom level is the model git url - at the moment it's a matter of copy pasting into the import form but it could be made into a proper UI flow.
The branch is called pmr. I'll also forward the email conversation I had with Tommy about the API.
---------- Forwarded message --------- From: Tommy Yu tommy.yu@auckland.ac.nz Date: Tue, 3 Jul 2018 at 06:28 Subject: Re: Model names To: Helen Sherwood-Taylor helen@rrdlabs.co.uk
Hi Helen,
Yes it was nice to meet up in Oxford, and I am glad that good progress is happening.
I understand the concerns, but actually implementing this will be tricky because the CellML Model Repository originally started as a citation-focused repository where the main entry is a single citation, with one or many models accessible from that point. Then came a version with its own versioning scheme, but every citation was flattened in as a separate entity, which isn't what was supposed to happen. Then I came along and saw new requirements: there are models that are not derived from existing citations, and the representation should group all models of the same citation together.
That was done, but given that the presentation model is coupled tightly with the underlying CMS and it really doesn't have the concept of multiple titles (one for the author-year, the other for the title of the actual citation), this was split between the exposure (the container) and their exposure file(s) (which reference the actual CellML file). Needless to say this causes some issues and when translated to a restricted hypermedia representational format, this makes it difficult, so the title you see is only of the file as the listing page lists the exposure files directly (skipping the container). Also this went in-line with the de-emphasis of listing of the specific citations, hence there wasn't a drive to keep the author-year information presented everywhere (as more models are added without this, that information becomes irrelevant).
The other side of the use case for a limited set of data is that some developers find listing pages with all the data available too slow to download, which adds too much latency between retrieval and presentation to end users.
I've been wanting to have a separate citation store to maintain that information, but there is just no demand and no time for me to work on that, so this is why the information presented by the repository is not exactly uniform in the way that you might like to have.
Cheers, Tommy.
On 02/07/18 21:28, Helen Sherwood-Taylor wrote:
Hi Tommy
It was good to meet you at the Harmony workshop the other week, and thank you for your help navigating the PMR API, I made some good progress towards building an interface to browse for models to import into Web Lab.
One thing that would be helpful is to make the model names available, e.g. "Beeler, Reuter, 1977" as used for subheadings on the https://models.physiomeproject.org/electrophysiology html view - these are not currently included in the json representation. They'd be good to have both there and also in the https://models.physiomeproject.org/e/12b/francis_garcia_middleton_2013.cellml/view json representation - having this available would be useful for listing models for selection and also setting up the model name in Web Lab.
Cheers Helen
-- Helen Sherwood-Taylor
Director, RRD Labs Ltd. helen@rrdlabs.co.uk
I suspect the best way forward would be to force Web Lab users to decide on a model name themselves, rather than trying to guess one from PMR?
So for our chat, I guess some questions would be:
Where I would say 3 is the most important, but the least fun :D
Knowing who to bother is the main thing. There's also the technical aspect of being able to push changes back to PMR, not just clone & pull. I think we'd always want a clone of the repo locally (too slow to fetch from Auckland every time!) but you want that kept in sync both ways.
Didn't discuss pulling from PMR yet
PMR is migrating (1-5 years) to an approach where every model is in a workspace (like now), but a workspace can live anywhere on the web. The main idea is to not re-invent gitlab/hub but just let users use whatever repo UI they like.
The current procedure for changing a curated model is:
The curator pulls (or doesn't pull) in the changes
The curator creates a new exposure from the updated workspace
The curator "expires" the old exposure, so that the new one becomes the main one people find
The "send a pull request" step is emailing Andre, who then bothers Anand about it
There is no work planned to make this a feature of the current PMR, but it will become available in the new PMR (so 1-5 years). Everything they do is via an API (the web interface talks to the API, which talks to the PMR database), so at that point it'll become possible to have a "send changes back to PMR" button on the WL.
So a final workflow for us could be:
PMR ~reads~ can read everything that's in a workspace and sticks it in its DB. This includes checking if things are RDF files, and extracting any RDF from CellML files.
To test this we can:
and
@nickerso does that make sense? And where's the button for doing sparkly queries?
@nickerso does PMR support RDF in formats other than RDF/XML?
This workspace has a model with oxmeta annotations, including <bqbiol:is rdf:resource="https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#membrane_voltage"/>, so we should be able to sparql our way to that one
This workspace has external annotations, again including membrane_voltage
@MichaelClerx - that makes sense, but see https://aucklandphysiomerepository.readthedocs.io/en/latest/semantic-metadata.html#getting-your-workspace-indexed-by-the-repository for the extra steps required to get metadata indexed in the PMR triple store.
PMR supports various RDF serialisation formats, not just RDF/XML. Not sure if it is all the formats RDFlib supports or something else (I know @metatoaster has told me many times what they are!). TTL and ntriples are almost certainly supported.
The SPARQL endpoint is available here: https://models.physiomeproject.org/pmr2_virtuoso_search
Should note that the SPARQL search will only return results you have permission to view - so you need to be logged in to get results from private workspaces. And there is some filtering of SPARQL queries to restrict users from being able to edit the triplestore...
Thanks!
Anyone here have experience writing SPARQL queries? I'm struggling
prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
WHERE
{
}
?
@nickerso what happens if an index file changes? or gets renamed? And is there some automated way to tell PMR which files to read? (E.g. can we indicate it e.g. in a manifest file?)
SELECT ?entity WHERE {
?entity <http://biomodels.net/biology-qualifiers/is> <https://special-ontology.com/membrane_voltage>
}
OK,
SELECT ?entity WHERE {
?entity <http://biomodels.net/biology-qualifiers/is> <https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#membrane_voltage>
}
returns "entity" :D
Once you add a file to be indexed, the triple store will be updated when new versions of that file are pushed to the workspace. I don't think it will cope with renames other than removing the deleted file from the index. Exposures will also have a separate graph in the triple store fixed at that version.
I suspect there is a way to use the (authenticated) webservices to add a file to be indexed. I don't think there is a way to do this automatically with an OMEX manifest...although maybe soon.
OK have used this page http://models.cellml.org/workspace/595/rdf_indexer?_authenticator=23fc1f3290b0f946fb70a5b3d0def2fae70ff370 to move the CellML into the box on the right. Am assuming that's where the indexed files live? Then hit "apply changes and export to RDF store", but still see only 1 "entity" when I execute the query
(too slow to fetch from Auckland every time!)
Should note that PMR is currently hosted at AWS (specifically region us-west-2), not Auckland. It has not been hosted in Auckland for some number of years now due to that specific concern.
Anyone here have experience writing SPARQL queries? I'm struggling
prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?s ?q
WHERE {
?s bqbiol:is ?q .
}
For public data (private data may be retrieved using OAuth, that's a separate discussion), the query may be done via something like this, demonstrated with curl:
$ curl -H "Accept: application/sparql-results+json" \
-d 'prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?s ?q
WHERE {
?s bqbiol:is ?q .
}' https://models.physiomeproject.org/pmr2_virtuoso_search
The output will be of the specific mimetype.
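For anyone who would rather script this than shell out to curl, here is a minimal sketch of the same request using only the Python standard library. It assumes the endpoint accepts the raw query text as the POST body, exactly as the curl call sends it; the function name `build_sparql_request` is made up for illustration and is not part of any PMR client.

```python
import urllib.request

# Endpoint quoted earlier in this thread.
SPARQL_ENDPOINT = "https://models.physiomeproject.org/pmr2_virtuoso_search"

QUERY = """\
prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?s ?q
WHERE {
  ?s bqbiol:is ?q .
}"""


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build (but do not send) a POST request mirroring the curl call.

    Hypothetical helper: the name and shape are illustrative only.
    """
    return urllib.request.Request(
        endpoint,
        # Like `curl -d`: the query text itself is the request body.
        data=query.encode("utf-8"),
        headers={
            "Accept": "application/sparql-results+json",
            # `curl -d` sends this content type by default.
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_sparql_request(QUERY)

# To actually run the query (needs network access, public data only):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the headers and body before anything goes over the network.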
what happens if an index file changes? or gets renamed? And is there some automated way to tell PMR which files to read? (E.g. can we indicate it e.g. in a manifest file?)
A file in a workspace must be explicitly selected for RDF Indexing via the appropriate tab for the workspace (third one from the right), and once that is done the RDF will be extracted and loaded into the underlying store. Every time a push happens the indexing process should be triggered.
prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?s
WHERE {
?s bqbiol:is <https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#membrane_voltage> .
}
is returning one result in grandi_2011_atrial_with_meta.cellml#V
Apologies, my example was missing that very important . on the end...
Does it, @nickerso ?
For me that just returns "s" in the web interface, or
{"head": {"link": [], "vars": ["s"]}, "results": {"distinct": false, "bindings": [], "ordered": true}}
with curl ?
Again, note that the default curl only returns public data as the request is unauthenticated.
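A small sketch of how a script might read these responses, so an empty result (as with the unauthenticated request) is easy to tell apart from a real match. It relies on the standard SPARQL JSON results layout (`results` → `bindings` → variable → `value`); the helper name `bound_values` and the populated sample are hypothetical, with the sample URI shortened to the file name mentioned above rather than copied from a real response.

```python
import json


def bound_values(results, variable):
    """Return the values bound to `variable` in a SPARQL JSON results document."""
    bindings = results["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# The unauthenticated response quoted in this thread: no bindings at all.
empty = json.loads(
    '{"head": {"link": [], "vars": ["s"]}, '
    '"results": {"distinct": false, "bindings": [], "ordered": true}}'
)

# Hypothetical populated response for comparison (illustrative URI only).
populated = {
    "head": {"link": [], "vars": ["s"]},
    "results": {
        "distinct": False,
        "ordered": True,
        "bindings": [
            {"s": {"type": "uri", "value": "grandi_2011_atrial_with_meta.cellml#V"}},
        ],
    },
}

for name, doc in (("empty", empty), ("populated", populated)):
    values = bound_values(doc, "s")
    # An empty list may simply mean the workspace is private and the
    # request was not authenticated.
    print(name, values)
```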
But it's my model, it should be returning?
On a side note, is there some issue list / to-do list / wish list where we can request PMR find meta data files automatically? E.g. if there's a manifest file or some OMEX thing?
On a side note, is there some issue list / to-do list / wish list where we can request PMR find meta data files automatically? E.g. if there's a manifest file or some OMEX thing?
For one thing, this would mean we now have to
So step 3 is no longer replaceable by an automated PR
Another question: If we use external annotations, can we still use rdf:about to link to a cmeta:id? Or only unnamespaced ids?
Another question: If we use external annotations, can we still use rdf:about to link to a cmeta:id? Or only unnamespaced ids?
It would still link to a cmeta:id - that's what the attribute is for, after all.
But it's my model, it should be returning?
OK, it works now that I'm logged in again! I was logged into models.cellml but not https://models.physiomeproject.org
Ah, now it finds both the file with internal and the file with external annotations.
@nickerso was just hinting at a feature where you use a manifest to select which files will be used in an exposure. That'd work for us, ideally if we could use the same file to tell WL that the given annotations go with a certain CellML file?
Quite frankly, the current way of indexing metadata from a workspace was not what I originally intended to release because it is a very haphazard way of "referencing" data. The original plan was to couple this with an exposure such that the returned resources would be immutable as per the exposure.
In any case, it is definitely possible to make use of the webservice version of the rdf_indexer endpoint with an authenticated OAuth 1.0 request to submit the desired annotation file for indexing.
We do have a barebone, undocumented client built on top of the OAuth request. You can try running it by cloning it like so (note that it points to the staging instance by default - you will need to edit the URI to the main instance should you wish to - the same keys work):
$ git clone https://github.com/PMR2/pmr2.client
$ cd pmr2.client
$ pip install -e .
Obtaining file:///.../pmr2.client
$ # assuming you have updated the URI as stated to the main PMR
$ python src/pmr2/client/script.py
Please enter the verifier: xxxxxxxxxxxxxxxxx
Starting PMR2 Demo Shell...
pmr2cli> console
>>>
Now you can toy with the requests session object like so:
>>> r = self.client.session.get('https://models.physiomeproject.org/workspace/example/rdf_indexer')
>>> r.json()
To be helpful I prettified that output:
{
  "collection": {
    "href": "https://models.physiomeproject.org/w/tommy/model1/rdf_indexer",
    "version": "1.0",
    "template": {
      "data": [
        {
          "prompt": "RDF Paths",
          "name": "form.widgets.paths",
          "required": false,
          "value": [],
          "type": "List",
          "options": null,
          "description": "Paths that will be indexed as RDF."
        },
        {
          "prompt": "Apply",
          "name": "form.buttons.apply",
          "required": false,
          "value": null,
          "type": "Button",
          "description": null
        },
        {
          "prompt": "Apply Changes and Export To RDF Store",
          "name": "form.buttons.export_rdf",
          "required": false,
          "value": null,
          "type": "Button",
          "description": null
        }
      ]
    }
  }
}
Note that the posting of data follows the Collection+JSON specification. Examples on how to submit data can be found in the PMR webservice documentation.
So for that rdf_indexer endpoint, you might do something like this:
>>> r = self.client.session.post(
... 'https://models.physiomeproject.org/workspace/example/rdf_indexer',
... json={"template": {"data": [
... {"name": "form.widgets.paths", "value": ["metadata.rdf"]},
... {"name": "form.buttons.export_rdf", "value": 1}]}})
>>> r.json()
I am only going to include the relevant segment:
{
  "prompt": "RDF Paths",
  "name": "form.widgets.paths",
  "required": false,
  "value": [
    "metadata.rdf"
  ],
  "type": "List",
  "options": null,
  "description": "Paths that will be indexed as RDF."
},
Note that metadata.rdf is now provided as part of the value. The indexing should also be triggered because the pressing of the export_rdf button is specified.
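If this ends up being scripted from the Web Lab side, the template body could be assembled by a small helper along these lines. This is only a sketch: `build_rdf_indexer_payload` is a made-up name, the field names are copied from the response shown above, and pressing `form.buttons.apply` with the same value of 1 is an assumption by analogy with the `export_rdf` example.

```python
def build_rdf_indexer_payload(paths, export=True):
    """Build a Collection+JSON template body for the rdf_indexer endpoint.

    Hypothetical helper. With export=True it "presses" the export_rdf
    button as in the example above; with export=False it presses the
    plain apply button instead (assumed to take the same value).
    """
    button = "form.buttons.export_rdf" if export else "form.buttons.apply"
    return {
        "template": {
            "data": [
                {"name": "form.widgets.paths", "value": list(paths)},
                {"name": button, "value": 1},
            ]
        }
    }


payload = build_rdf_indexer_payload(["metadata.rdf"])
print(payload)

# With the authenticated session from the demo client, this would be sent as:
# r = self.client.session.post(
#     'https://models.physiomeproject.org/workspace/example/rdf_indexer',
#     json=payload)
```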
(as an aside, I did take the opportunity to make a couple corrections to this demo client script)
Another question: If we use external annotations, can we still use rdf:about to link to a cmeta:id? Or only unnamespaced ids?
Yes. In fact, the recommended manner is to reference by the relative path to the resource, followed by a # to denote the cmeta:id inside the target CellML file. So using my example with metadata.rdf, it might contain a node rdf:about="my_model.cellml#time" to reference an element with cmeta:id="time" inside my_model.cellml that is a sibling to metadata.rdf.
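To make the relative-reference rule concrete, here is a rough sketch of how a consumer such as WL might resolve an rdf:about value against the location of the annotation file, using standard URL resolution. The base URL and the helper name `resolve_about` are hypothetical; only the `my_model.cellml#time` reference comes from the example above.

```python
from urllib.parse import urldefrag, urljoin


def resolve_about(annotation_url, about):
    """Split an rdf:about reference into (target file URL, cmeta:id).

    The reference is resolved relative to the annotation file, so a bare
    file name points at a sibling of that file.
    """
    target, cmeta_id = urldefrag(urljoin(annotation_url, about))
    return target, cmeta_id


# Hypothetical location of the annotation file.
base = "https://example.org/workspace/example/metadata.rdf"

target, cmeta_id = resolve_about(base, "my_model.cellml#time")
print(target)    # sibling CellML file
print(cmeta_id)  # element id inside that file
```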
Anyway I might be signing off soon, it's fast approaching 3am in Auckland.
Thank you!
Notes for work on https://github.com/ModellingWebLab/project_issues/issues/8