ModellingWebLab / WebLab

Django-based front-end for the modelling Web Lab v2

Linking to the PMR #254

Open jonc125 opened 4 years ago

jonc125 commented 4 years ago

Notes for work on https://github.com/ModellingWebLab/project_issues/issues/8

jonc125 commented 4 years ago

Helen was working on this. She said:

So I've rebased the PMR work against master and checked that what's there still works. It puts a git_remote_url field onto the entity and adds a form field - it'll auto import versions from the repo. That's not PMR-specific.

There's also a browse interface that talks to the PMR API to browse models. The URL for that is /import/pmr/ and the bottom level is the model git url - at the moment it's a matter of copy pasting into the import form but it could be made into a proper UI flow.

The branch is called pmr. I'll also forward the email conversation I had with Tommy about the API.

---------- Forwarded message --------- From: Tommy Yu tommy.yu@auckland.ac.nz Date: Tue, 3 Jul 2018 at 06:28 Subject: Re: Model names To: Helen Sherwood-Taylor helen@rrdlabs.co.uk

Hi Helen,

Yes it was nice to meet up in Oxford, and I am glad that good progress is happening.

I understand the concerns, but actually implementing this will be tricky due to how CellML Model Repository originally started as a citation-focused repository where the main entry is a single citation, and there will be one or many models accessible from that point. Then came a version with its own versioning scheme but every citation was flattened in as separate entities, which isn't what was supposed to have happened. Then I came along and saw that there are new requirements where there are models that are not derived from existing citations, and that the representation should group all models of the same citation together.

That was done, but given that the presentation model is coupled tightly with the underlying CMS and it really doesn't have the concept of multiple titles (one for the author-year, the other for the title of the actual citation), this was split between the exposure (the container) and their exposure file(s) (which reference the actual CellML file). Needless to say this causes some issues and when translated to a restricted hypermedia representational format, this makes it difficult, so the title you see is only of the file as the listing page lists the exposure files directly (skipping the container). Also this went in-line with the de-emphasis of listing of the specific citations, hence there wasn't a drive to keep the author-year information presented everywhere (as more models are added without this, that information becomes irrelevant).

The other side of the use case for a limited set of data is that there are developers who find listing pages with all the data available being too slow to download and thus too much latency between retrieval and presentation to end users.

I've been wanting to have a separate citation store to maintain that information, but there is just no demand and no time for me to work on that, so this is why the information presented by the repository is not exactly uniform in the way that you might like to have.

Cheers, Tommy.

On 02/07/18 21:28, Helen Sherwood-Taylor wrote:

Hi Tommy

It was good to meet you at the Harmony workshop the other week, and thank you for your help navigating the PMR API, I made some good progress towards building an interface to browse for models to import into Web Lab.

One thing that would be helpful is to make the model names available, e.g. "Beeler, Reuter, 1977" as used for subheadings on the https://models.physiomeproject.org/electrophysiology html view - these are not currently included in the json representation. They'd be good to have both there and also in the https://models.physiomeproject.org/e/12b/francis_garcia_middleton_2013.cellml/view json representation - having this available would be useful for listing models for selection and also setting up the model name in Web Lab.

Cheers Helen

-- Helen Sherwood-Taylor

Director, RRD Labs Ltd. helen@rrdlabs.co.uk

jonc125 commented 4 years ago

I suspect the best way forward would be to force Web Lab users to decide on a model name themselves, rather than trying to guess one from PMR?

MichaelClerx commented 4 years ago

So for our chat, I guess some questions would be:

Where I would say 3 is the most important, but the least fun :D

jonc125 commented 4 years ago

Knowing who to bother is the main thing. There's also the technical aspect of being able to push changes back to PMR, not just clone & pull. I think we'd always want a clone of the repo locally (too slow to fetch from Auckland every time!) but you want that kept in sync both ways.
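
The two-way sync described above could be sketched with plain git commands driven from Python. This is only an illustration: the function names, the `origin` remote name and the fast-forward-only policy are assumptions, not part of Web Lab or PMR, and pushing to PMR would additionally need credentials that are not shown.

```python
import subprocess


def clone_command(remote_url, local_dir):
    """Command for the one-off local clone of a workspace."""
    return ["git", "clone", remote_url, local_dir]


def pull_commands(local_dir, branch):
    """Commands that bring the local clone up to date with the remote."""
    return [
        ["git", "-C", local_dir, "fetch", "origin"],
        # Fast-forward only, so local edits are never silently merged.
        # Assumes `branch` is the branch currently checked out in local_dir.
        ["git", "-C", local_dir, "merge", "--ff-only", "origin/" + branch],
    ]


def push_command(local_dir, branch):
    """Command that sends local commits back to the remote."""
    return ["git", "-C", local_dir, "push", "origin", branch]


def run(command):
    """Run one git command, raising CalledProcessError if it fails."""
    subprocess.run(command, check=True)
```

Keeping the command lists separate from `run` makes it easy to log or review what would be executed before touching a real workspace.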

MichaelClerx commented 4 years ago

So a final workflow for us could be:

Fixing an existing model

Creating a new model (variant)

But the bits we can implement at the moment seem restricted to:

  1. Get forking from PMR in place, in anticipation of eventually having a PR button
  2. Get storing RDF in external files in place, in anticipation of CellML 2

MichaelClerx commented 4 years ago

Annotations

PMR can read everything that's in a workspace and stick it in its DB. This includes checking if things are RDF files, and extracting any RDF from CellML files.

To test this we can:

  1. Create a workspace on PMR, add one of our current models to it
  2. Tell PMR which files to index
  3. Do a SPARQL query on one of our ontology terms and see if we find the model

and

  1. Create a workspace on PMR, add a model with ids but no RDF
  2. Add a separate RDF file (in XML format I guess?)
  3. Tell PMR which files to index
  4. Do another SPARQL query

@nickerso does that make sense? And where's the button for doing sparkly queries?

jonc125 commented 4 years ago

@nickerso does PMR support RDF in formats other than RDF/XML?

MichaelClerx commented 4 years ago

This workspace has a model with oxmeta annotations, including <bqbiol:is rdf:resource="https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#membrane_voltage"/>, so we should be able to sparql our way to that one

This workspace has external annotations, again including membrane_voltage

nickerso commented 4 years ago

@MichaelClerx - that makes sense, but see https://aucklandphysiomerepository.readthedocs.io/en/latest/semantic-metadata.html#getting-your-workspace-indexed-by-the-repository for the extra steps required to get metadata indexed in the PMR triple store.

PMR supports various RDF serialisation formats, not just RDF/XML. Not sure if it is all the formats RDFlib supports or something else (I know @metatoaster has told me many times what they are!). TTL and ntriples are almost certainly supported.

The SPARQL endpoint is available here: https://models.physiomeproject.org/pmr2_virtuoso_search

nickerso commented 4 years ago

Should note that the SPARQL search will only return results you have permission to view - so you need to be logged in to get results from private workspaces. And there is some filtering of SPARQL queries to restrict users from being able to edit the triplestore...

MichaelClerx commented 4 years ago

Thanks!

Anyone here have experience writing SPARQL queries? I'm struggling

prefix bqbiol: http://biomodels.net/biology-qualifiers/
prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#

SELECT *
WHERE
{

}

?

MichaelClerx commented 4 years ago

@nickerso what happens if an index file changes? or gets renamed? And is there some automated way to tell PMR which files to read? (E.g. can we indicate it e.g. in a manifest file?)

nickerso commented 4 years ago

SELECT ?entity WHERE {
    ?entity <http://biomodels.net/biology-qualifiers/is> <https://special-ontology.com/membrane_voltage>
}

MichaelClerx commented 4 years ago

OK,

SELECT ?entity WHERE {
?entity <http://biomodels.net/biology-qualifiers/is> <https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#membrane_voltage>
}

returns "entity" :D

nickerso commented 4 years ago

Once you add a file to be indexed, the triple store will be updated when new versions of that file are pushed to the workspace. I don't think it will cope with renames other than removing the deleted file from the index. Exposures will also have a separate graph in the triple store fixed at that version.

I suspect there is a way to use the (authenticated) webservices to add a file to be indexed. I don't think there is a way to do this automatically with an OMEX manifest...although maybe soon.

MichaelClerx commented 4 years ago

OK have used this page http://models.cellml.org/workspace/595/rdf_indexer?_authenticator=23fc1f3290b0f946fb70a5b3d0def2fae70ff370 to move the CellML into the box on the right. Am assuming that's where the indexed files live? Then hit "apply changes and export to RDF store", but still see only 1 "entity" when I execute the query

metatoaster commented 4 years ago

(too slow to fetch from Auckland every time!)

Should note that PMR is currently hosted at AWS (specifically region us-west-2), not Auckland. Has not been hosted in Auckland for some number of years now due to that specific concern.

Anyone here have experience writing SPARQL queries? I'm struggling

prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?s ?q
WHERE {
    ?s bqbiol:is ?q . 
}

For public data (private data may be retrieved using OAuth, that's a separate discussion), the query may be done via something like, demonstrated with curl:

$ curl -H "Accept: application/sparql-results+json" \
    -d 'prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
        prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

        SELECT ?s ?q
        WHERE {
            ?s bqbiol:is ?q . 
        }' https://models.physiomeproject.org/pmr2_virtuoso_search

The output will be of the specific mimetype.
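
For anyone scripting this from Python rather than the shell, the curl call above might translate to something like the following sketch, using only the standard library. The function names are made up for illustration, and the request is unauthenticated, so it would only see public data.

```python
import json
import urllib.request

# Endpoint quoted earlier in this thread.
SPARQL_ENDPOINT = "https://models.physiomeproject.org/pmr2_virtuoso_search"

QUERY = """\
prefix bqbiol: <http://biomodels.net/biology-qualifiers/>

SELECT ?s ?q
WHERE {
    ?s bqbiol:is ?q .
}
"""


def build_request(query, endpoint=SPARQL_ENDPOINT):
    """Build the same kind of unauthenticated POST that the curl command sends."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),  # like curl -d, the query is the request body
        headers={
            "Accept": "application/sparql-results+json",
            # curl -d uses this content type by default, so it is mirrored here.
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def run_query(query, endpoint=SPARQL_ENDPOINT):
    """Send the query and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return json.load(response)
```

Calling `run_query(QUERY)` should then return the parsed JSON result as a Python dict.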

what happens if an index file changes? or gets renamed? And is there some automated way to tell PMR which files to read? (E.g. can we indicate it e.g. in a manifest file?)

A file in a workspace must be explicitly selected for RDF Indexing via the appropriate tab for the workspace (third one from the right), and once that is done the RDF will be extracted and loaded into the underlying store. Every time a push happens the indexing process should be triggered.

nickerso commented 4 years ago

prefix bqbiol: <http://biomodels.net/biology-qualifiers/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?s
WHERE {
    ?s bqbiol:is <https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#membrane_voltage> . 
}

is returning one result in grandi_2011_atrial_with_meta.cellml#V

Apologies, my example was missing that very important . on the end...

MichaelClerx commented 4 years ago

Does it, @nickerso ?

For me that just returns "s" in the web interface, or

{"head": {"link": [], "vars": ["s"]}, "results": {"distinct": false, "bindings": [], "ordered": true}}

with curl ?
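
As a rough guide to reading that output: in the SPARQL JSON results format each match is one entry in `results.bindings`, so an empty list means the query ran but matched nothing visible to the caller. A small sketch follows; the helper name is made up, and the non-empty example is hand-written for illustration rather than copied from PMR.

```python
import json


def bound_values(results, variable):
    """Return every value bound to `variable` in a SPARQL JSON result."""
    bindings = results["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# The response quoted above: the query ran, but nothing matched.
empty = json.loads(
    '{"head": {"link": [], "vars": ["s"]}, '
    '"results": {"distinct": false, "bindings": [], "ordered": true}}'
)

# Hand-written illustration of the shape of a response with one match.
one_match = {
    "head": {"link": [], "vars": ["s"]},
    "results": {
        "distinct": False,
        "ordered": True,
        "bindings": [
            {"s": {"type": "uri", "value": "grandi_2011_atrial_with_meta.cellml#V"}}
        ],
    },
}

print(bound_values(empty, "s"))
print(bound_values(one_match, "s"))
```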

metatoaster commented 4 years ago

Again, note that the default curl only returns public data as the request is unauthenticated.

MichaelClerx commented 4 years ago

But it's my model, it should be returning?

MichaelClerx commented 4 years ago

On a side note, is there some issue list / to-do list / wish list where we can request PMR find meta data files automatically? E.g. if there's a manifest file or some OMEX thing?

MichaelClerx commented 4 years ago

On a side note, is there some issue list / to-do list / wish list where we can request PMR find meta data files automatically? E.g. if there's a manifest file or some OMEX thing?

For one thing, this would mean we now have to

  1. Add an annotation file
  2. Update our workspace
  3. Ask the curator to pull the changes in AND select the new annotation file for indexing

So step 3 is no longer replaceable by an automated PR

MichaelClerx commented 4 years ago

Another question: If we use external annotations, can we still use rdf:about to link to a cmeta:id ? Or only unnamespaced ids?

jonc125 commented 4 years ago

Another question: If we use external annotations, can we still use rdf:about to link to a cmeta:id ? Or only unnamespaced ids?

It would still link to a cmeta:id - that's what the attribute is for, after all.

MichaelClerx commented 4 years ago

But it's my model, it should be returning?

OK, it works now that I'm logged in again! I was logged into models.cellml but not https://models.physiomeproject.org

MichaelClerx commented 4 years ago

Ah, now it finds both the file with internal and the file with external annotations.

@nickerso was just hinting at a feature where you use a manifest to select which files will be used in an exposure. That'd work for us, ideally if we could use the same file to tell WL that the given annotations go with a certain CellML file?

metatoaster commented 4 years ago

Quite frankly, the current way of indexing metadata from a workspace was not what I originally intended to release because it is a very haphazard way of "referencing" data. The original plan was to couple this with an exposure such that the returned resources would be immutable as per the exposure.

In any case, it is definitely possible to make use of the webservice version of the rdf_indexer endpoint with an authenticated OAuth 1.0 request to submit the desired annotation file for indexing.

We do have a bare-bones, undocumented client built on top of the OAuth request. You can try running it by cloning it like so (note that it points to the staging instance by default; to use the main instance you will need to edit the URI, and the same keys work):

$ git clone https://github.com/PMR2/pmr2.client
$ cd pmr2.client
$ pip install -e .
Obtaining file:///.../pmr2.client
$ # assuming you have updated the URI as stated to the main PMR
$ python src/pmr2/client/script.py
Please enter the verifier: xxxxxxxxxxxxxxxxx
Starting PMR2 Demo Shell...
pmr2cli> console
>>>

Now you can toy with the requests session object like so:

>>> r = self.client.session.get('https://models.physiomeproject.org/workspace/example/rdf_indexer')
>>> r.json()

To be helpful I prettified that output:

{
    "collection": {
        "href": "https://models.physiomeproject.org/w/tommy/model1/rdf_indexer",
        "version": "1.0",
        "template": {
            "data": [
                {
                    "prompt": "RDF Paths",
                    "name": "form.widgets.paths",
                    "required": false,
                    "value": [
                    ],
                    "type": "List",
                    "options": null,
                    "description": "Paths that will be indexed as RDF."
                },
                {
                    "prompt": "Apply",
                    "name": "form.buttons.apply",
                    "required": false,
                    "value": null,
                    "type": "Button",
                    "description": null
                },
                {
                    "prompt": "Apply Changes and Export To RDF Store",
                    "name": "form.buttons.export_rdf",
                    "required": false,
                    "value": null,
                    "type": "Button",
                    "description": null
                }
            ]
        }
    }
}
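
To pull the currently indexed paths out of a response shaped like the one above, something along these lines might do. The helper names are hypothetical, and the sample document is a trimmed copy of the structure shown in this thread with one path filled in.

```python
def template_field(document, name):
    """Find one entry of collection.template.data by its "name"."""
    for field in document["collection"]["template"]["data"]:
        if field["name"] == name:
            return field
    raise KeyError(name)


def indexed_paths(document):
    """Return the list of paths currently selected for RDF indexing."""
    value = template_field(document, "form.widgets.paths")["value"]
    return list(value or [])


# Trimmed sample following the structure of the rdf_indexer response.
sample = {
    "collection": {
        "version": "1.0",
        "template": {
            "data": [
                {"name": "form.widgets.paths", "value": ["metadata.rdf"], "type": "List"},
                {"name": "form.buttons.apply", "value": None, "type": "Button"},
                {"name": "form.buttons.export_rdf", "value": None, "type": "Button"},
            ]
        },
    }
}

print(indexed_paths(sample))
```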

Note that the posting of data follows the Collection+JSON specification. Examples on how to submit data can be found at the PMR webservice documentation.

So for that rdf_indexer endpoint, you might do something like this:

>>> r = self.client.session.post(
...     'https://models.physiomeproject.org/workspace/example/rdf_indexer',
...     json={"template": {"data": [
...         {"name": "form.widgets.paths", "value": ["metadata.rdf"]},
...         {"name": "form.buttons.export_rdf", "value": 1}]}})
>>> r.json()

I am only going to include the relevant segment:

                {
                    "prompt": "RDF Paths",
                    "name": "form.widgets.paths",
                    "required": false,
                    "value": [
                        "metadata.rdf"
                    ],
                    "type": "List",
                    "options": null,
                    "description": "Paths that will be indexed as RDF."
                },

Note that metadata.rdf is now provided as part of the value. The indexing should also be triggered because the pressing of the export_rdf button is specified.
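
If the request body needs to be built for several annotation files, a tiny helper could keep it consistent with the example above. This is a sketch: the function name and the `export` flag are made up, and only the two field names come from the response shown in this thread.

```python
def indexer_payload(paths, export=True):
    """Build the Collection+JSON template body for the rdf_indexer form."""
    data = [{"name": "form.widgets.paths", "value": list(paths)}]
    if export:
        # Including the button mirrors pressing
        # "Apply Changes and Export To RDF Store" in the web form.
        data.append({"name": "form.buttons.export_rdf", "value": 1})
    return {"template": {"data": data}}


payload = indexer_payload(["metadata.rdf"])
print(payload)
```

The resulting dict could then be passed as the `json=` argument of the session's `post` call, as in the console example above.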

(as an aside, I did take the opportunity to make a couple corrections to this demo client script)

Another question: If we use external annotations, can we still use rdf:about to link to a cmeta:id ? Or only unnamespaced ids?

Yes. In fact, the recommended manner is to reference by the relative path to the resource, followed by a # to denote the cmeta:id inside the target CellML file. So using my example with metadata.rdf, it might contain a node rdf:about="my_model.cellml#time" to reference an element with cmeta:id="time" inside my_model.cellml that is a sibling to metadata.rdf.
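
To make the relative-path convention concrete, here is a minimal sketch that reads an external RDF/XML file and splits each rdf:about into the CellML file and the cmeta:id it points at. The file content is invented for illustration (the example.org ontology term in particular is a placeholder), and only the standard library is used rather than a full RDF parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical content of metadata.rdf, sitting next to my_model.cellml.
METADATA_RDF = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
  <rdf:Description rdf:about="my_model.cellml#time">
    <bqbiol:is rdf:resource="https://example.org/ontology#time"/>
  </rdf:Description>
</rdf:RDF>
"""


def annotation_targets(rdf_xml):
    """Return (relative file path, cmeta:id) for each rdf:about found."""
    root = ET.fromstring(rdf_xml)
    targets = []
    for node in root.iter("{%s}Description" % RDF_NS):
        about = node.get("{%s}about" % RDF_NS)
        if about is None:
            continue
        # Everything before '#' is the file, everything after is the id.
        path, cmeta_id = urldefrag(about)
        targets.append((path, cmeta_id))
    return targets


print(annotation_targets(METADATA_RDF))
```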

Anyway I might be signing off soon, it's fast approaching 3am in Auckland.

MichaelClerx commented 4 years ago

Thank you!