EBISPOT / DUO

Ontology for consent codes and data use requirements

Produce flat version of DUO #14

Closed mcourtot closed 4 years ago

mcourtot commented 5 years ago

Our automated release pipeline could query the EBI RDF platform using the SPARQL query below. We could then export the result as CSV, HTML or another format, to be decided with Adrian, Dylan and the group.

https://www.ebi.ac.uk/rdf/services/sparql?query=PREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0APREFIX+dc%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0D%0APREFIX+dcterms%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0D%0APREFIX+dbpedia2%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2F%3E%0D%0APREFIX+dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2F%3E%0D%0APREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0APREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0D%0A%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+%0D%0ASELECT+%3Fid+%3FchildLabel+%3Fdescription+%0D%0AFROM+%3Chttp%3A%2F%2Frdf.ebi.ac.uk%2Fdataset%2Fduo%3E+%0D%0A+++WHERE+%7B%0D%0A+++%3Fchild+rdfs%3Alabel+%3FchildLabel+.%0D%0A+++%3Fchild+%3Chttp%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FIAO_0000115%3E+%3Fdescription+.+%0D%0A+++%3Fchild+%3Chttp%3A%2F%2Fwww.geneontology.org%2Fformats%2FoboInOwl%23id%3E+%3Fid%0D%0A%7D&render=HTML&limit=25&offset=0#loadstar-results-section
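For readability, here is a minimal sketch of how a release script might run the query encoded in the URL above and flatten the result to CSV. The query text is decoded from that URL, with the unused PREFIX declarations dropped. Everything else is an assumption: the function names are hypothetical, the endpoint is assumed to honour the standard `application/sparql-results+json` Accept header, and it may no longer be available.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Endpoint taken from the URL above; it may no longer be live.
ENDPOINT = "https://www.ebi.ac.uk/rdf/services/sparql"

# Decoded from the URL above (unused PREFIX declarations dropped).
QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?id ?childLabel ?description
FROM <http://rdf.ebi.ac.uk/dataset/duo>
WHERE {
  ?child rdfs:label ?childLabel .
  ?child <http://purl.obolibrary.org/obo/IAO_0000115> ?description .
  ?child <http://www.geneontology.org/formats/oboInOwl#id> ?id
}
"""


def fetch_results(endpoint=ENDPOINT, query=QUERY):
    """Run the query; assumes the endpoint returns SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def results_to_csv(results):
    """Flatten a SPARQL JSON result set into CSV text, one row per binding."""
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding; emit "" for them.
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()
```

A pipeline step could then write `results_to_csv(fetch_results())` to a file such as `duo.csv`. The network call is kept in its own function so the CSV conversion can be tested offline.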

Relequestual commented 5 years ago

It's possible that a JSON Schema generated from an ontology could be very useful more broadly, considering SchemaBlocks and Search.

There's been some previous (but now abandoned) work on OWL to JSON Schema, so it's possible.

The Human Cell Atlas has already written a JSON Schema extension to support the use of ontologies, but I don't know how this works for them in practice.

There are a few possibilities here. One is that having ontologies in JSON Schema files isn't needed if a formal vocabulary extension is written for JSON Schema, but then you have possible tooling issues.

A simple JSON Schema which has an enum of all the terms, while large, may still be quite useful. Annotation fields could be used to include the descriptions and other meta information.
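As an illustration of the "enum plus annotations" idea, here is a minimal sketch that builds such a schema from a flat list of terms. It uses `oneOf` with `const` rather than a bare `enum`, since that is the usual way to attach a `title` and `description` to each allowed value. The function name and the example rows are hypothetical placeholders, not real DUO terms or definitions.

```python
import json


def build_term_schema(terms, title="Ontology term"):
    """Build a JSON Schema accepting exactly the given term IDs.

    `terms` is an iterable of (id, label, description) tuples.
    """
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": title,
        "type": "string",
        # oneOf + const behaves like an enum, but lets every value carry
        # its own `title` and `description` annotations.
        "oneOf": [
            {"const": term_id, "title": label, "description": description}
            for term_id, label, description in terms
        ],
    }


# Placeholder rows for illustration only; real rows would come from a
# flat table of the ontology's terms.
example_terms = [
    ("EX:0000001", "example term A", "Placeholder description A."),
    ("EX:0000002", "example term B", "Placeholder description B."),
]

print(json.dumps(build_term_schema(example_terms), indent=2))
```

For a large ontology the resulting file would be long, but any standard JSON Schema validator should be able to use it without custom keywords.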

Just some thoughts =]

mcourtot commented 5 years ago

Hi @Relequestual - thanks for the thoughts! I think the discussion about a JSON schema is broader in scope than this ticket, which is really only about producing a table/CSV file for the DUO terms (as per the output of the SPARQL query). It is nonetheless a very interesting one, so I'll reply here and we can always move it if/as needed. I'll tag people I know are involved in those and related efforts, to be sure we're covering a wide scope and others have a chance to be aware/pitch in.

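@cmungall was involved on the OWL-JSON front (https://douroucouli.wordpress.com/2016/10/04/a-developer-friendly-json-exchange-format-for-ontologies/) so may be able to comment on this (or in a new ticket/email if better suited?)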

As far as I know the HCA ontology extension works well (pinging @lauraclarke) and is being used in production, and it is definitely something we may be interested in, e.g. to constrain restrictions to specific subclasses of disease. When I looked into this, @simonjupp said:

Here’s an example from the HCA of a JSON schema for a disease term: https://github.com/HumanCellAtlas/metadata-schema/blob/master/json_schema/module/ontology/disease_ontology.json
We also have a Python-based validator: https://github.com/HumanCellAtlas/ingest-validator/tree/master/ontologyvalidator
USI have a Node.js-based validator that is better documented: https://github.com/EMBL-EBI-SUBS/json-schema-validator/blob/master/README.md

@mbrush also had ideas on how those restrictions/patterns could be better formalised in the OWL file, and he offered to write proposals/tickets for those.

Either way, if we know the top-level class, we should be able to fetch all direct and indirect subclasses (e.g. via the OLS API, https://www.ebi.ac.uk/ols/docs/api) if a driver project requires a list/enum of all possible values.
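A rough sketch of what fetching all subclasses via OLS could look like follows. The base URL, the `/descendants` path, the double URL-encoding of the class IRI, and the `_embedded`/`_links` response fields are my understanding of the OLS REST API and should be checked against its documentation for the deployment in use. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the OLS REST API; adjust for the deployment in use.
OLS_API = "https://www.ebi.ac.uk/ols/api"


def descendants_url(ontology, class_iri, base=OLS_API):
    """URL listing all direct and indirect subclasses of `class_iri`."""
    # OLS expects the class IRI to be URL-encoded twice in the path.
    encoded = urllib.parse.quote(urllib.parse.quote(class_iri, safe=""), safe="")
    return f"{base}/ontologies/{ontology}/terms/{encoded}/descendants"


def terms_in_page(page):
    """Extract (obo_id, label) pairs from one HAL-style response page."""
    terms = page.get("_embedded", {}).get("terms", [])
    return [(term.get("obo_id"), term.get("label")) for term in terms]


def fetch_descendants(ontology, class_iri):
    """Follow `next` links until every page of results has been read."""
    url = descendants_url(ontology, class_iri)
    found = []
    while url:
        with urllib.request.urlopen(url) as response:
            page = json.load(response)
        found.extend(terms_in_page(page))
        # The last page has no `next` link, which ends the loop.
        url = page.get("_links", {}).get("next", {}).get("href")
    return found
```

Called with the ontology short name (for example `"duo"`) and the IRI of the chosen top-level class, `fetch_descendants` would return the values for a list/enum.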

cmungall commented 5 years ago

We're producing JSON for a number of ontologies now. ROBOT and ODK support it. It may be a good idea for DUO to include this in its release pipeline.
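As a sketch of how a release script might call ROBOT for this: the snippet below assumes the `robot` command is on the PATH and relies on `robot convert` choosing the output format from the `.json` file extension. The function names and file paths are hypothetical.

```python
import subprocess


def robot_convert_command(owl_path, json_path):
    """Command line asking ROBOT to convert an OWL file to JSON."""
    return ["robot", "convert", "--input", owl_path, "--output", json_path]


def convert_to_json(owl_path, json_path):
    """Run the conversion; raises CalledProcessError if ROBOT fails."""
    subprocess.run(robot_convert_command(owl_path, json_path), check=True)
```

A release step could then call, for example, `convert_to_json("duo.owl", "duo.json")`.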


simonjupp commented 5 years ago

The HCA validator is a different thing from building a JSON schema from an ontology but I'll add the links in here anyway for completeness. This validates terms against the OLS API using custom keyword extensions and the AJV validator.

https://github.com/elixir-europe/json-schema-validator https://www.npmjs.com/package/elixir-jsonschema-validator

mcourtot commented 5 years ago

See also https://github.com/EBISPOT/OLS/issues/235

mcourtot commented 5 years ago

CSV version at https://github.com/EBISPOT/DUO/blob/master/src/ontology/duo.csv

cmungall commented 5 years ago

See also https://github.com/ontodev/robot/issues/459

mcourtot commented 4 years ago

This has now been done and will propagate to the PURL system, https://github.com/OBOFoundry/purl.obolibrary.org/pull/605

mcourtot commented 4 years ago

https://raw.githubusercontent.com/EBISPOT/DUO/v2020-02-03/duo.csv