ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

Improve Header and Metadata of OWL dumps #4426

Open jmkeil opened 1 year ago

jmkeil commented 1 year ago

The header and metadata of the OWL dumps could be improved.

Here is a current header example:

<rdf:RDF xmlns="http://purl.obolibrary.org/obo/chebi.owl#"
     xml:base="http://purl.obolibrary.org/obo/chebi.owl"
     xmlns:chebi1="http://purl.obolibrary.org/obo/chebi#3"
     xmlns:chebi2="http://purl.obolibrary.org/obo/chebi#"
     xmlns:chebi3="http://purl.obolibrary.org/obo/chebi#1"
     xmlns:chebi="http://purl.obolibrary.org/obo/chebi#2"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:obo="http://purl.obolibrary.org/obo/">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/chebi.owl">
        <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/chebi/225/chebi.owl"/>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ChEBI subsumes and replaces the Chemical Ontology first</rdfs:comment>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Author: ChEBI curation team</rdfs:comment>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">developed by Michael Ashburner &amp; Pankaj Jaiswal.</rdfs:comment>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ChEBI Release version 225</rdfs:comment>
        <oboInOwl:saved-by rdf:datatype="http://www.w3.org/2001/XMLSchema#string">chebi</oboInOwl:saved-by>
        <oboInOwl:date rdf:datatype="http://www.w3.org/2001/XMLSchema#string">27:08:2023 19:12</oboInOwl:date>
        <oboInOwl:hasOBOFormatVersion rdf:datatype="http://www.w3.org/2001/XMLSchema#string">1.2</oboInOwl:hasOBOFormatVersion>
        <oboInOwl:default-namespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">chebi_ontology</oboInOwl:default-namespace>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">For any queries contact chebi-help@ebi.ac.uk</rdfs:comment>
    </owl:Ontology>

Some thoughts on what could be improved:

cmungall commented 1 year ago

Thanks @jmkeil!

Additionally:

I would offer different advice from @jmkeil, though: ChEBI should be consistent with the OBO metadata standards and OMO (https://obofoundry.org/ontology/omo).

@jmkeil, you may want to petition OMO to include schema:discussionUrl or schema:email; see https://github.com/information-artifact-ontology/ontology-metadata/issues

CarMoreno commented 1 year ago

Hi all, many of the points above are already resolved in our current development for ChEBI 2.0 here.

  1. We keep only one chebi prefix throughout the ontology: http://purl.obolibrary.org/obo/chebi/ (ending in /, not #).
  2. The extra namespaces chebi1, chebi2, chebi3, chebi4 have been quite difficult to remove. We are now using ROBOT to generate the ontology, which is a completely different process from before (in the past an OBO file was used and ROBOT just converted it to OWL). We suspect that because the resource starts with a number (i.e. 1_STAR, 2_STAR, 3_STAR), ROBOT does not handle it properly and the generated OWL file ends up with those extra namespaces. They are not used anywhere in the ontology.
  3. RDF comments were unified.
  4. As @cmungall mentioned, versionIRI stays as it is.
  5. dcterms:license and dcterms:creator were added. We will also check whether it is possible to use dcterms:contributor.
  6. We are using a number to tag a version, but I realise that it is even better to use the release date in the IRI; we'll re-check this as well.
  7. Other things, like xmls types and foaf:homepage, were added.
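The suspicion in point 2 is consistent with how XML names work: the local part of a qualified name must be an NCName, which cannot begin with a digit. A serializer that splits each IRI at the longest valid local name would cut an IRI such as `...chebi#3_STAR` right after the `3`, which would explain a namespace like `http://purl.obolibrary.org/obo/chebi#3` in the old header. The sketch below only illustrates that splitting rule. It is an assumption about the mechanism, not ROBOT's actual code, and the full `chebi#3_STAR` IRI and the helper name `split_iri` are hypothetical.

```python
import re
from typing import Tuple

# Simplified, ASCII-only approximation of an XML NCName: it must start with
# a letter or underscore. Real XML permits many more Unicode characters.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_iri(iri: str) -> Tuple[str, str]:
    """Split an IRI into (namespace, local name), using the longest
    suffix that is a valid (simplified) NCName as the local name."""
    for start in range(len(iri)):
        if _NCNAME.fullmatch(iri[start:]):
            return iri[:start], iri[start:]
    raise ValueError(f"no valid local name found in {iri!r}")


# Hypothetical IRI for a resource whose name starts with a digit.
print(split_iri("http://purl.obolibrary.org/obo/chebi#3_STAR"))
# ('http://purl.obolibrary.org/obo/chebi#3', '_STAR')

# A name that starts with a letter splits cleanly at the '#'.
print(split_iri("http://purl.obolibrary.org/obo/chebi#is_conjugate_base_of"))
# ('http://purl.obolibrary.org/obo/chebi#', 'is_conjugate_base_of')
```

If this is indeed the cause, renaming the resources so that the local name starts with a letter should make the extra namespace declarations disappear.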

@cmungall, we are planning to have LITE and CORE variants as well, as you can see on the FTP link, so yes, we would need to generate other ontology IRIs. I guess we would need to include them in the PURL repository?

Last but not least, @jmkeil and @cmungall, you can start to play with the new ontology. Just bear in mind that it is still in the development phase and is not official.

cmungall commented 1 year ago

> Hi all, many points of the above are already resolved in our current development for ChEBI 2.0 here.

Awesome!

> we suspect that because the resource starts with a number (i.e. 1_STAR, 2_STAR, 3_STAR)

Why not make new PURLs for your subsets?

> We are using a number to tag a version, but I realise that it is even better to use the release' current date in the IRI, we'll re-check this as well.

It's more conventional in OBO to use ISO-8601-based versionIRIs (please don't invent a different way!), but bear in mind that you'll need to support the old versionIRIs and you can't retroactively give date-based versionIRIs for these.

https://obofoundry.org/principles/fp-004-versioning.html
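As a rough illustration of what an ISO-8601-based versionIRI could look like, the sketch below parses the non-standard `27:08:2023 19:12` timestamp from the header quoted earlier and builds an IRI with the same layout as the existing versionIRI, with the release number replaced by the date. The function name `date_version_iri` is hypothetical, and the release 225 timestamp is used purely as sample input: as noted above, existing releases keep their number-based versionIRIs.

```python
from datetime import datetime

OBO_BASE = "http://purl.obolibrary.org/obo"


def date_version_iri(ontology_id: str, release: datetime, filename: str) -> str:
    """Build a versionIRI whose version segment is an ISO 8601 date."""
    return f"{OBO_BASE}/{ontology_id}/{release.date().isoformat()}/{filename}"


# The oboInOwl:date value in the current header uses day:month:year.
sample = datetime.strptime("27:08:2023 19:12", "%d:%m:%Y %H:%M")

print(sample.isoformat())
# 2023-08-27T19:12:00

print(date_version_iri("chebi", sample, "chebi.owl"))
# http://purl.obolibrary.org/obo/chebi/2023-08-27/chebi.owl
```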

> we are planning to have LITE and CORE variants as well, as you can see on the FTP link, so yes, we would need to generate other ontology iris, I guess we would need to include in the PURL repository??

You can list different products in your OBO metadata entry. Each product has its own PURL. See, for example, CL (https://obofoundry.org/ontology/cl), which follows a common pattern of providing the full ontology plus basic.

jmkeil commented 1 year ago

Sorry for the delay.

> I would offer different advice from @jmkeil though, CHEBI should be consistent with OBO Metadata standards and OMO (https://obofoundry.org/ontology/omo)

Agreed. It should adhere to the community's standards.

> many points of the above are already resolved in our current development for ChEBI 2.0 here.

Great.

Are there plans to switch to HTTP instead of FTP for the ontology download? Using HTTP content encoding for transparent compression (i.e. requesting chebi.owl, but only transferring the gzip-compressed chebi.owl.gz without further user action) would be a real relief.
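For context, this is roughly what transparent compression looks like from the client side: the client advertises `Accept-Encoding: gzip`, and if the server answers with `Content-Encoding: gzip`, the body is decompressed before use. This is a minimal sketch using only the Python standard library. Whether a ChEBI download server would support this is an assumption, and the helper names `decode_body` and `fetch` are hypothetical.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip content encoding if the server applied it."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url: str) -> bytes:
    """Request a resource, offering to accept a gzip-compressed body."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)


# Offline demonstration of the decoding step with a tiny stand-in payload.
payload = b"<rdf:RDF/>"
print(decode_body(gzip.compress(payload), "gzip") == payload)
# True
```

Many HTTP clients and all major browsers do this negotiation automatically, which is why the user would only ever see chebi.owl.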

JervenBolleman commented 8 months ago

I just want to note that changing parts of the header may mean changing some of the IRIs of concepts used in the ChEBI OWL today. This would be a breaking change for a number of tools, so I encourage the ChEBI team to announce these changes well ahead of time. For example, a SPARQL query or a parser expecting chebi#is_conjugate_base_of would need to be rewritten to use chebi/is_conjugate_base_of.

By the way, to be clear, I want these changes, just with some advance notice ;)

For Rhea/SwissLipids/UniProt we can do on-the-fly query rewriting to change chebi# into chebi/, but other users might not have that capability.
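For users without such a facility, the simplest form of this rewriting is a plain string substitution of the namespace IRI. The sketch below is a naive illustration only, not how Rhea, SwissLipids or UniProt implement it. The new namespace is taken from point 1 of the ChEBI 2.0 summary above, the query is a made-up example, and the function name `rewrite_chebi_namespace` is hypothetical.

```python
OLD_NS = "http://purl.obolibrary.org/obo/chebi#"
NEW_NS = "http://purl.obolibrary.org/obo/chebi/"


def rewrite_chebi_namespace(query: str) -> str:
    """Replace the old '#'-terminated ChEBI namespace with the '/' form."""
    return query.replace(OLD_NS, NEW_NS)


# Illustrative query only; real queries against ChEBI may look different.
query = (
    "PREFIX chebi: <http://purl.obolibrary.org/obo/chebi#>\n"
    "SELECT ?s ?o WHERE { ?s chebi:is_conjugate_base_of ?o }"
)

print(rewrite_chebi_namespace(query))
# PREFIX chebi: <http://purl.obolibrary.org/obo/chebi/>
# SELECT ?s ?o WHERE { ?s chebi:is_conjugate_base_of ?o }
```

Because the query uses a PREFIX declaration, only that one line changes. Queries or parsers that hard-code full IRIs would need every occurrence rewritten, which the same substitution also covers.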