BioSchemas / specifications

Issue tracker, technical wiki, and example markup
https://bioschemas.org

Provide a Bioschemas context #219

Closed ljgarcia closed 5 years ago

ljgarcia commented 5 years ago

Bioschemas needs to provide a context that markup can include in order to fully benefit from Bioschemas.

AlasdairGray commented 5 years ago

We first need to stabilise the approach

ljgarcia commented 5 years ago

There are some comments relevant to this issue in issue 218.

A quick summary here:

  1. Currently we say that a Bioschemas protein corresponds to pro:PR_000000001
  2. This leads to inconsistencies, as mentioned by @JervenBolleman; see the full comment on the deprecated thread:

<https://www.uniprot.org/uniprot/P00519> a schema:WebPage ;
    schema:mainEntity <http://purl.uniprot.org/uniprot/P00519> .
<http://purl.uniprot.org/uniprot/P00519> a bioschema:DataRecord ;
    schema:mainEntity <http://purl.uniprot.org/isoform/P00519-1> .
<http://purl.uniprot.org/isoform/P00519-1> a obo:PR_000000001 .

Of course for PRO the issue would be worse in reasoning terms.

obo:PR_P00519 rdfs:subClassOf obo:PR_000000001 .

becomes

obo:PR_P00519 rdf:type obo:PR_000000001 .

This is because schema.org has an instance-based world view, while SIO and PRO are class-based ontologies.
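To make the reasoning concern concrete, here is a minimal Python sketch of two RDFS entailment rules (rdfs9, type inheritance, and rdfs11, subClassOf transitivity) applied to the two modellings quoted above. The individual ex:molecule is a hypothetical instance added purely for illustration; it does not appear in the original comment, and a real reasoner would apply many more rules than these two.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply rdfs9 and rdfs11 repeatedly until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Only a subClassOf link starting at `o` lets us climb further.
                if p2 == SUBCLASS and o == s2 and p in (SUBCLASS, TYPE):
                    new.add((s, p, o2))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


# Hypothetical individual, assumed only for this illustration.
MOLECULE = ("ex:molecule", TYPE, "obo:PR_P00519")

# Class-based view (PRO): PR_P00519 is a subclass of protein.
class_based = {MOLECULE, ("obo:PR_P00519", SUBCLASS, "obo:PR_000000001")}

# Instance-based view: PR_P00519 is itself typed as a protein.
instance_based = {MOLECULE, ("obo:PR_P00519", TYPE, "obo:PR_000000001")}

GOAL = ("ex:molecule", TYPE, "obo:PR_000000001")

if __name__ == "__main__":
    print(GOAL in rdfs_closure(class_based))
    print(GOAL in rdfs_closure(instance_based))
```

Under these two rules, the class-based modelling lets a reasoner conclude that the individual is a protein, whereas the instance-based modelling breaks that chain because rdf:type is not inherited through another rdf:type link.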

  3. @micheldumontier shares @JervenBolleman's concerns; see the comment on the deprecated thread

ljgarcia commented 5 years ago

I think we could work toward a Bioschemas vocabulary context similar to the one provided by schema.org.

For instance, see the context for ScholarlyArticle, which includes links to the Bibliographic Ontology (bibo), the ontology partially used to create this schema. bibo defines pageStart, and schema.org reuses it via owl:equivalentProperty:

{
      "@id": "schema:pageStart",
      "@type": "rdf:Property",
      "dct:source": {
        "@id": "http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_bibex"
      },
      "owl:equivalentProperty": {
        "@id": "http://purl.org/ontology/bibo/pageStart"
      },
      "rdfs:comment": "The page on which the work starts; for example \"135\" or \"xiii\".",
      "rdfs:label": "pageStart",
... 

Another example is Product, which is based on another ontology, GoodRelations. In this case, they use dct:source:

{
      "@id": "schema:gtin8",
      "@type": "rdf:Property",
      "dct:source": {
        "@id": "http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_GoodRelationsTerms"
      },
...

Rather than having an alias to obo:PR_000000001:

  Protein: "http://purl.obolibrary.org/obo/PR_000000001"

we could have something like this for the Bioschemas Protein profile:

{
      "@id": "bs:Protein",
      "@type": "rdfs:Class",
      "rdfs:comment": "This Protein profile specification presents the BioChemEntity usage when describing a Protein.",
      "rdfs:label": "Protein",
      "rdfs:subClassOf": {
        "@id": "schema:BioChemEntity"
      },
      "skos:closeMatch": {
        "@id": "pr:000000001"
      },
      "schema:sameAs": {
        "@id": "https://bioschemas.org/specifications/Protein"
      }
}

And something like this for properties that reuse terms from existing ontologies:

{
      "@id": "bs:associatedWith",
      "skos:closeMatch": {
        "@id": "so:associated_with"
      },
      "schema:domainIncludes": [
        {
          "@id": "schema:BioChemEntity"
        }, {
          "@id": "bs:Gene"
        }
      ]
}

Here bs refers to Bioschemas, and so refers to "http://purl.obolibrary.org/obo/so#". I am using skos:closeMatch rather than owl:equivalentProperty or owl:equivalentClass because I find it more relaxed than the other two, which prevents the problem mentioned by @JervenBolleman:

obo:PR_P00519 rdfs:subClassOf obo:PR_000000001 .

becomes

obo:PR_P00519 rdf:type obo:PR_000000001 .
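As a rough sketch of how the prefixed names in the proposed definitions would resolve to full IRIs, the Python snippet below expands compact IRIs against a prefix map. The so namespace is the one given in this comment, and the pr namespace is inferred from the Protein alias shown earlier. The bs namespace IRI is a placeholder assumption, since no namespace for Bioschemas terms has been agreed in this thread.

```python
PREFIXES = {
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "so": "http://purl.obolibrary.org/obo/so#",  # as stated in the comment
    "pr": "http://purl.obolibrary.org/obo/PR_",  # inferred from the Protein alias
    "bs": "https://bioschemas.org/",             # placeholder, not an agreed IRI
}


def expand(term, prefixes=PREFIXES):
    """Expand a compact IRI such as 'pr:000000001'; leave other values as-is."""
    prefix, sep, local = term.partition(":")
    # A value like 'https://...' has no known prefix, so it is returned unchanged.
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return term


if __name__ == "__main__":
    for term in ("pr:000000001", "so:associated_with", "bs:Protein"):
        print(term, "->", expand(term))
```

With this map, pr:000000001 resolves to the same PRO IRI as the original alias, but it is now only the target of a skos:closeMatch link rather than the identity of the Bioschemas term itself.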

The downside here would be new URLs for Bioschemas. I understand there was a comment on Twitter a while ago arguing against this, but I think the advantages outweigh the disadvantages. We would still be reusing terms from well-known ontologies while adjusting them for use in marking up web pages.

Any thoughts?

AlasdairGray commented 5 years ago

With all of our examples, we should have an explicit and fully detailed context. By this I do not mean that we should reference some file on a web server that could change. I mean that we should explicitly detail the context at the start of every example. I think this would be good practice, since it means that the examples are self-contained.

BTW, it seems that not all services that process JSON-LD will ingest externally defined context files. We should state as best practice that all markup should include a self-contained expression of its context, i.e. we should not use a link to a file on the Bioschemas webspace.
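As a rough illustration of this practice, the Python sketch below separates markup whose context is written out inline from markup that points to an external context file. It relies only on the JSON-LD convention that a string inside @context is a reference to a remote context, while an object is an inline definition. The bs namespace IRI and the two sample records are assumptions made for illustration, and the check looks at the top-level @context only.

```python
import json

# Inline context: every prefix is spelled out in the markup itself.
# The "bs" namespace IRI is a placeholder, not an agreed Bioschemas value.
INLINE_MARKUP = {
    "@context": {
        "schema": "http://schema.org/",
        "bs": "https://bioschemas.org/",
    },
    "@id": "http://purl.uniprot.org/uniprot/P00519",
    "@type": "bs:Protein",
    "schema:identifier": "P00519",
}

# Remote context: the string is an IRI that a consumer would have to fetch.
REMOTE_MARKUP = {
    "@context": "http://schema.org",
    "@id": "https://www.uniprot.org/uniprot/P00519",
    "@type": "WebPage",
}


def remote_contexts(doc):
    """Return the top-level @context entries that reference external files."""
    context = doc.get("@context", [])
    entries = context if isinstance(context, list) else [context]
    return [entry for entry in entries if isinstance(entry, str)]


def is_self_contained(doc):
    """True when the markup defines a context and none of it is remote."""
    return "@context" in doc and not remote_contexts(doc)


if __name__ == "__main__":
    print(json.dumps(INLINE_MARKUP, indent=2))
    print(is_self_contained(INLINE_MARKUP), is_self_contained(REMOTE_MARKUP))
```

A check along these lines could be run over the example markup in the repository to flag examples that still depend on a context file being fetched.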

AlasdairGray commented 5 years ago

With the current proposal there will be no need for a context other than schema.org.

Also, as best practice, we should be encouraging the complete specification of the context in all markup, i.e. we should not be referring to a file that needs to be loaded in.