EBISPOT / OLS

Ontology Lookup Service from SPOT at EBI
http://www.ebi.ac.uk/ols
Apache License 2.0

When multiple labels are provided, which one should we use? #570

Open. jamesamcl opened this issue 2 years ago.

jamesamcl commented 2 years ago

For example:

[Screenshot, 2022-03-14 10:41:12: a property with several labels]

Because RDF triples are unordered, nothing in the OWL RDF/XML file indicates which of these labels is preferred, so OLS would be equally justified in using "术语编辑者" in place of "term editor". Multilang support will disambiguate this somewhat, but even then we have both "definition editor" and "term editor" for the en language.

We cannot just use all of the labels, because OLS needs to select one to use as a property name in the API. For example, the property above appears in the API as follows:

[Screenshot, 2022-03-14 10:44:05: the same property as it appears in the OLS API]
jamesamcl commented 2 years ago

Should the ontologies be doing something to indicate which is the preferred label and which are alternate? @matentzn ?

matentzn commented 2 years ago

We have a crystal-clear rule in OBO that only one (non-language-tagged) label is allowed for OBO ontologies: rdfs:label is the "preferred" label, and the other ones are synonyms. In this case, I would recommend sorting the labels and picking the first one in alphabetical order.
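A minimal sketch of the suggested tie-break, assuming labels have already been extracted as hypothetical `(text, language_tag)` pairs (an illustrative representation, not OLS's actual data model). Sorting makes the choice independent of the order in which the parser happened to yield the triples:

```python
from typing import Iterable, Optional, Tuple

# Hypothetical representation: (label text, language tag); "" = no language tag.
Label = Tuple[str, str]


def pick_label_alphabetically(labels: Iterable[Label]) -> Optional[str]:
    """Return the alphabetically first label text, or None if there are none.

    Sorting makes the result deterministic no matter what order the
    (unordered) RDF triples were read in.
    """
    texts = sorted(text for text, _lang in labels)
    return texts[0] if texts else None
```

Python's `sorted` compares Unicode code points rather than applying locale-aware collation, so "alphabetical" here means code-point order. With the labels from this issue, it would pick "definition editor" over both "term editor" and "术语编辑者".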

jamesamcl commented 2 years ago

This is from CLO, which is in OBO. Does your rule also state that there should only be one label per language?

matentzn commented 2 years ago

Unfortunately, the multi-language case has not been addressed yet, but I assume the rule will be similar.

KonradHoeffner commented 1 year ago

Our ontologies also have a general rule of at most one rdfs:label per language, with skos:altLabel for any number of synonyms, so I would just pick the first label the programming language gives me (e.g. iterator.next()), but sorting to get a deterministic order seems fine as well.
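Under a rule of at most one rdfs:label per language, the CLO case in this issue (two en labels on one term) is a violation that a tool could report before any label has to be chosen. A sketch of such a check, assuming the rdfs:label assertions have already been extracted into hypothetical `(subject, text, language_tag)` tuples; the function name and input shape are illustrative only:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (subject IRI, label text, language tag) taken from
# rdfs:label assertions; "" means the label has no language tag.
LabelAssertion = Tuple[str, str, str]


def find_label_conflicts(
    assertions: Iterable[LabelAssertion],
) -> Dict[Tuple[str, str], List[str]]:
    """Return {(subject, lang): sorted labels} for every subject that has
    more than one distinct rdfs:label in the same language."""
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for subject, text, lang in assertions:
        # RDF language tags are case-insensitive, so normalise before grouping.
        grouped[(subject, lang.lower())].append(text)
    return {
        key: sorted(set(texts))
        for key, texts in grouped.items()
        if len(set(texts)) > 1  # identical repeated literals are not a conflict
    }
```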

As for labels with language tags, we have had good experiences in our tools with priority lists of language tags that include the empty string. For example, a tool configured with ["en", "fr", "", "la"] prefers an English label to a French one and prefers either of those to a label with no language tag, but chooses an untagged label over a Latin one.
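A sketch of the priority-list approach, assuming labels are hypothetical `(text, language_tag)` pairs. Three behaviours are assumptions not stated in the thread: tags match exactly after lower-casing (so "en-GB" does not match "en"), languages missing from the list are never chosen, and ties within one language fall back to the alphabetical rule suggested earlier in the thread:

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical representation: (label text, language tag); "" = no language tag.
Label = Tuple[str, str]


def pick_label_by_language(
    labels: Iterable[Label], priorities: Sequence[str]
) -> Optional[str]:
    """Return a label in the highest-priority language that has one.

    `priorities` lists language tags in order of preference and may contain
    "" to rank untagged labels. Labels in unlisted languages are ignored.
    """
    by_lang: Dict[str, List[str]] = {}
    for text, lang in labels:
        by_lang.setdefault(lang.lower(), []).append(text)
    for lang in priorities:
        candidates = by_lang.get(lang.lower())
        if candidates:
            # Several labels in one language: break the tie alphabetically.
            return min(candidates)
    return None
```

For instance, given the labels `("editor", "")` and `("editor terminorum", "la")` with the priorities `["en", "fr", "", "la"]`, the function returns "editor", because the untagged label outranks the Latin one. Returning `None` when no listed language matches is one design choice; a tool could instead fall back to the alphabetical pick over all labels.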