ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

US versus British spellings #250

Open brandonnodnarb opened 3 years ago

brandonnodnarb commented 3 years ago

Example: Aluminum, Hematite and Sulfur are US spellings. I'm sure there are many others.

Shall we include British spellings? If so, is adding another rdfs:label tag sufficient or are there other preferred methods (e.g. skos:altLabel, something else?)

dr-shorthair commented 3 years ago

You could just have multiple rdfs:label with language tags:

<http://sweetontology.net/matrMineral/Hematite>
    rdfs:label "Hematite"@en-US ;
    rdfs:label "Haematite"@en-GB ;
.

Else if we agree that US-English is the default/canonical English:

<http://sweetontology.net/matrMineral/Hematite>
    rdfs:label "Hematite"@en ;
    rdfs:label "Haematite"@en-GB ;
.

Else if we agree that US-English is the default/canonical language:

<http://sweetontology.net/matrMineral/Hematite>
    rdfs:label "Hematite" ;
    rdfs:label "Haematite"@en-GB ;
.

Else we could just agree to limit the labelling to US-English.

This is a business or policy decision.

smrgeoinfo commented 3 years ago

I'd suggest having only one rdfs:label: it makes SPARQL queries that pull a label easier (no language filter), but having a language tag is good. Some convention would be a good thing, to avoid a mix of all three ( rdfs:label "Hematite"; rdfs:label "Sulfur"@en ; rdfs:label "Aluminum"@en-US ).

Currently I can find 12818 language-tagged labels in SWEET, and they're all @en, so it seems that current practice is to use @en, with the convention that the default English dialect is US.

How about using skos:altLabel for @en-gb, or for any other dialect/language synonyms?

dr-shorthair commented 3 years ago

I think that would not strictly match the semantics of skos:altLabel as defined in the SKOS reference. Language variants are supposed to use language tags. altLabel is for other kinds of label variants. If you want to retain a single rdfs:label and avoid language-tag messiness there (which I understand) then I think this pattern would be a better fit to SKOS:

<http://sweetontology.net/matrMineral/Hematite>
    rdfs:label "Hematite"@en ;
    skos:prefLabel "Hematite"@en-US ;
    skos:prefLabel "Haematite"@en-GB ;
.

or

<http://sweetontology.net/matrMineral/Hematite>
    rdfs:label "Hematite" ;
    skos:prefLabel "Hematite"@en-US ;
    skos:prefLabel "Haematite"@en-GB ;
.

smrgeoinfo commented 3 years ago

Makes sense. I like this one:

<http://sweetontology.net/matrMineral/Hematite>
    rdfs:label "Hematite"@en ;
    skos:prefLabel "Hematite"@en-US ;
    skos:prefLabel "Haematite"@en-GB ;
.

nicholascar commented 3 years ago

I recommend against the rdfs:label + skos:prefLabel pattern and recommend skos:prefLabel + skos:altLabel.

The reason for this is that some vocab systems prefer SKOS labels only, or treat pairs of rdfs:label & skos:prefLabel as duplicates, because the semantics of how to relate them, and therefore which is the primary, are unclear. Within just SKOS, as a SKOS design goal, this is perfectly clear: if you wish US or other spelling to be primary, make that the prefLabel and all others altLabels.

A skos:prefLabel & skos:altLabel split for languages is not against the SKOS semantics; see SKOS Ref, Example 19.
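
As a rough sketch of the skos:prefLabel + skos:altLabel pattern described above, reusing the Hematite example from earlier in the thread. The choice of @en for the primary label follows the "@en means US English" convention discussed above and is an assumption, not something settled in this issue:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://sweetontology.net/matrMineral/Hematite>
    # Primary spelling (assumed here to be the US one, tagged @en).
    skos:prefLabel "Hematite"@en ;
    # Any other spelling becomes an alternative label with its own language tag.
    skos:altLabel "Haematite"@en-GB ;
.
```

With this layout a tool that reads only SKOS labels has exactly one preferred label to display, and the British spelling remains searchable as a synonym.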

graybeal commented 3 years ago

I agree with @nicholascar. Not just because BioPortal would look for skos:prefLabel first and rdfs:label only if no prefLabel is offered, and uses skos:altLabel for synonyms. I would not recommend rdfs:label for synonyms because it is likely to be misconstrued given the availability of skos:prefLabel.

Unfortunately BioPortal would struggle with language tags but that is a defect in BioPortal. It's going to get fixed I am sure. So absolutely I would use language tags to indicate GB spellings where they exist. When there is the same spelling, I'd use skos:prefLabel with "@en" as the language tag, and when the spelling is different I'd have two prefLabels: one in en-US, one in en-GB. This makes it trivial to further internationalize the vocabulary with other languages later, and still present the preferred label in each language (and still search for synonyms in each language, or across languages).
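
A minimal sketch of the two cases just described (same spelling versus different spelling). The Hematite IRI comes from earlier in the thread; the Quartz IRI is purely illustrative and has not been checked against SWEET:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Spelling differs between dialects: one prefLabel per dialect tag.
<http://sweetontology.net/matrMineral/Hematite>
    skos:prefLabel "Hematite"@en-US ;
    skos:prefLabel "Haematite"@en-GB ;
.

# Spelling is the same in both dialects: a single prefLabel tagged @en.
# (Hypothetical IRI, used only to illustrate the same-spelling case.)
<http://sweetontology.net/matrMineral/Quartz>
    skos:prefLabel "Quartz"@en ;
.
```

Because en-US and en-GB are distinct language tags, this stays within the SKOS rule of at most one preferred label per language tag for a resource.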

rrovetto commented 3 years ago

I recommend allowing a flexible approach whereby the user can choose either US or British spelling, both of which are presented to them as alternatives or synonyms (if actually synonymous). They can select their display label.

Likewise for other languages: a non-English speaker should see the synonym in their language and be able to select it as their display and preferred label.

nicholascar commented 3 years ago

@rrovetto that is the intended use of SKOS prefLabels with language tags, and different variants of a single language's tags can be used, as per the en-US & en-GB examples above. System preferences should be used for particular displays, so one person may enter an @en and an @it pair of prefLabels and another user may set their system to preferentially display the Italian labels.
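
For illustration only, an @en / @it pair of prefLabels might look like the sketch below. The Italian label is an assumption added for this example and is not taken from SWEET:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://sweetontology.net/matrMineral/Hematite>
    skos:prefLabel "Hematite"@en ;
    # Assumed Italian spelling, for illustration only.
    skos:prefLabel "Ematite"@it ;
.
```

A client configured to prefer Italian would then show the @it label, and any other client would fall back to @en; that selection happens in the display system, not in the vocabulary itself.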