Open ManonGros opened 3 years ago
Could you give me access to this one, please? Thanks!
@timrobertson100: I assume that we primarily use this in the publisher and dataset metadata. Are there other uses of this vocabulary that need to be taken into account?
@timrobertson100: I assume that we primarily use this in the publisher and dataset metadata. Are there other uses of this vocabulary that need to be taken into account?
I don't think so, no
Do you not intend for this to be used beyond GBIF's needs? Codes for vernacular name languages was already identified as a major use case. In addition, the Occurrence Core, Event Core, and Audubon Core have dc:language at the record level. Audubon Core also has dcterms:language, metadataLanguageLiteral, and metadataLanguage at the record level.
Good point. I'd assume it drives those as any existing dictionary file likely does.
(The immediate priority is on interpretation needs in GBIF/ALA pipelines)
Some assumptions to verify before starting:
@tucotuco, @timrobertson100, does any of this already raise alarm around use cases you are aware of?
It looks good except that I suspect ISO 639-2 is insufficient for all known purposes in our community, especially ethnobiology. If you want, I can try to get a confirmation of that from Jonathan Amith, linguist and progenitor of DEMCA (https://demca.mesolex.org/portal/).
On Tue, Apr 20, 2021 at 12:27 PM Andrea Hahn @.***> wrote:
Some assumptions to verify before starting:
- ISO 639-1 (https://api.gbif.org/v1/enumeration/language: ISO 639-1 and 639-2) provides sufficient granularity to provide the concepts
- we would not want English-language names of languages as concepts, but rather neutral entities (ISO 2-letter codes)
- English language names serve as labels, not as concepts, just as Spanish etc equivalents (and native titles?)
- national/regional variants like "es-AR" (http://www.lingoes.net/en/translator/langcode.htm) are mapped as hidden values and interpreted to the 2-letter code
- multi-value verbatim data ("en | ru"), e.g. for dataset descriptions containing text in both languages: handling unclear. For GBIF use cases (finding a description in Russian) it might be best to allow explicit mixed-content Concept definitions in a standardized syntax
@tucotuco, @timrobertson100, does any of this already raise alarm around use cases you are aware of?
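To make the assumptions listed above concrete, here is a minimal Python sketch of how verbatim values might be interpreted. It is not the GBIF/ALA pipeline code: the `LABELS` and `HIDDEN` tables and the `interpret` / `interpret_one` functions are hypothetical names invented for illustration, and splitting on "|" is only one possible way to handle multi-value data, which the list itself flags as unresolved.

```python
import re

# Hypothetical, tiny subset of concepts: ISO 639-1 codes are the neutral
# concept names, and the English names are only labels.
LABELS = {"en": "English", "es": "Spanish", "ru": "Russian"}

# Hypothetical hidden values that map verbatim strings to a concept.
HIDDEN = {"english": "en", "spanish": "es", "español": "es", "russian": "ru"}


def interpret_one(value):
    """Return the 2-letter code for a single verbatim value, or None."""
    token = value.strip().lower()
    if not token:
        return None
    if token in LABELS:
        return token
    if token in HIDDEN:
        return HIDDEN[token]
    # Regional variants such as "es-AR" or "es_AR": keep the primary subtag.
    primary = re.split(r"[-_]", token, maxsplit=1)[0]
    return primary if primary in LABELS else None


def interpret(verbatim):
    """Split multi-value data on "|" and return the distinct codes in order."""
    codes = []
    for part in verbatim.split("|"):
        code = interpret_one(part)
        if code is not None and code not in codes:
            codes.append(code)
    return codes
```

With this sketch, `interpret("es-AR")` yields a single concept and `interpret("en | ru")` yields two, leaving open whether a mixed-content value should instead resolve to a dedicated concept.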
ISO 639-3 might be needed, but I haven't investigated myself.
https://en.wikipedia.org/wiki/ISO_639-3#Usage has some links to other language-related systems, several depending on ISO 639.
Thanks, both! For practical purposes, that sounds as though we will eventually need multiple levels of granularity in the concepts list, with explicit parent declarations, rather than a single-level flat list. @marcos-lg, are hierarchical vocabularies something already covered, or would that add more complexity than we want to handle in the first phase, please?
(from TimR via Skype): "LifeStage is in production and is an example of a hierarchical vocabulary. It’s intended for 1 (maybe 2) levels deep. I’d advise anything more complex needs thought."
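For illustration, a concepts list with explicit parent declarations could look roughly like the Python sketch below. This is not the data model of the vocabulary server: the `Concept` class, `build_index`, and `ancestors` are hypothetical, and the example entries (Mandarin and Yue Chinese under the Chinese macrolanguage) are only there to show one level of nesting.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Concept:
    name: str                     # neutral code used as the concept name
    label: str                    # English label, not the concept itself
    parent: Optional[str] = None  # explicit parent declaration, if any


def build_index(concepts: List[Concept]) -> Dict[str, Concept]:
    """Index concepts by name and reject parents that are not declared."""
    index = {c.name: c for c in concepts}
    for c in concepts:
        if c.parent is not None and c.parent not in index:
            raise ValueError(f"unknown parent {c.parent!r} for {c.name!r}")
    return index


def ancestors(index: Dict[str, Concept], name: str) -> List[str]:
    """Walk the parent declarations upwards, nearest parent first."""
    chain: List[str] = []
    current = index[name].parent
    while current is not None:
        if current == name or current in chain:
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(current)
        current = index[current].parent
    return chain


# Illustrative entries only: a 2-letter concept with two finer-grained children.
INDEX = build_index([
    Concept("zh", "Chinese"),
    Concept("cmn", "Mandarin Chinese", parent="zh"),
    Concept("yue", "Yue Chinese", parent="zh"),
])
```

Keeping the hierarchy to one level, as here, stays within the depth that LifeStage is described as being intended for.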
Here is a file to edit: https://drive.google.com/file/d/1wQ21ShfKNRrJ8VJMbNzbgHzNCHEn7laf/view?usp=sharing
It contains:
NB: For this vocabulary, please add the concepts by using the language enumeration: https://api.gbif.org/v1/enumeration/language
Please check instructions here: https://github.com/gbif/vocabulary/issues/70