Closed tucotuco closed 2 years ago
It might also be worth adding an examples attribute.
The controlled vocabulary schema thesaurus.xsd imports a local copy of a file dc.xsd, which is meant to be a subset of Dublin Core terms needed in GBIF schemas. In dc.xsd is a declaration of a term dc:URI, which is used in the thesaurus schema. But dc:URI is a syntax encoding scheme (a datatype) in Dublin Core, not a property. Thus, its declaration as a property in dc.xsd with a datatype of xs:anyURI makes the term something entirely distinct from, and incompatible with, the actual Dublin Core dc:URI term.
I checked the other property declarations in dc.xsd. There do not seem to be any other incompatibilities.
And if we are going to fix anything, we might as well fix everything. The namespace declaration xmlns:dcmitype="http://purl.org/dc/dcmitype/" in thesaurus.xsd is not used. It can be omitted.
The conventional namespace abbreviation for http://purl.org/dc/terms/ is dcterms:, while dc: is conventionally used for http://purl.org/dc/elements/1.1/. All of the terms in dc.xsd (except the aforementioned dc:URI) are properties in the namespace http://purl.org/dc/terms/. I recommend changing all instances of dc: to dcterms: in thesaurus.xsd.
The property dcterms:subject has the same purpose that dc:URI is being used for in thesaurus.xsd, and dc:URI is not used in any other schema in https://github.com/gbif/rs.gbif.org/tree/master/schema. I recommend replacing all instances of dc:URI in thesaurus.xsd with dcterms:subject and removing <xs:attribute name="URI" type="xs:anyURI"> from dc.xsd. Its annotation can go on <xs:attribute name="subject" type="xs:string"> instead. This means that <xs:attribute ref="dc:URI" use="required"/> needs to be replaced with <xs:attribute ref="dc:subject" use="required"/> in thesaurus.xsd and <xs:attribute ref="dc:subject" use="optional"/> needs to be removed.
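To make the thesaurus.xsd part of that recommendation concrete, here is a rough before/after sketch. It is only a fragment: the surrounding type definition is omitted, and the assumption that both attributes sit on the concept element is taken from the example concept markup in this thread, not from a reading of the schema itself. The dc: prefix is kept as written in the recommendation above.

```xml
<!-- thesaurus.xsd, attribute references (fragment; enclosing type omitted) -->

<!-- before: dc:URI is required, dc:subject is optional -->
<xs:attribute ref="dc:URI" use="required"/>
<xs:attribute ref="dc:subject" use="optional"/>

<!-- after: dc:URI is gone and dc:subject takes over as the required attribute -->
<xs:attribute ref="dc:subject" use="required"/>
```

In dc.xsd, the matching change would be to drop the URI attribute declaration and move its annotation onto the subject declaration.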
If desired, I can make any or all of the changes recommended in this issue. Building the controlled vocabularies referenced by the Darwin Core Core and Extension XML files depends on the resolution of this issue.
Thanks @tucotuco
Can you provide a link to an example showing the usage comments and examples, please? They seem surprisingly hard to find. I would have assumed there was only one value in the example, and the definition would describe the expected use.
Maybe I don't understand fully, but a thesaurus defines all allowed values explicitly. Why should there then be an example and what would it hold other than one of those values? Seems rather redundant.
The concept description currently combines definition and usage notes as found in DwC terms, I would think. Not sure if we need to separate the two. If we want the definition to be more stable and allow usage notes to change freely, this would be an option. But so far we have had rather loose versioning of vocabularies compared to property terms and extension definitions. If we stick with that, a single description sounds simpler and more appropriate to me.
In SKOS you have a definition and a scopeNote, which could be seen as similar to usage notes?
Thanks @tucotuco
Can you provide a link to an example showing the usage comments and examples, please? They seem surprisingly hard to find. I would have assumed there was only one value in the example, and the definition would describe the expected use.
The Darwin Core Classes recommended as vocabulary for basisOfRecord (e.g., PreservedSpecimen) all have examples, but do not have usage notes. All of the recommended controlled vocabulary terms (Concepts) for the Darwin Core terms establishmentMeans, degreeOfEstablishment, and pathway have usage notes, but do not have examples (except at times in the usage notes). Here is an example usage note for the Concept "native", "Considered native and naturally occuring [sic]. See also Blackburn et al. 2011 https://doi.org/10.1016/j.tree.2011.03.023 category A".
Maybe I don't understand fully, but a thesaurus defines all allowed values explicitly. Why should there then be an example and what would it hold other than one of those values? Seems rather redundant.
I hope this was answered in the preceding comment.
The concept description currently combines definition and usage notes as found in DwC terms, I would think. Not sure if we need to separate the two. If we want the definition to be more stable and allow usage notes to change freely, this would be an option. But so far we have had rather loose versioning of vocabularies compared to property terms and extension definitions. If we stick with that, a single description sounds simpler and more appropriate to me.
As of https://github.com/gbif/rs.gbif.org/pull/71, the Comments and Examples have been included in extension.xsd. When those were all combined in the definition, the definition was sufficient. After the separation of the Comments and Examples into non-normative parts of the terms, they would be lost or have to be remerged into the definition in order not to lose that useful information. I wrote a script (see https://github.com/gbif/rs.gbif.org/issues/21#issuecomment-900490420) to generate the Darwin Core XML files following the new extension.xsd, so nothing is lost at all, and no manual labor has to be done to update those XML files. All content will be consistent between the standard and the XML files this way as well, unless overridden in the configuration files that are used by the script.
So, it is actually simpler now to have these three attributes separate than to have them combined. A win-win.
In SKOS you have a definition and a scopeNote, which could be seen as similar to usage notes?
Yes, the SKOS scopeNote is functionally the same as the <dcterms:description> in the Darwin Core terms (the "Notes" in the Darwin Core Quick Reference Guide term displays).
The Darwin Core examples are SKOS examples.
I am otherwise ready with the new controlled vocabulary files for basisOfRecord, establishmentMeans, degreeOfEstablishment, and pathway, awaiting confirmation of whether any of the proposed changes to thesaurus.xsd are acceptable at this time.
Sounds sensible to me to also have the separation in the thesaurus.xsd then. But I don't know how much impact that would have on the IPT and other tools. My guess is rather little, as it's an extra field that can be picked up gradually, @marcos-lg ?
@mike-podolskiy90 is now the main developer for the IPT.
Changing dc:URI to dc:subject would break currently-deployed IPTs. Could we deprecate dc:URI, but keep the attribute until IPTs ≤2.5.0 are no longer used? (Note this would take years.)
Current native definition:
<concept
dc:identifier="native"
dc:URI="http://rs.gbif.org/vocabulary/gbif/establishment_means/native"
dc:relation=""
dc:description="A species that is a part of the balance of nature that has developed over hundreds or thousands of years in a particular region or ecosystem. The word native should always be used with a geographic qualifier (for example, native to New England).">
<preferred>
<term dc:title="native" xml:lang="en"/>
</preferred>
<alternative>
<term dc:title="indigenous" xml:lang="en" />
<term dc:title="reintroduced" xml:lang="en" />
</alternative>
</concept>
What I think John is proposing, but with the dc:URI retained:
<concept
dc:identifier="e001"
① dc:URI="http://rs.tdwg.org/dwcem/values/e001"
② dc:subject="http://rs.tdwg.org/dwcem/values/e001"
⑤ dc:relation="https://doi.org/10.3897/biss.3.38084"
③ dc:description="A taxon occurring within its natural range."
④ dc:comments="What is considered native to an area varies with the biogeographic history of an area and the local interpretation of what is a “natural range”."
⑥ dc:examples="">
<preferred>
<term dc:title="native" xml:lang="en"/>
</preferred>
<alternative>
<term dc:title="indigenous" xml:lang="en" />
</alternative>
</concept>
① Is probably required for backward compatibility, replaced by ②.
Usage notes removed from ③ and added to new ④ dc:comments attribute.
⑤ dc:relation is used by the IPT to link to further documentation.
⑥ Doesn't exist for this concept, but does for PreservedSpecimen: "A plant on an herbarium sheet. A cataloged lot of fish in a jar."
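If dc:URI is kept only for backward compatibility, one way to record that in the schema itself might look like the fragment below. This is a sketch, not something taken from thesaurus.xsd: the documentation wording is invented for illustration, and the enclosing type is omitted. XML Schema does allow an xs:annotation inside an xs:attribute that uses ref, so the deprecation note can live next to the reference.

```xml
<!-- thesaurus.xsd fragment (sketch): keep dc:URI for deployed IPTs, flag it as deprecated -->
<xs:attribute ref="dc:URI" use="required">
  <xs:annotation>
    <!-- Hypothetical wording; adjust to whatever the maintainers agree on -->
    <xs:documentation>Deprecated. Retained for IPT versions up to 2.5.0. Use dc:subject instead.</xs:documentation>
  </xs:annotation>
</xs:attribute>
<xs:attribute ref="dc:subject" use="optional"/>
```

Leaving dc:URI as required means existing vocabulary files and older IPTs keep validating unchanged, while new files populate both attributes with the same value.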
@MattBlissett That is even more than I was proposing, but it is a thing of beauty. The only "tough" part for automation is extraction of the citation for dc:relation from the dc:comments where it is currently.
The only "tough" part...
Might it be worth considering if it is worth the effort? Do they do anything more than power the IPT dropdown?
It seems to me that the biggest consideration will be how you want to eventually produce these vocabularies automatically from the registry. That may be a ways off, so what can/should we do now so that we can have the updated vocabs available to users in the IPT?
I'd suggest we add the easy things, and keep whatever is needed to keep compatibility with the 100s of installations. I understand from the comments above, that would mean
<concept
dc:identifier="e001"
① dc:URI="http://rs.tdwg.org/dwcem/values/e001"
② dc:subject="http://rs.tdwg.org/dwcem/values/e001"
⑤ dc:relation="https://doi.org/10.3897/biss.3.38084"
③ dc:description="A taxon occurring within its natural range."
④ dc:comments="What is considered native to an area varies with the biogeographic history of an area and the local interpretation of what is a “natural range”.">
<preferred>
<term dc:title="native" xml:lang="en"/>
</preferred>
<alternative>
<term dc:title="indigenous" xml:lang="en" />
</alternative>
</concept>
That's super easy, it just requires the addition of
<xs:attribute ref="dc:comments" use="optional"/>
to thesaurus.xsd. I will proceed in finishing off the controlled vocabulary xml files for basisOfRecord, establishmentMeans, degreeOfEstablishment, and pathway in anticipation of this being added. They just need to have the comments added.
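For reference, a sketch of what that addition could involve. An xs:attribute with ref has to resolve to a global attribute declaration, so this assumes dc.xsd does not already declare comments and would need one; the xs:string type is a guess, chosen to match the other text attributes mentioned in this thread.

```xml
<!-- dc.xsd (sketch): global declaration that the reference below resolves to.
     Assumption: "comments" is not yet declared there; xs:string is a guess. -->
<xs:attribute name="comments" type="xs:string"/>

<!-- thesaurus.xsd: the line proposed above (enclosing type omitted) -->
<xs:attribute ref="dc:comments" use="optional"/>
```

Because the attribute is optional, existing vocabulary files stay valid and tools that do not know about it can ignore it.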
Notes
- Only populate ⑤ if it is easy; otherwise, leave it null.
- I removed ⑥ as the text in the BasisOfRecord values ("A plant on an herbarium sheet. A cataloged lot of fish in a jar") is not an example of what people should be putting into this controlled field but is actually closer to a usage note. Examples should be cut-and-paste examples of what you might use in the field.
Since there seems to be a misunderstanding even here about what those examples are (examples of a PreservedSpecimen, which IS the vocabulary term, not examples of what would go in basisOfRecord for a PreservedSpecimen), and since there are no examples in the other vocabularies, I am fine with omitting those.
Thanks, @tucotuco - I think we've arrived at the design for this issue
I'll comment on the side discussion on examples here though just to help explain the thinking.
The thesauri schema was originally modeled by @mdoering and me to provide an enumerated picklist (label and definition) only, and that probably explains why we both initially questioned the notion of examples. In this thread and in the GBIF vocabulary server, the thinking is much more aligned to SKOS which is certainly no bad thing.
As we evolve, I think we would be better to strictly keep example as a means to illustrate how things should be done and use scopeNote if we want text about where you might apply the concept, which I expect will be rarely needed. This is in line with my understanding of SKOS, which says that example is for an example of the use of a concept. I recognize that sentence is a bit ambiguous, but note that all the SKOS documentation does provide technical examples of use, not descriptive examples of where you would use it - e.g. see SKOS Core Vocabulary Specification where an example is a link such as this.
For PreservedSpecimen, this would then mean:
Identifier | http://rs.tdwg.org/dwc/terms/PreservedSpecimen |
Definition | A specimen that has been preserved. |
Examples | link |
Scope notes | Suitable for use when a record represents a plant on a herbarium sheet, a cataloged lot of fish in a jar, a pinned insect etc. |
Note: I use a live record link here but we wouldn't do that in practice.
Note 2: Scope note could be used to clarify what to do in less usual cases such as preserving a seed etc.
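Expressed as SKOS in RDF/XML, the table above might look roughly like this. It is a sketch only: the choice of RDF/XML and the xml:lang tags are assumptions, and the skos:example value is left out because the example link from the table is not reproduced here.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://rs.tdwg.org/dwc/terms/PreservedSpecimen">
    <!-- Definition: what the concept is -->
    <skos:definition xml:lang="en">A specimen that has been preserved.</skos:definition>
    <!-- Scope note: where the concept applies -->
    <skos:scopeNote xml:lang="en">Suitable for use when a record represents a plant on a herbarium sheet, a cataloged lot of fish in a jar, a pinned insect etc.</skos:scopeNote>
    <!-- skos:example would carry the example link from the table; omitted here -->
  </skos:Concept>
</rdf:RDF>
```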
Hope this helps explain the thinking at least, and I'm happy to be convinced otherwise.
Is it worthwhile to enable controlled vocabulary files in XML to include the usage notes for the term? The controlled vocabulary values for basisOfRecord, establishmentMeans, degreeOfEstablishment, and pathway all have such notes.
This issue is similar to https://github.com/gbif/rs.gbif.org/issues/45 for properties, which is in the process of being implemented.