NCEAS / metadig-checks

MetaDIG suites and checks for data and metadata improvement and guidance.
Apache License 2.0

NaturalLanguageKeywords, ControlledKeywords check #27

Closed gothub closed 5 years ago

gothub commented 5 years ago

For these checks, I'm assuming that the way to determine if a keyword is 'natural' is the absence of a specified thesaurus. If a thesaurus is specified, then the keyword is 'controlled'.

Are there any other attributes of a keyword that need to be evaluated to make this determination?

Here are examples. ISO:

<gmd:keyword>
    <gco:CharacterString> Oceans &gt; Salinity/Density &gt; Salinity</gco:CharacterString>
</gmd:keyword>
...
<gmd:thesaurusName>
    <gmd:CI_Citation>
        <gmd:title>
            <gco:CharacterString>GCMD Earth Science Keywords. Version 5.3.3</gco:CharacterString>
        </gmd:title>
        <gmd:date gco:nilReason="unknown"/>
    </gmd:CI_Citation>
</gmd:thesaurusName>

and EML:

    <keywordSet>
      <keyword keywordType="theme">Marine</keyword>
      <keyword keywordType="theme">Nitrate</keyword>
      <keyword keywordType="theme">Nutrients</keyword>
      <keyword keywordType="theme">Radiation</keyword>
      <keyword keywordType="theme">Temperature</keyword>
      <keywordThesaurus>Knowledge Network for Biocomplexity</keywordThesaurus>
    </keywordSet>
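
The rule proposed above (no thesaurus means 'natural', a thesaurus means 'controlled') could be sketched roughly as follows. This is only an illustration, not the actual metadig check: the function names `classify_eml` and `classify_iso` are hypothetical, the ISO keyword and `gmd:thesaurusName` are assumed to be siblings inside one `gmd:MD_Keywords` element, and an empty thesaurus value is assumed to count as "not specified".

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the gmd and gco prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def _kind(thesaurus):
    # Assumption: a blank or missing thesaurus means the keywords are 'natural'.
    return "controlled" if (thesaurus or "").strip() else "natural"


def classify_eml(keyword_set):
    """Return (keyword, kind) pairs for one EML <keywordSet> element (hypothetical helper)."""
    kind = _kind(keyword_set.findtext("keywordThesaurus"))
    return [((kw.text or "").strip(), kind) for kw in keyword_set.findall("keyword")]


def classify_iso(md_keywords):
    """Return (keyword, kind) pairs for one ISO <gmd:MD_Keywords> element (hypothetical helper)."""
    title = md_keywords.findtext(
        "gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString",
        default="",
        namespaces=NS,
    )
    kind = _kind(title)
    keywords = md_keywords.findall("gmd:keyword/gco:CharacterString", NS)
    return [((kw.text or "").strip(), kind) for kw in keywords]


# EML keyword set with a thesaurus, taken from the example above (shortened).
eml = ET.fromstring("""
<keywordSet>
  <keyword keywordType="theme">Marine</keyword>
  <keywordThesaurus>Knowledge Network for Biocomplexity</keywordThesaurus>
</keywordSet>""")

# ISO keyword block with the thesaurusName deliberately left out.
iso = ET.fromstring("""
<gmd:MD_Keywords xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:keyword>
    <gco:CharacterString> Oceans &gt; Salinity/Density &gt; Salinity</gco:CharacterString>
  </gmd:keyword>
</gmd:MD_Keywords>""")

print(classify_eml(eml))
print(classify_iso(iso))
```

With these inputs the EML keyword is reported as 'controlled' and the ISO keyword, which has no thesaurus, as 'natural'. Whether other attributes such as `keywordType` should also influence the result is exactly the open question raised above; this sketch ignores them.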

gothub commented 5 years ago

Superseded by https://github.com/NCEAS/metadig-checks/issues/63 and https://github.com/NCEAS/metadig-checks/issues/48