cioos-siooc / ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.
http://ckan.org/

eov not populated when keywords are written as label in xml #165

Closed fostermh closed 1 year ago

fostermh commented 2 years ago

e.g.

      <mri:descriptiveKeywords>
        <mri:MD_Keywords>
          <mri:keyword>
            <gco:CharacterString>Sea Surface Salinity</gco:CharacterString>
          </mri:keyword>
          <mri:keyword xsi:type="lan:PT_FreeText_PropertyType">
            <lan:PT_FreeText>
              <lan:textGroup>
                <lan:LocalisedCharacterString locale="fra">Salinité de la surface de la mer</lan:LocalisedCharacterString>
              </lan:textGroup>
            </lan:PT_FreeText>
          </mri:keyword>
          <mri:type>
            <mri:MD_KeywordTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_KeywordTypeCode" codeListValue="theme"/>
          </mri:type>
          <mri:thesaurusName>
            <cit:CI_Citation>
              <cit:title>
                <gco:CharacterString>Global Ocean Observing System Essential Ocean Variables</gco:CharacterString>
              </cit:title>
              <cit:onlineResource>
                <cit:CI_OnlineResource>
                  <cit:linkage>
                    <gco:CharacterString>https://goosocean.org/index.php?option=com_oe&amp;task=viewDocumentRecord&amp;docID=17470</gco:CharacterString>
                  </cit:linkage>
                </cit:CI_OnlineResource>
              </cit:onlineResource>
            </cit:CI_Citation>
          </mri:thesaurusName>
        </mri:MD_Keywords>
      </mri:descriptiveKeywords>

fostermh commented 1 year ago

This appears to be a result of setting clean_tags = true in a harvester config. With that setting, munge_tags is run on the keywords and replaces every space in a keyword with an underscore. Munging also strips Unicode characters and replaces them with ASCII equivalents, which would mangle French keywords, so it is likely not something we want to do in CIOOS anyway.
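To make the effect concrete, here is a minimal sketch of the behaviour described in this comment. It is not CKAN's actual munge code: `munge_keyword_sketch`, `EOV_LABELS`, `match_eov` and the `seaSurfaceSalinity` value are hypothetical names used only for illustration, and the real rules (including the separator character) may differ. It assumes, as one plausible reading of the issue, that the EOV is found by looking up the keyword label exactly as written in the XML.

```python
import re
import unicodedata


def munge_keyword_sketch(keyword: str) -> str:
    """Hypothetical stand-in for tag munging; NOT CKAN's implementation."""
    # Decompose accented characters, then drop everything outside ASCII.
    ascii_only = (
        unicodedata.normalize("NFKD", keyword)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace runs of whitespace with an underscore, as described above.
    return re.sub(r"\s+", "_", ascii_only.strip())


# Hypothetical label-to-EOV table, keyed by the label as written in the XML.
EOV_LABELS = {"Sea Surface Salinity": "seaSurfaceSalinity"}


def match_eov(keyword: str):
    """Return the EOV for an exact label match, or None if there is none."""
    return EOV_LABELS.get(keyword)


raw_en = "Sea Surface Salinity"
raw_fr = "Salinité de la surface de la mer"

print(match_eov(raw_en))                        # label matches as written
print(munge_keyword_sketch(raw_en))             # spaces become underscores
print(match_eov(munge_keyword_sketch(raw_en)))  # munged label no longer matches
print(munge_keyword_sketch(raw_fr))             # accent on "é" is lost
```

Under these assumptions the munged English label no longer matches the lookup table, and the French label loses its accent, which is consistent with the eov field ending up empty.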

fostermh commented 1 year ago

Fixed by NOT using clean_tags = true in the harvester config.
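A quick way to spot harvest sources that still have the setting turned on might look like the sketch below. It assumes the harvester config is stored as a JSON object with a `clean_tags` key; `uses_clean_tags` is a hypothetical helper, not part of CKAN or ckanext-harvest.

```python
import json


def uses_clean_tags(config_text: str) -> bool:
    """Return True if a harvest source config (JSON text) enables clean_tags."""
    # An empty config field means no options are set at all.
    if not config_text.strip():
        return False
    config = json.loads(config_text)
    # A missing key is treated the same as false.
    return bool(config.get("clean_tags", False))


print(uses_clean_tags('{"clean_tags": true}'))   # setting is on
print(uses_clean_tags('{"clean_tags": false}'))  # setting is off
print(uses_clean_tags(""))                       # nothing configured
```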

ItaloBorrelli commented 1 year ago

Thanks for clearing that up!