geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

Issue while harvesting CSW GeoTerritoire #6332

Open fgravin opened 2 years ago

fgravin commented 2 years ago

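Describe the bug

Harvesting the GeoTerritoires CSW endpoint ends with an indexing error: harvested metadata is marked with _indexingError=1 in the index (see the log below).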

To Reproduce

  1. Create a new CSW harvester
  2. Set the url to https://geoterritoires.hautsdefrance.fr/services/csw/?SERVICE=CSW&REQUEST=GetCapabilities&VERSION=2.0.2
  3. Save
  4. Run the harvester

Error in logs:

2022-05-25 09:48:57,785 ERROR [geonetwork.index] - Indexing stylesheet contains errors: null 
  Marking the metadata as _indexingError=1 in index

Example of CSW output record https://geoterritoires.hautsdefrance.fr/services/csw/?SERVICE=CSW&REQUEST=GetRecords&VERSION=2.0.2&outputSchema=http://www.isotc211.org/2005/gmd

<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd">
  <gmd:fileIdentifier>
    <gco:CharacterString>emp_act.pt_pxx_factocc1564</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:language>
    <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="fre"/>
  </gmd:language>
  <gmd:characterSet>
    <gmd:MD_CharacterSetCode codeListValue="utf8" codeList="MD_CharacterSetCode"/>
  </gmd:characterSet>
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeListValue="dataset" codeList="http://www.isotc211.org/2005/resources/codeList.xml#MD_ScopeCode"/>
  </gmd:hierarchyLevel>
  <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:individualName>
        <gco:CharacterString/>
      </gmd:individualName>
      <gmd:organisationName>
        <gco:CharacterString>GéoTerritoires ()</gco:CharacterString>
      </gmd:organisationName>
      <gmd:contactInfo>
        <gmd:CI_Contact>
          <gmd:address>
            <gmd:CI_Address>
              <gmd:deliveryPoint>
                <gco:CharacterString/>
              </gmd:deliveryPoint>
              <gmd:city>
                <gco:CharacterString/>
              </gmd:city>
              <gmd:postalCode>
                <gco:CharacterString/>
              </gmd:postalCode>
              <gmd:country>
                <gco:CharacterString/>
              </gmd:country>
              <gmd:electronicMailAddress>
                <gco:CharacterString/>
              </gmd:electronicMailAddress>
            </gmd:CI_Address>
          </gmd:address>
        </gmd:CI_Contact>
      </gmd:contactInfo>
      <gmd:role>
        <gmd:CI_RoleCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="custodian">custodian</gmd:CI_RoleCode>
      </gmd:role>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>
  <gmd:hierarchyLevelName>
    <gco:CharacterString>Jeu de données</gco:CharacterString>
  </gmd:hierarchyLevelName>
  <gmd:dateStamp>
    <gco:Date>2022-05-25T10:29:33+02:00</gco:Date>
  </gmd:dateStamp>
  <gmd:metadataStandardName>
    <gco:CharacterString>ISO 19115:2003/19139</gco:CharacterString>
  </gmd:metadataStandardName>
  <gmd:metadataStandardVersion>
    <gco:CharacterString>1.0</gco:CharacterString>
  </gmd:metadataStandardVersion>
  <gmd:locale>
    <gmd:PT_Locale id="FR">
      <gmd:languageCode>
        <gmd:languageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="fre"/>
      </gmd:languageCode>
    </gmd:PT_Locale>
  </gmd:locale>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title xsi:type="gmd:PT_FreeText_PropertyType">
            <gco:CharacterString>taux d'emploi des femmes de 15-64 ans (%)</gco:CharacterString>
            <gmd:PT_FreeText>
              <gmd:textGroup>
                <gmd:LocalisedCharacterString locale="#FR">taux d'emploi des femmes de 15-64 ans (%)</gmd:LocalisedCharacterString>
              </gmd:textGroup>
            </gmd:PT_FreeText>
          </gmd:title>
          <gmd:identifier>
            <gmd:MD_Identifier>
              <gmd:code>
                <gco:CharacterString>emp_act.pt_pxx_factocc1564</gco:CharacterString>
              </gmd:code>
            </gmd:MD_Identifier>
          </gmd:identifier>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>Le type d'activité répartit la population entre les actifs et les inactifs. Parmi les actifs, on distingue ceux qui ont un emploi (y compris les personnes en apprentissage ou en stage rémunéré), aussi appelés actifs occupés, des chômeurs. Parmi les inactifs, on peut notamment distinguer les élèves, étudiants et stagiaires non rémunérés, les retraités ou préretraités, les femmes ou hommes au foyer.</gco:CharacterString>
      </gmd:abstract>
      <gmd:descriptiveKeywords>
        <gmd:MD_Keywords>
          <gmd:keyword>
            <gco:CharacterString>Unités statistiques</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>données ouvertes</gco:CharacterString>
          </gmd:keyword>
          <gmd:type>
            <gmd:MD_KeywordTypeCode codeList="MD_KeywordTypeCode" codeListValue="theme"/>
          </gmd:type>
          <gmd:thesaurusName>
            <gmd:CI_Citation>
              <gmd:title>
                <gco:CharacterString>GEMET inspire themes - version 1.0</gco:CharacterString>
              </gmd:title>
              <gmd:date>
                <gmd:CI_Date>
                  <gmd:date>
                    <gco:Date>2022-05-25T10:29:33+02:00</gco:Date>
                  </gmd:date>
                  <gmd:dateType>
                    <gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/ML_gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication">publication</gmd:CI_DateTypeCode>
                  </gmd:dateType>
                </gmd:CI_Date>
              </gmd:date>
            </gmd:CI_Citation>
          </gmd:thesaurusName>
        </gmd:MD_Keywords>
      </gmd:descriptiveKeywords>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:individualName>
            <gco:CharacterString/>
          </gmd:individualName>
          <gmd:organisationName>
            <gco:CharacterString>GéoTerritoires ()</gco:CharacterString>
          </gmd:organisationName>
          <gmd:contactInfo>
            <gmd:CI_Contact>
              <gmd:address>
                <gmd:CI_Address>
                  <gmd:deliveryPoint>
                    <gco:CharacterString/>
                  </gmd:deliveryPoint>
                  <gmd:city>
                    <gco:CharacterString/>
                  </gmd:city>
                  <gmd:postalCode>
                    <gco:CharacterString/>
                  </gmd:postalCode>
                  <gmd:country>
                    <gco:CharacterString/>
                  </gmd:country>
                  <gmd:electronicMailAddress>
                    <gco:CharacterString/>
                  </gmd:electronicMailAddress>
                </gmd:CI_Address>
              </gmd:address>
            </gmd:CI_Contact>
          </gmd:contactInfo>
          <gmd:role>
            <gmd:CI_RoleCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="custodian">custodian</gmd:CI_RoleCode>
          </gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
      <gmd:resourceConstraints>
        <gmd:MD_LegalConstraints>
          <gmd:accessConstraints>
            <gmd:MD_RestrictionCode codeList="http://www.isotc211.org/2005/resources/codeList.xml#MD_RestrictionCode" codeListValue=""/>
          </gmd:accessConstraints>
        </gmd:MD_LegalConstraints>
      </gmd:resourceConstraints>
      <gmd:spatialRepresentationType>
        <gmd:MD_SpatialRepresentationTypeCode codeList="http://www.isotc211.org/2005/resources/codeList.xml#MD_SpatialRepresentationTypeCode" codeListValue="textTable"/>
      </gmd:spatialRepresentationType>
      <gmd:spatialResolution>
        <gmd:MD_Resolution>
          <gmd:equivalentScale>
            <gmd:MD_RepresentativeFraction>
              <gmd:denominator>
                <gco:Integer>5000</gco:Integer>
              </gmd:denominator>
            </gmd:MD_RepresentativeFraction>
          </gmd:equivalentScale>
        </gmd:MD_Resolution>
      </gmd:spatialResolution>
      <gmd:language>
        <gco:CharacterString/>
      </gmd:language>
      <gmd:topicCategory>
        <gmd:MD_TopicCategoryCode>society</gmd:MD_TopicCategoryCode>
      </gmd:topicCategory>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:temporalElement>
            <gmd:EX_temporalExtent>
              <gmd:extent>
                <gml:TimePeriod gml:id="emp_act.pt_pxx_factocc1564.annee">
                  <gml:beginPosition>2006</gml:beginPosition>
                  <gml:endPosition>2018</gml:endPosition>
                </gml:TimePeriod>
              </gmd:extent>
            </gmd:EX_temporalExtent>
          </gmd:temporalElement>
        </gmd:EX_Extent>
      </gmd:extent>
      <gmd:graphicOverview>
        <gmd:MD_BrowseGraphic>
          <gmd:fileName>
            <gco:CharacterString>https://geoterritoires.hautsdefrance.fr/GC_make_map.php?width=800&amp;lang=fr&amp;indics=emp_act.pt_pxx_factocc1564&amp;format=png&amp;</gco:CharacterString>
          </gmd:fileName>
          <gmd:fileDescription>
            <gco:CharacterString>Aperçu</gco:CharacterString>
          </gmd:fileDescription>
          <gmd:fileType>
            <gco:CharacterString>png</gco:CharacterString>
          </gmd:fileType>
        </gmd:MD_BrowseGraphic>
      </gmd:graphicOverview>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  <gmd:distributionInfo>
    <gmd:MD_Distribution>
      <gmd:distributionFormat>
        <gmd:MD_Format>
          <gmd:name>
            <gco:CharacterString>Unknown</gco:CharacterString>
          </gmd:name>
          <gmd:version>
            <gco:CharacterString>Unknown</gco:CharacterString>
          </gmd:version>
        </gmd:MD_Format>
      </gmd:distributionFormat>
      <gmd:transferOptions>
        <gmd:MD_DigitalTransferOptions>
          <gmd:onLine>
            <gmd:CI_OnlineResource>
              <gmd:linkage>
                <gmd:URL>https://geoterritoires.hautsdefrance.fr/index.php#c=indicator&amp;i=emp_act.pt_pxx_factocc1564</gmd:URL>
              </gmd:linkage>
              <gmd:protocol>
                <gco:CharacterString>WWW:LINK-1.0-http--link</gco:CharacterString>
              </gmd:protocol>
              <gmd:description>
                <gco:CharacterString>Consulter sur Géoclip</gco:CharacterString>
              </gmd:description>
              <gmd:function>
                <gmd:CI_OnLineFunctionCode codeListValue="information" codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/ML_gmxCodelists.xml#CI_OnLineFunctionCode"/>
              </gmd:function>
            </gmd:CI_OnlineResource>
          </gmd:onLine>
        </gmd:MD_DigitalTransferOptions>
      </gmd:transferOptions>
      <gmd:transferOptions>
        <gmd:MD_DigitalTransferOptions>
          <gmd:onLine>
            <gmd:CI_OnlineResource>
              <gmd:linkage>
                <gmd:URL>https://geoterritoires.hautsdefrance.fr/GC_make_map.php?width=800&amp;format=svg&amp;lang=fr&amp;indics=emp_act.pt_pxx_factocc1564&amp;print=1&amp;cosmetics=1</gmd:URL>
              </gmd:linkage>
              <gmd:protocol>
                <gco:CharacterString>WWW:DOWNLOAD-1.0-http--download</gco:CharacterString>
              </gmd:protocol>
              <gmd:description>
                <gco:CharacterString>Impression de la carte vectorielle (SVG)</gco:CharacterString>
              </gmd:description>
              <gmd:function>
                <gmd:CI_OnLineFunctionCode codeListValue="download" codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/ML_gmxCodelists.xml#CI_OnLineFunctionCode"/>
              </gmd:function>
            </gmd:CI_OnlineResource>
          </gmd:onLine>
        </gmd:MD_DigitalTransferOptions>
      </gmd:transferOptions>
    </gmd:MD_Distribution>
  </gmd:distributionInfo>
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:lineage>
        <gmd:LI_Lineage>
          <gmd:statement>
            <gco:CharacterString>Insee, RP exploitation principale</gco:CharacterString>
          </gmd:statement>
          <gmd:source/>
        </gmd:LI_Lineage>
      </gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
  <gmd:metadataConstraints>
    <gmd:MD_Constraints>
      <gmd:useLimitation>
        <gco:CharacterString>n</gco:CharacterString>
      </gmd:useLimitation>
    </gmd:MD_Constraints>
  </gmd:metadataConstraints>
</gmd:MD_Metadata>

jahow commented 1 year ago

After investigating, it looks like the issue comes from the CSW service returning csw:Record elements instead of gmd:MD_Metadata ones. If this is confirmed, this issue can be closed.
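
One way to confirm this is to count which record element types a GetRecords response actually contains. The snippet below is only a sketch, not part of GeoNetwork: `count_record_types` and `SAMPLE` are hypothetical names, and the sample response is made up for illustration. In practice the input would be the body returned by the harvested endpoint.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for CSW 2.0.2 and ISO 19139 (gmd).
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
GMD_NS = "http://www.isotc211.org/2005/gmd"


def count_record_types(response_xml: str) -> dict:
    """Count csw:Record and gmd:MD_Metadata elements in a CSW response.

    Hypothetical helper for diagnosis only; it does not call the service.
    """
    root = ET.fromstring(response_xml)
    counts = {"csw:Record": 0, "gmd:MD_Metadata": 0}
    for element in root.iter():
        if element.tag == f"{{{CSW_NS}}}Record":
            counts["csw:Record"] += 1
        elif element.tag == f"{{{GMD_NS}}}MD_Metadata":
            counts["gmd:MD_Metadata"] += 1
    return counts


# Made-up response mixing both record types, for illustration only.
SAMPLE = """<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <csw:SearchResults>
    <csw:Record><dc:identifier>a</dc:identifier></csw:Record>
    <gmd:MD_Metadata/>
  </csw:SearchResults>
</csw:GetRecordsResponse>"""

print(count_record_types(SAMPLE))
```

If the count for csw:Record is non-zero on a response that was requested with the gmd outputSchema, that would support the explanation above.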