consbio / gis-metadata-parser

Parser for GIS metadata standards including ArcGIS, FGDC and ISO-19115
BSD 3-Clause "New" or "Revised" License
Add support for other ArcGIS standard keyword types #3

Closed by dharvey-consbio 6 years ago

dharvey-consbio commented 6 years ago

We want to support at least the keyword types that are available in FGDC:

<!ELEMENT theme    (themekt, themekey+)>
<!ELEMENT place    (placekt, placekey+)>
<!ELEMENT stratum  (stratkt, stratkey+)>
<!ELEMENT temporal (tempkt, tempkey+)>

The full list of known ArcGIS keyword types is:

<!ELEMENT discKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT otherKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT placeKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT productKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT searchKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT stratKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT subTopicCatKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT tempKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT themeKeys (keyword+, thesaName?, thesaLang?)>
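
As a rough illustration of how these keyword groups could be read out of a document, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`, not this project's parsers. The sample document, the `collect_keywords` helper, and the assumption that every group sits directly under `dataIdInfo` are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Element names copied from the list above
ARCGIS_KEYWORD_TYPES = (
    'discKeys', 'otherKeys', 'placeKeys', 'productKeys', 'searchKeys',
    'stratKeys', 'subTopicCatKeys', 'tempKeys', 'themeKeys',
)

# Hypothetical sample document, for illustration only
SAMPLE = """
<metadata>
  <dataIdInfo>
    <themeKeys><keyword>hydrology</keyword><keyword>rivers</keyword></themeKeys>
    <placeKeys><keyword>Oregon</keyword></placeKeys>
    <searchKeys><keyword>water</keyword></searchKeys>
  </dataIdInfo>
</metadata>
"""

def collect_keywords(xml_text):
    """Return {keyword type: [keywords]} for every type present in the document."""
    root = ET.fromstring(xml_text)
    found = {}
    for key_type in ARCGIS_KEYWORD_TYPES:
        # Assumption: keyword groups are children of dataIdInfo, as in SAMPLE
        path = 'dataIdInfo/{}/keyword'.format(key_type)
        words = [kw.text for kw in root.findall(path)]
        if words:
            found[key_type] = words
    return found

keywords_by_type = collect_keywords(SAMPLE)
```

Types with no keywords in the document are simply left out of the result.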

In addition to the four FGDC keyword types, consider supporting searchKeys. In the ISO standard this can be done with MD_KeywordTypeCode (exactly as it is currently used for theme and place keys), and for cross-compatibility with FGDC by appending the search keys to themekey on import.

Note: because there is no standard place to write search keys in FGDC, they will not convert losslessly back to ISO or ArcGIS after being converted to FGDC, and I don't feel good about making up a new FGDC tag just for this:

<!ELEMENT search (searchkt, searchkey+)>
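
To make the lossy round trip concrete, here is a small sketch of the "append search keys to theme keys" idea using plain dictionaries. The `to_fgdc_keywords` function and the dictionary keys are hypothetical and are not part of this library; skipping duplicates is a choice made for the example only.

```python
def to_fgdc_keywords(keywords):
    """Fold search keys into theme keys, since FGDC has no search keyword element."""
    fgdc = {
        'theme': list(keywords.get('theme', [])),
        'place': list(keywords.get('place', [])),
        'stratum': list(keywords.get('stratum', [])),
        'temporal': list(keywords.get('temporal', [])),
    }
    for key in keywords.get('search', []):
        # Skip keys that are already theme keys (an assumption of this sketch)
        if key not in fgdc['theme']:
            fgdc['theme'].append(key)
    return fgdc

source = {'theme': ['hydrology'], 'place': ['Oregon'], 'search': ['water', 'hydrology']}
converted = to_fgdc_keywords(source)
```

After the conversion, `converted['theme']` holds both the original theme keys and the search keys, and nothing records which were which. That is why converting back to ISO or ArcGIS cannot restore the search keys.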

dharvey-consbio commented 6 years ago

This has been delivered, along with a rare bug fix. The following keyword types are now supported for all standards:

  1. Theme Keys
  2. Place Keys
  3. Stratum Keys
  4. Temporal Keys

In addition, these ArcGIS-specific keyword types are supported, but they are not converted between standards:

  1. Discipline Keys
  2. Other Keys
  3. Product Keys
  4. Search Keys
  5. Topical Category Keys

The bug was this: when extending a parser to support new fields, if the XML path for a custom property referenced an attribute (@value below), the parser would read only the first occurrence of that attribute. The output is now as expected, as demonstrated below:

from gis_metadata.arcgis_metadata_parser import ArcGISParser

class CustomAgisParser(ArcGISParser):
    def _init_data_map(self):
        super(CustomAgisParser, self)._init_data_map()

        self._data_map['topic_categories'] = 'dataIdInfo/tpCat/TopicCatCd/@value'
        self._data_map['_topic_categories_root'] = 'dataIdInfo/tpCat'

agis_metadata = """
<metadata xml:lang="en">
  <dataIdInfo>
    <tpCat>
      <TopicCatCd value="001"/>
    </tpCat>
    <tpCat>
      <TopicCatCd value="002"/>
    </tpCat>
  </dataIdInfo>
</metadata>
"""
agis_parser = CustomAgisParser(agis_metadata)
agis_parser.topic_categories  # ['001', '002']
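
For readers who want to see the "first occurrence only" versus "all occurrences" difference without the library, here is a standalone sketch using the standard library's `xml.etree.ElementTree`. It only illustrates the general behavior described above and makes no claim about how the parser is implemented internally.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<metadata xml:lang="en">
  <dataIdInfo>
    <tpCat><TopicCatCd value="001"/></tpCat>
    <tpCat><TopicCatCd value="002"/></tpCat>
  </dataIdInfo>
</metadata>
""")

path = 'dataIdInfo/tpCat/TopicCatCd'

# find() stops at the first matching element, so only one attribute is read
first_only = root.find(path).get('value')

# findall() visits every matching element, so every attribute is read
all_values = [el.get('value') for el in root.findall(path)]
```

Here `first_only` is `'001'`, while `all_values` is `['001', '002']`, which matches the output shown for `topic_categories`.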