OpenGeoMetadata / GeoCombine

A Ruby toolkit for managing geospatial metadata
https://github.com/OpenGeoMetadata/GeoCombine

Schema v1 to Aardvark migrator #143

Closed (thatbudakguy closed 5 months ago)

thatbudakguy commented 1 year ago

Closes #121

thatbudakguy commented 1 year ago

Making this a draft again pending discussion of behavior for some fields; see https://github.com/OpenGeoMetadata/metadata-issues/issues/50

the-codetrane commented 11 months ago

The path in lib/geo_combine/geoblacklight.rb:16 changed to "https://raw.githubusercontent.com/OpenGeoMetadata/opengeometadata.github.io/main/docs/schema/geoblacklight-schema-#{GEOBLACKLIGHT_VERSION}.json"

the-codetrane commented 10 months ago

@thatbudakguy any chance we can get solr_geom to dcat_bbox added to this? This is otherwise working.

thatbudakguy commented 9 months ago

@the-codetrane thx for pointing that out; I added a step to handle dcat_bbox. This PR is now blocked by #162.
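
For readers following along, a minimal sketch of what such a step could look like. It assumes the ENVELOPE string is carried over unchanged from solr_geom to dcat_bbox; the method name migrate_bbox is hypothetical and is not taken from the PR.

```ruby
# Hypothetical helper, not the PR's actual code.
# Assumption: the v1 solr_geom value is reused as-is for dcat_bbox.
def migrate_bbox(record)
  # Leave records without a v1 bounding box untouched (return a copy).
  return record.dup unless record.key?('solr_geom')

  # Drop the v1 field and write the same value under the Aardvark name.
  migrated = record.reject { |key, _value| key == 'solr_geom' }
  migrated['dcat_bbox'] = record['solr_geom']
  migrated
end
```

The input hash is not mutated, so the original v1 record stays available for comparison after the step runs.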

the-codetrane commented 8 months ago

@thatbudakguy found another key that could be migrated - layer_geom_type_s to gbl_resourceType_sm. The crosswalk documentation has them as deprecated/new fields, but it would appear they are in fact related.

thatbudakguy commented 8 months ago

there's code in this PR to do that – we use a lookup table to map geometry types to resource types. it's only straightforward for a few cases, imo. does it not work for you?
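
A rough sketch of the lookup-table approach described here, under stated assumptions: the table entries and the names GEOM_TYPE_TO_RESOURCE_TYPE and migrate_geom_type are illustrative, and the table in the PR may cover different cases.

```ruby
# Hypothetical lookup table; the entries in the PR's own table may differ.
GEOM_TYPE_TO_RESOURCE_TYPE = {
  'Point' => 'Point data',
  'Line' => 'Line data',
  'Polygon' => 'Polygon data',
  'Raster' => 'Raster data'
}.freeze

# Replace layer_geom_type_s with gbl_resourceType_sm when a mapping exists.
def migrate_geom_type(record)
  resource_type = GEOM_TYPE_TO_RESOURCE_TYPE[record['layer_geom_type_s']]
  # Assumption: with no clear mapping, the record is returned unchanged
  # so that nothing is silently lost.
  return record.dup unless resource_type

  migrated = record.reject { |key, _value| key == 'layer_geom_type_s' }
  # The target field is multi-valued, so the value is wrapped in an array.
  migrated['gbl_resourceType_sm'] = [resource_type]
  migrated
end
```

Keeping unmapped values in place is one possible design choice for the cases that are not straightforward; a real migrator might instead drop the field or log a warning.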

the-codetrane commented 8 months ago

This is what comes out when I run the migrator on a GBL 1.0 schema record:

{
  "dct_description_sm": [
    "This polygon shapefile represents the 1964 County Boundaries for China. The layer includes population census data and was primarily based on the \"Historical Administrative Maps of the People's Republic of China,\" published by China Map Press, and some other yearly administrative maps. See the documentation for more information and a list of the layer variables."
  ],
  "dct_format_s": "Shapefile",
  "dct_identifier_sm": [
    "http://hdl.handle.net/2451/34626"
  ],
  "dct_language_sm": [
    "English"
  ],
  "dct_publisher_sm": [
    "Beijing Hua tong ren shi chang xin xi you xian ze ren gong si"
  ],
  "dc_relation_sm": [
    "http://sws.geonames.org/1814991/about/rdf"
  ],
  "dct_accessRights_s": "Restricted",
  "dct_subject_sm": [
    "Boundaries",
    "Demographic surveys",
    "Population"
  ],
  "dct_title_s": "1964 County Boundaries of China with Population Census Data",
  "dc_type_s": "Dataset",
  "dct_isPartOf_sm": [
    "Historical China County Population Census Data"
  ],
  "dct_issued_s": "2005",
  "schema_provider_s": "NYU",
  "dct_references_s": "{\"http://schema.org/url\":\"http://hdl.handle.net/2451/34626\",\"http://www.opengis.net/def/serviceType/ogc/wfs\":\"https://maps-restricted.geo.nyu.edu/geoserver/sdr/wfs\",\"http://www.opengis.net/def/serviceType/ogc/wms\":\"https://maps-restricted.geo.nyu.edu/geoserver/sdr/wms\",\"http://schema.org/downloadUrl\":\"https://archive.nyu.edu/retrieve/74851/nyu_2451_34626.zip\",\"http://lccn.loc.gov/sh85035852\":\"https://archive.nyu.edu/retrieve/74896/nyu_2451_34626_doc.zip\"}",
  "dct_spatial_sm": [
    "People's Republic of China, China"
  ],
  "dct_temporal_sm": [
    "1964"
  ],
  "gbl_mdVersion_s": "Aardvark",
  "layer_geom_type_s": "Polygon", // I'M GUESSING THIS IS SUPPOSED TO BE SOMETHING ELSE?
  "gbl_wxsIdentifier_s": "sdr:nyu_2451_34626",
  "gbl_mdModified_dt": "2016-11-10T15:51:38Z",
  "id": "nyu-2451-34626",
  "nyu_addl_dspace_s": "35559",
  "locn_geometry": "ENVELOPE(73.557693, 134.773911, 53.56086, 10.175472)",
  "gbl_indexYear_im": [
    1964
  ],
  "nyu_addl_format_sm": [
    "Shapefile"
  ],
  "_version_": 1779481613907787776,
  "timestamp": "2023-10-11T17:38:31.500Z"
}

srappel commented 8 months ago

"layer_geom_type_s": "Polygon", // I'M GUESSING THIS IS SUPPOSED TO BE SOMETHING ELSE?

I would expect "gbl_resourceType_sm": "Polygon Data" according to the controlled vocab

thatbudakguy commented 5 months ago

@the-codetrane can you share the record that you transformed to get that output?

the-codetrane commented 5 months ago

@thatbudakguy My contract at NYU ended, so I'm outside the walled garden. @mnyrop should be able to help you with this.

thatbudakguy commented 5 months ago

OK, I found the record. I ran it through the migrator myself and got:

{
  "dct_creator_sm": [],
  "dct_description_sm": [
    "This polygon shapefile represents the 1964 County Boundaries for China. The layer includes population census data and was primarily based on the \"Historical Administrative Maps of the People's Republic of China,\" published by China Map Press, and some other yearly administrative maps. See the documentation for more information and a list of the layer variables."
  ],
  "dct_format_s": "Shapefile",
  "dct_identifier_sm": ["http://hdl.handle.net/2451/34626"],
  "dct_language_sm": ["English"],
  "dct_publisher_sm": [
    "Beijing Hua tong ren shi chang xin xi you xian ze ren gong si"
  ],
  "dc_relation_sm": ["http://sws.geonames.org/1814991/about/rdf"],
  "dct_accessRights_s": "Restricted",
  "dct_subject_sm": ["Boundaries", "Demographic surveys", "Population"],
  "dct_title_s": "1964 County Boundaries of China with Population Census Data",
  "dct_issued_s": "2005",
  "schema_provider_s": "NYU",
  "dct_references_s": "{\"http://schema.org/url\":\"http://hdl.handle.net/2451/34626\",\"http://www.opengis.net/def/serviceType/ogc/wfs\":\"https://maps-restricted.geo.nyu.edu/geoserver/sdr/wfs\",\"http://www.opengis.net/def/serviceType/ogc/wms\":\"https://maps-restricted.geo.nyu.edu/geoserver/sdr/wms\",\"http://schema.org/downloadUrl\":\"https://archive.nyu.edu/retrieve/74851/nyu_2451_34626.zip\",\"http://lccn.loc.gov/sh85035852\":\"https://archive.nyu.edu/retrieve/74896/nyu_2451_34626_doc.zip\"}",
  "dct_spatial_sm": ["People's Republic of China, China"],
  "dct_temporal_sm": ["1964"],
  "gbl_mdVersion_s": "Aardvark",
  "gbl_wxsIdentifier_s": "sdr:nyu_2451_34626",
  "gbl_mdModified_dt": "2016-11-10T15:51:38Z",
  "id": "nyu-2451-34626",
  "nyu_addl_dspace_s": "35559",
  "dcat_bbox": "ENVELOPE(73.557693, 134.773911, 53.56086, 10.175472)",
  "gbl_indexYear_im": [1964],
  "gbl_resourceClass_s": ["Datasets"],
  "gbl_resourceType_s": ["Polygon data"]
}

It turned out there was just a typo; the new field is gbl_resourceType_sm (not gbl_resourceType_s), as it's multi-valued. Otherwise, the conversion works as expected (it outputs Polygon data and the original field is stripped).

I've corrected the mistake.

karenmajewicz commented 5 months ago

Resource Class is also multivalued: gbl_resourceClass_sm
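
Both suffix slips noted above (gbl_resourceType_s and gbl_resourceClass_s holding arrays) could be caught by a small check along these lines. This is a sketch that assumes the usual Solr dynamic-field suffix convention, where _sm and _im are multi-valued and _s, _i and _dt are single-valued; the method name suffix_mismatches is hypothetical.

```ruby
# Hypothetical check, not part of GeoCombine.
# Returns the keys whose suffix disagrees with the shape of their value.
def suffix_mismatches(record)
  record.select do |key, value|
    multi_suffix = key.end_with?('_sm', '_im')
    single_suffix = key.end_with?('_s', '_i', '_dt')
    # Flag a multi-valued suffix without an array, or a single-valued
    # suffix that holds an array.
    (multi_suffix && !value.is_a?(Array)) || (single_suffix && value.is_a?(Array))
  end.keys
end
```

Fields without a recognised suffix, such as id or dcat_bbox, are ignored by this check.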