OpenGeoMetadata / GeoCombine

A Ruby toolkit for managing geospatial metadata
https://github.com/OpenGeoMetadata/GeoCombine

GeoCombine should be able to convert from metadata schemas to a Solr schema #1

Closed: mejackreed closed this issue 1 year ago

mejackreed commented 9 years ago

Ingest ISO 19139 or FGDC records and validate them. Enable to_geoblacklight and validate for both ISO 19139 and FGDC:

# Input an ISO 19139 record
iso_record = GeoCombine::Record.new(iso19139_metadata)
# Validate the record
iso_record.validate # returns true
# Convert the record
iso_record.to_geoblacklight # Converts a record to GeoBlacklight-Schema

# Input a FGDC record
fgdc_record = GeoCombine::Record.new(fgdc_metadata)
# Validate the record
fgdc_record.validate # returns true
# Convert the record
fgdc_record.to_geoblacklight # Returns an object with formatted fields, could do to_json or to_xml

drh-stanford commented 9 years ago

We need support for external references for WMS, etc., as ISO and FGDC do not support such links. Not sure exactly how to do this yet.

Also, see https://github.com/geoblacklight/geoblacklight-schema/tree/master/tools/solr for Solr ingest code.

And see https://github.com/geoblacklight/geoblacklight-schema/issues/6 for prior FGDC attempts.

chrissbarnett commented 9 years ago

Darren,

The OGP Metadata Working group has a sub-group addressing this issue currently.

Front-running ideas for FGDC involve some sort of standard encoding in the section or in the distribution section.

For ISO 19139 metadata I have seen this fairly commonly:

<gmd:distributionInfo>
  <gmd:MD_Distribution>
    <gmd:transferOptions>
      <gmd:MD_DigitalTransferOptions>
        <gmd:onLine>
          <gmd:CI_OnlineResource>
            <gmd:linkage>
              <gmd:URL>http://www.fao.org/figis/geoserver/species/ows?SERVICE=WMS</gmd:URL>
            </gmd:linkage>
            <gmd:protocol>
              <gco:CharacterString>OGC:WMS-1.3.0-http-get-map</gco:CharacterString>
            </gmd:protocol>
            <gmd:name>
              <gco:CharacterString>SPECIES_DIST_ALK</gco:CharacterString>
            </gmd:name>
            <gmd:description>
              <gco:CharacterString>FAO aquatic species distribution map of Theragra chalcogramma</gco:CharacterString>
            </gmd:description>
          </gmd:CI_OnlineResource>
        </gmd:onLine>
        <gmd:onLine>
          <gmd:CI_OnlineResource>
            <gmd:linkage>
              <gmd:URL>http://www.fao.org/fishery/collection/fish_dist_map/en</gmd:URL>
            </gmd:linkage>
            <gmd:protocol>
              <gco:CharacterString>WWW:LINK-1.0-http--link</gco:CharacterString>
            </gmd:protocol>
            <gmd:description>
              <gco:CharacterString>Compilation of aquatic species distribution maps of interest to fisheries</gco:CharacterString>
            </gmd:description>
          </gmd:CI_OnlineResource>
        </gmd:onLine>
        <gmd:onLine>
          <gmd:CI_OnlineResource>
            <gmd:linkage>
              <gmd:URL>http://www.fao.org/figis/geoserver/species/ows?service=WFS</gmd:URL>
            </gmd:linkage>
            <gmd:protocol>
              <gco:CharacterString>OGC:WFS-1.0.0-http-get-feature</gco:CharacterString>
            </gmd:protocol>
            <gmd:name>
              <gco:CharacterString>SPECIES_DIST_ALK</gco:CharacterString>
            </gmd:name>
            <gmd:description>
              <gco:CharacterString>FAO aquatic species distribution map of Theragra chalcogramma</gco:CharacterString>
            </gmd:description>
          </gmd:CI_OnlineResource>
        </gmd:onLine>
        <gmd:onLine>
          <gmd:CI_OnlineResource>
            <gmd:linkage>
              <gmd:URL>http://www.fao.org/geonetwork/srv/en/csw?service=CSW&amp;request=GetRecordById&amp;Version=2.0.2&amp;elementSetName=full&amp;outputSchema=http://www.isotc211.org/2005/gmd&amp;id=3c002d93-f586-48f6-be9e-ffe099f55a1b</gmd:URL>
            </gmd:linkage>
            <gmd:protocol>
              <gco:CharacterString>WWW:LINK-1.0-http--link</gco:CharacterString>
            </gmd:protocol>
            <gmd:name>
              <gco:CharacterString>XML</gco:CharacterString>
            </gmd:name>
            <gmd:description>
              <gco:CharacterString>metadata (XML)</gco:CharacterString>
            </gmd:description>
          </gmd:CI_OnlineResource>
        </gmd:onLine>
        <gmd:onLine>
          <gmd:CI_OnlineResource>
            <gmd:linkage>
              <gmd:URL>http://www.fao.org/figis/geoserver/factsheets/species.html</gmd:URL>
            </gmd:linkage>
            <gmd:protocol>
              <gco:CharacterString>WWW:LINK-1.0-http--link</gco:CharacterString>
            </gmd:protocol>
            <gmd:description>
              <gco:CharacterString>Aquatic Species Distribution Map Viewer</gco:CharacterString>
            </gmd:description>
          </gmd:CI_OnlineResource>
        </gmd:onLine>
      </gmd:MD_DigitalTransferOptions>
    </gmd:transferOptions>
  </gmd:MD_Distribution>
</gmd:distributionInfo>

drh-stanford commented 9 years ago

Good to know! Are the gmd:protocol values drawn from a controlled vocabulary? If so, we could map them into the GeoBlacklight schema with some success (e.g., WWW:LINK-1.0-http--link to http://schema.org/url, etc.), though multiple links may be problematic, as in your example. @kimdurante from Stanford may already be working in the sub-group.
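To make the mapping idea concrete, here is a minimal Ruby sketch. It assumes the gmd:onLine entries have already been extracted into plain hashes. Only the WWW:LINK to http://schema.org/url pairing comes from this thread; the lookup table, the OGC service-type URIs, and the method name `map_protocols` are assumptions for illustration, not GeoCombine's API. It also shows why multiple links are awkward: a second link with the same protocol has nowhere to go if each reference URI holds a single URL.

```ruby
require 'json'

# Hypothetical lookup table: protocol prefix => reference URI.
# Only the WWW:LINK pairing is taken from the discussion; the OGC
# entries are assumptions for illustration.
PROTOCOL_TO_REFERENCE = {
  'OGC:WMS'  => 'http://www.opengis.net/def/serviceType/ogc/wms',
  'OGC:WFS'  => 'http://www.opengis.net/def/serviceType/ogc/wfs',
  'WWW:LINK' => 'http://schema.org/url'
}.freeze

# links: array of { protocol:, url: } hashes taken from gmd:onLine.
# Returns [references, skipped]: one URL per reference URI, plus the
# links that were unmapped or collided with an earlier link.
def map_protocols(links)
  references = {}
  skipped = []
  links.each do |link|
    _prefix, uri = PROTOCOL_TO_REFERENCE.find do |prefix, _|
      link[:protocol].to_s.start_with?(prefix)
    end
    if uri && !references.key?(uri)
      references[uri] = link[:url]
    else
      skipped << link
    end
  end
  [references, skipped]
end

links = [
  { protocol: 'OGC:WMS-1.3.0-http-get-map',
    url: 'http://www.fao.org/figis/geoserver/species/ows?SERVICE=WMS' },
  { protocol: 'WWW:LINK-1.0-http--link',
    url: 'http://www.fao.org/fishery/collection/fish_dist_map/en' },
  { protocol: 'WWW:LINK-1.0-http--link',
    url: 'http://www.fao.org/figis/geoserver/factsheets/species.html' }
]
references, skipped = map_protocols(links)
puts references.to_json
puts "skipped #{skipped.length} link(s)" # the second WWW:LINK entry collides
```

Keeping the first link per protocol is an arbitrary choice here; a real converter would need a rule for picking among several generic links.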

chrissbarnett commented 9 years ago

It does seem to be an attempt at a controlled vocabulary in this case, but knowing how the UN operates, this may be FAO only. Given that the protocol is defined in a free-text CharacterString, there is probably no controlled vocabulary in general. In my experience, it seems to be something of a crapshoot.

Generally, I make a best-effort attempt to parse the URL itself to derive the protocol type. I hope we can do a little better within institutions we’re working together with, but for non-OGP/GeoBlacklight organizations you take what you can get! On the plus side, I think that ISO 19115-3 is better.
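A best-effort URL parse of this kind might look like the following Ruby sketch. It is not the code actually in use; the method name `guess_service` is hypothetical, and it only inspects a SERVICE query parameter, so links that carry no such parameter come back as nil.

```ruby
require 'uri'

# Best-effort guess at an OGC service type from a URL's query string.
# Returns an upper-cased value such as 'WMS' or 'WFS', or nil when the
# URL has no usable SERVICE parameter.
def guess_service(url)
  query = URI.parse(url).query
  return nil if query.nil?

  # Compare parameter names case-insensitively (SERVICE= vs service=).
  params = URI.decode_www_form(query).to_h { |key, value| [key.downcase, value] }
  service = params['service']
  service && !service.empty? ? service.upcase : nil
rescue URI::InvalidURIError, ArgumentError
  # Unparseable URL or malformed query string: give up quietly.
  nil
end

[
  'http://www.fao.org/figis/geoserver/species/ows?SERVICE=WMS',
  'http://www.fao.org/figis/geoserver/species/ows?service=WFS',
  'http://www.fao.org/fishery/collection/fish_dist_map/en'
].each do |url|
  puts "#{guess_service(url).inspect} <- #{url}"
end
```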

Once the metadata working group decides on a standard, I’ll repopulate our FGDC metadata with OGC protocol links.

Are there other pieces of the GeoBlacklight schema not encoded in FGDC and/or ISO 19139 in a straightforward way? Ideally “Institution” and access/use constraints should be encoded in a machine-readable way, I think.

drh-stanford commented 9 years ago

The GeoBlacklight schema is largely Dublin Core and thus inherits its semantics. There are a couple of exceptions where we further enforce our own controlled vocabulary and data structure (for dct_references), as DC is largely silent on the format of values.

Documentation and validation of the schema are an ongoing problem. See https://github.com/geoblacklight/geoblacklight-schema/pull/33 for our first attempt at a validator with built-in documentation. Otherwise, it's been a Code4Lib Journal paper and lots of examples. The dct_references field is by far the most complicated piece, but we have other problems with a lack of controlled vocabularies in dc_subject and dct_spatial (place names). Kim and I wrote a manuscript that describes our various challenges on the metadata front. I will email you a copy for review.

We have various heuristics to map ISO metadata into our GeoBlacklight controlled vocabulary for dc_rights (just Public or Restricted) and layer_geom_type (Polygon, Raster, etc.). UUIDs are another thing we rely on, but as I understand it FGDC lacks them.

kimdurante commented 9 years ago

The MWG has not addressed this issue of controlled protocols to the point where we have created any sort of formalized list. This is somewhat of a known issue, and the CatInterop group attempted to define these: https://github.com/OSGeo/Cat-Interop/blob/master/LinkPropertyLookupTable.csv, but I don't think anything was settled.

Another option I've seen for standardizing protocols is a list like this one. See "Protocol enumeration and description": https://github.com/geopython/pycsw/wiki/Geonode-notes

Others have suggested the use of URNs for these in order to make them more readable outside of CharacterStrings, something like:

urn:ogc:serviceType:WebMapService:1.1.1:HTTP

but I don't think these have been formalized. If you would like me to bring this up as an issue in our next MWG, I'm happy to do so.
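For illustration only, a URN of that shape would be straightforward to take apart in Ruby. The scheme has not been formalized, so the six-part layout is inferred from the single example above, and the field names (authority, service, version, binding) and the method name `parse_service_urn` are hypothetical.

```ruby
# Field names are hypothetical; the layout is inferred from the single
# example URN in this thread and is not a formal specification.
ServiceUrn = Struct.new(:authority, :service, :version, :binding)

# Returns a ServiceUrn, or nil when the string does not match the
# assumed urn:<authority>:serviceType:<service>:<version>:<binding> layout.
def parse_service_urn(urn)
  parts = urn.to_s.split(':')
  return nil unless parts.length == 6
  return nil unless parts[0] == 'urn' && parts[2] == 'serviceType'

  ServiceUrn.new(parts[1], parts[3], parts[4], parts[5])
end

urn = parse_service_urn('urn:ogc:serviceType:WebMapService:1.1.1:HTTP')
puts "#{urn.service} #{urn.version} over #{urn.binding}"
```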

drh-stanford commented 9 years ago

We have some additional documentation now on dct_references_s. See

http://geoblacklight.org/tutorial/2015/02/09/geoblacklight-overview.html

mejackreed commented 9 years ago

Just a heads-up: the new version of GeoCombine, v0.0.2, starts to handle some of this using XSL transforms that @kimdurante worked on. FGDC support is still a bit rough.

https://github.com/OpenGeoMetadata/GeoCombine#transforming-metadata

drh-stanford commented 9 years ago

We're going to need some kind of external.json metadata that includes the dct_references_s links which ISO/FGDC won't be able to encode cleanly, and other metadata like Public vs. Restricted. I'll work on coming up with a complete list of metadata needed by external.json (or some better name).
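As a rough sketch of how such a file could be applied, the Ruby below overlays a hypothetical external.json on a record already converted from ISO/FGDC. The method name `merge_external`, the layer_slug_s and dc_rights_s field names, the WMS reference URI, and the choice to store dct_references_s as a JSON-encoded string are all assumptions for illustration, not a settled format.

```ruby
require 'json'

# Overlay hand-maintained fields from a hypothetical external.json on a
# record converted from ISO/FGDC. External values win on conflict.
def merge_external(record, external_json)
  external = JSON.parse(external_json)
  merged = record.merge(external)
  refs = merged['dct_references_s']
  # Assumption: dct_references_s is stored as a JSON-encoded string.
  merged['dct_references_s'] = refs.to_json if refs.is_a?(Hash)
  merged
end

converted = { 'layer_slug_s' => 'example-layer', 'dc_rights_s' => 'Public' }
external_json = <<~JSON
  {
    "dc_rights_s": "Restricted",
    "dct_references_s": {
      "http://www.opengis.net/def/serviceType/ogc/wms": "http://example.org/geoserver/wms"
    }
  }
JSON

puts JSON.pretty_generate(merge_external(converted, external_json))
```

Letting the external file win on conflict is one possible policy; the opposite precedence would be equally easy to express.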

drh-stanford commented 9 years ago

See https://github.com/OpenGeoMetadata/metadatarepository/issues/13

thatbudakguy commented 1 year ago

I think it is safe to say we do this now :)