Remove solr specific suffixes

drh-stanford commented 8 years ago

Below is an example of a JSON-LD format for the GeoBlacklight schema that abstracts out the Solr-specific details and makes a couple of other changes. Namely, this example uses @id in lieu of layer_slug_s, uses dc:identifier for a set of alternate identifiers, and drops uuid ( #53 ). Note that dct:references becomes a proper JSON hash, and all derivative fields are dropped.

Ingesting the abstracted JSON-LD format into a Solr index would require a shim of harvesting code that derives the fields needed for the Solr implementation (such as solr_geom's ENVELOPE syntax from the georss:box field). This harvesting code could also provide a conversion utility between the current version of the JSON schema and the 1.0 abstracted JSON-LD version.
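As a rough illustration of such a shim, here is a minimal Python sketch that derives a few Solr-side fields from the abstracted record. It assumes georss:box is ordered "south west north east" and that Solr's ENVELOPE syntax takes (minX, maxX, maxY, minY). The function names box_to_envelope and derive_solr_fields are hypothetical, and serializing dct:references into a dct_references_s string is an assumption about the current Solr schema, not something this proposal specifies.

```python
import json


def box_to_envelope(box):
    """Convert a georss:box string ("south west north east") to ENVELOPE syntax."""
    parts = box.split()
    if len(parts) != 4:
        raise ValueError("georss:box needs four values, got %r" % box)
    for part in parts:
        float(part)  # raises ValueError if a coordinate is not numeric
    south, west, north, east = parts
    # ENVELOPE order is minX, maxX, maxY, minY, i.e. west, east, north, south.
    return "ENVELOPE(%s, %s, %s, %s)" % (west, east, north, south)


def derive_solr_fields(record):
    """Derive Solr-specific fields from an abstracted JSON-LD record (sketch only)."""
    return {
        "layer_slug_s": record["@id"],
        "solr_geom": box_to_envelope(record["georss:box"]),
        # Assumption: the Solr document stores the references hash as a JSON string.
        "dct_references_s": json.dumps(record["dct:references"]),
    }


record = {
    "@id": "stanford-fr148tw1471",
    "georss:box": "37.939061 -123.091039 38.098269 -122.892843",
    "dct:references": {
        "http://schema.org/url": "http://purl.stanford.edu/fr148tw1471",
    },
}

print(derive_solr_fields(record)["solr_geom"])
# ENVELOPE(-123.091039, -122.892843, 38.098269, 37.939061)
```

The coordinates are passed through as strings so the harvester does not silently change their precision; float() is used only to validate them.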

There are several other issues with individual fields, such as moving layer_id_s into dct:references ( #77 ), but the example below is meant to illustrate the JSON-LD file format and its implications as an interchange format.

The example shows that the JSON-LD'ness is pretty straightforward: namely, the use of @context for the prefixes and @id to identify the layer.

{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "georss": "http://georss.org#",
    "layer": "http://geoblacklight.org/schema/1.0#",
    "stanford": "http://library.stanford.edu#"
  },
  "@id": "stanford-fr148tw1471",
  "dc:identifier": [
    "http://purl.stanford.edu/fr148tw1471"
  ],
  "dc:title": "Geology: Offshore of Point Reyes, California, 2010",
  "dc:description": "This polygon shapefile represents geologic features within the offshore region of Point Reyes, California...",
  "dc:rights": "Public",
  "dct:provenance": "Stanford",
  "dct:references": {
    "http://schema.org/url": "http://purl.stanford.edu/fr148tw1471",
    "http://schema.org/downloadUrl": "http://stacks.stanford.edu/file/druid:fr148tw1471/data.zip",
    "http://www.loc.gov/mods/v3": "http://purl.stanford.edu/fr148tw1471.mods",
    "http://www.isotc211.org/schemas/2005/gmd/": "http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:fr148tw1471/iso19139.xml",
    "http://www.w3.org/1999/xhtml": "http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:fr148tw1471/default.html",
    "http://www.opengis.net/def/serviceType/ogc/wfs": "https://geowebservices.stanford.edu/geoserver/wfs",
    "http://www.opengis.net/def/serviceType/ogc/wms": "https://geowebservices.stanford.edu/geoserver/wms"
  },
  "layer:id": "druid:fr148tw1471",
  "layer:geom_type": "Polygon",
  "layer:modified_dt": "2016-02-05T22:07:10Z",
  "dc:format": "Shapefile",
  "dc:language": "English",
  "dc:type": "Dataset",
  "dc:publisher": "Geological Survey (U.S.)",
  "dc:creator": [
    "Michael W. Manson",
    "Janet T. Watt",
    "H. Gary Greene",
    "Moss Landing Marine Laboratories",
    "Pacific Coastal and Marine Science Center",
    "Golden, Nadine E."
  ],
  "dc:subject": [
    "Geology",
    "Geomorphology",
    "Sediments (Geology)",
    "Marine sediments",
    "Ocean bottom",
    "Geoscientific Information",
    "Oceans"
  ],
  "dct:issued": "2014",
  "dct:temporal": [
    "2006-2010"
  ],
  "dct:spatial": [
    "California",
    "Marin County (Calif.)",
    "Drakes Bay (Calif.)",
    "Pacific Ocean"
  ],
  "dc:relation": [
    "http://sws.geonames.org/3687919/",
    "http://sws.geonames.org/5370468/",
    "http://sws.geonames.org/8411083/"
  ],
  "georss:box": "37.939061 -123.091039 38.098269 -122.892843",
  "stanford:rights_metadata": "<?xml version=\"1.0\"?>\n<rightsMetadata>\n  <access type=\"discover\">\n    <machine>\n      <world/>\n    </machine>\n  </access>\n  <access type=\"read\">\n    <machine>\n      <world/>\n    </machine>\n  </access>\n  <use>\n    <human type=\"useAndReproduction\">This item is in the public domain.  There are no restrictions on use.</human>\n    <human type=\"creativeCommons\"/>\n    <machine type=\"creativeCommons\"/>\n  </use>\n  <copyright>\n    <human>This work is in the Public Domain, meaning that it is not subject to copyright.</human>\n  </copyright>\n</rightsMetadata>\n"
}
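To make the role of @context concrete, here is a small Python sketch of how a prefixed key such as dc:title maps to a full IRI. It is only an illustration: expand_term is a hypothetical helper, and a real JSON-LD processor handles many more cases (term definitions, @vocab, nested contexts) than this prefix lookup does.

```python
def expand_term(term, context):
    """Expand a compact IRI like "dc:title" using the prefixes in a @context."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    # Keywords such as "@id" and keys with unknown prefixes pass through unchanged.
    return term


# A subset of the @context from the example record.
context = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "layer": "http://geoblacklight.org/schema/1.0#",
}

print(expand_term("dc:title", context))
# http://purl.org/dc/elements/1.1/title
print(expand_term("layer:geom_type", context))
# http://geoblacklight.org/schema/1.0#geom_type
print(expand_term("@id", context))
# @id
```

Keys that are already full URLs, like the ones inside dct:references, are left as-is because "http" is not a declared prefix.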

eliotjordan commented 8 years ago

I like seeing this as JSON-LD. Thanks for getting this up @drh-stanford!

mejackreed commented 8 years ago

Yes thanks, looks good! One quick concern I have is increasing the complexity of indexing documents from their native format. Maybe we can use something from here: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON ?

It does seem like this might not fully meet our needs, though. The XML approach seems more amenable, since you can provide custom XSLTs to transform your data. Sigh.

mejackreed commented 8 years ago

Also maybe the Data Import Handlers (DIH) are an option?

drh-stanford commented 8 years ago

The layer:id should probably move into dct:references, since it's not really an "identifier" so much as a parameter to the WMS/WFS protocol.

mejackreed commented 8 years ago

Not to throw a wrench in things, but we should possibly talk about DCAT as an alternative too! https://project-open-data.cio.gov/v1.1/schema/

geoblacklight / geoblacklight-schema

Remove solr specific suffixes #80