geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

Add schema.org JSON-LD into HTML for SEO #3820

Closed: nmtoken closed this issue 5 years ago.

nmtoken commented 5 years ago

Is your feature request related to a problem? Please describe.

To aid dataset (and service) discovery, GeoNetwork HTML pages should include a JSON-LD summary.

Describe the solution you'd like

For example, the page source of the CKAN dataset page http://data.jrc.ec.europa.eu/dataset/fe1878e8-7541-4c66-8453-afdae7469221 includes:

<!-- Snippet package/snippets/schemaorg_indexing.html start -->
<script type="application/ld+json">

{
  "@context":"http://schema.org",
  "@type":"Dataset",
  "@id":"http://data.europa.eu/89h/fe1878e8-7541-4c66-8453-afdae7469221",
  "publishingPrinciples":"https://doi.org/10.2788/607378",
  "name":"Rivers and Catchments of Europe - Catchment Characterisation Model (CCM)",
  "description":"The Catchment Characterisation Model (CCM2) database covers the entire European continent, including the Atlantic islands, Iceland and Turkey. It includes a hierarchical set of river segments and catchments based on the Strahler order, a lake layer and structured hydrological feature codes based on the Pfafstetter system. It allows for analysis from the regional to the continental scale, corresponding to traditional mapping scales of up to 1:500,000. CCM2 covers an area of about 12,000,000 square kilometers and includes more than 2,000,000 primary catchments. These can be aggregated to drainage basins at different hierarchical levels, forming, for example, about 650 river basins of more than 1000 square kilometers. CCM2 further includes a coastline, fully congruent with the river basins, and some 70,000 lakes.  The layers are generated from a 100 meters resolution digital terrestrial elevation model. The following layers are available: Seaoutlets: the major river basins, Main drains: the major rivers, Lakes: all surface water larger than 25x25 metres, Coastlines: coast line extracted from Image2000 imagery, River segments: Drainage channels from the primary catchments, Catchments: Primary catchments. This data-set refers to Lakes and River basins.",
  "publisher":{"@type":"Organization","name":"European Commission, Joint Research Centre (JRC)","url":"https://ec.europa.eu/jrc/"},
  "datePublished" : "2007-06-01",
  "url":"http://data.europa.eu/89h/fe1878e8-7541-4c66-8453-afdae7469221",
  "sameAs":"http://data.europa.eu/89h/fe1878e8-7541-4c66-8453-afdae7469221",
  "keywords":"Feature coding; Hydrography; River network; Seaoutlet; catchments; drainage basins; river basin development; river management; 36 SCIENCE; hydrology; Environment; Science and technology",
  "spatialCoverage":[{"@type":"Place","geo":{"@type":"GeoShape","box":"27.0 -32.0 72.0 61.0"}}],
  "creator":[{
      "@type":"Person",
      "givenName":"Alfred",
      "familyName":"De Jager",
      "name":"De Jager, Alfred"
    },{
      "@type":"Person",
      "givenName":"Jürgen",
      "familyName":"Vogt",
      "name":"Vogt, Jürgen"
    }],
  "distribution":[{
      "@type":"DataDownload",
      "name":"CCM view service",
      "description":"Web Map Service (WMS) - GetCapabilities",
      "fileFormat":"wms",
      "contentUrl":"http://edo.jrc.ec.europa.eu/mapserv/mapserv?map=/var/www/ccm/site/water/viewer/mapfiles/ccm.map&amp;version=1.3.0&amp;service=WMS&amp;request=GetCapabilities"
    },{
      "@type":"DataDownload",
      "name":"CCM landing page",
      "description":"General description of Catchment Characterisation and Modelling (CCM) activity and access to the datasets download page. Data access requires registration.",
      "fileFormat":"Esri File Geodatabase",
      "contentUrl":"http://ccm.jrc.ec.europa.eu/"
    }],
  "citation":[{
      "@type":"Article",
      "name":"A pan-European river and catchment database",
      "identifier":"10.2788/35907",
      "@id":"https://doi.org/10.2788/35907",
      "url":"https://doi.org/10.2788/35907"
    }],  
  "includedInDataCatalog":{"@type":"DataCatalog","name":"JRC Data Catalogue","url":"https://data.jrc.ec.europa.eu/"}
}
</script>
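The `spatialCoverage` box above reads as the lower (south-west) corner followed by the upper (north-east) corner, each written as "latitude longitude", which is how schema.org describes `GeoShape.box`. Metadata records usually store a bounding box as separate west/east/south/north values, so any mapping has to reorder them. A minimal Python sketch; the function names are hypothetical and not part of GeoNetwork or CKAN:

```python
def geoshape_box(west: float, south: float, east: float, north: float) -> str:
    """Format a bounding box as a schema.org GeoShape "box" string.

    schema.org expects two points, lower corner then upper corner,
    each written as "latitude longitude".
    """
    if south > north:
        raise ValueError("south must not exceed north")
    # West > east is allowed: it can mean a box crossing the antimeridian.
    return f"{south} {west} {north} {east}"


def spatial_coverage(west: float, south: float, east: float, north: float) -> list:
    """Wrap the box in the Place/GeoShape structure used by the CKAN example."""
    return [{
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": geoshape_box(west, south, east, north)},
    }]


# Reproduces the CCM2 box from the snippet above.
print(geoshape_box(-32.0, 27.0, 61.0, 72.0))  # → 27.0 -32.0 72.0 61.0
```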

Another example is data.gov.uk (also CKAN?):

https://data.gov.uk/dataset/220edaa9-50d8-4b44-9c8f-ad7ad7125f98/bgs-geology-50k-digmapgb-50-mass-movement-version-8

That page's source contains:

<script type="application/ld+json">
 {"@context":"http://schema.org","@type":"Dataset","name":"BGS Geology - 50k (DiGMapGB-50) Mass Movement version 8","url":"https://data.gov.uk/dataset/220edaa9-50d8-4b44-9c8f-ad7ad7125f98/bgs-geology-50k-digmapgb-50-mass-movement-version-8","includedInDataCatalog":{"@type":"DataCatalog","url":"https://data.gov.uk"},"creator":{"@type":"Organization","name":"British Geological Survey"},"description":"Data identifying landscape areas (shown as polygons) attributed with type of mass movement e.g. landslip. The scale of the data is 1:50 000 scale. Onshore coverage is provided for all of England, Wales, Scotland and the Isle of Man. Mass movement describes areas where deposits have moved down slope under gravity to form landslips. These landslips can affect bedrock, superficial or artificial ground. Mass movement deposits are described in the BGS Rock Classification Scheme Volume 4. However the data also includes foundered strata, where ground has collapsed due to subsidence (this is not described in the Rock Classification Scheme). Caution should be exercised with this data; historically BGS has not always recorded mass movement events and due to the dynamic nature of occurrence significant changes may have occurred since the data was released. The data are available in vector format (containing the geometry of each feature linked to a database record describing their attributes) as ESRI shapefiles and are available under BGS data licence.","license":{"@type":"CreativeWork","name":null,"text":"The dataset is made available to external clients under BGS Digital Data Licence terms and conditions. Revert to the IPR Section (DigitalLE@bgs.ac.uk) if further advice is required with regard to permitted usage.","url":null},"dateModified":"2017-01-18T11:02:22.087Z","keywords":"Environment","distribution":[{"@type":"DataDownload","contentUrl":"http://www.bgs.ac.uk/products/digitalmaps/DiGMapGB_50.html","fileFormat":"","name":"information"}]}
</script>
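Both examples boil down to serialising a schema.org `Dataset` object and placing it in a `<script type="application/ld+json">` element. A minimal Python sketch of that step, assuming a hypothetical flattened record with `uuid`, `title`, `abstract`, `keywords` and `landing_page` keys (this is not GeoNetwork's mapping, only an illustration of the output shape):

```python
import json


def jsonld_script(record: dict) -> str:
    """Render a minimal schema.org Dataset as an embeddable <script> element.

    `record` is a hypothetical flattened metadata record with keys
    "uuid", "title", "abstract", "landing_page" and optionally "keywords".
    """
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "@id": record["landing_page"],
        "name": record["title"],
        "description": record["abstract"],
        "url": record["landing_page"],
        "identifier": record["uuid"],
        "keywords": record.get("keywords", []),
    }
    # json.dumps escapes quotes and backslashes; additionally escape "</" so a
    # value containing "</script>" cannot terminate the element early.
    body = json.dumps(doc, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(jsonld_script({
    "uuid": "abc",
    "title": "Rivers and catchments",
    "abstract": "Example abstract.",
    "keywords": ["Hydrography", "River network"],
    "landing_page": "https://example.org/geonetwork/srv/api/records/abc",
}))
```

Writing `<\/` keeps the JSON valid (JSON permits the `\/` escape). Note also that HTML entities are not decoded inside `<script>`, so URLs should be written raw; the `&amp;` in the CKAN `contentUrl` above ends up literally in the parsed value.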

Describe alternatives you've considered

Creating separate web pages outside of GeoNetwork.

Additional context

Such a feature would aid discovery of spatial data through mainstream search engines, and is in line with the W3C Spatial Data on the Web Best Practices.

Also relevant: the Workshop on making spatial data discoverable through mainstream search engines, 3-4 July 2019.

fxprunayre commented 5 years ago

It is mainly done in https://github.com/geonetwork/core-geonetwork/pull/3714 but needs a review. If you have time, comments are welcome.

heryk commented 5 years ago

FYI, Google's documentation on JSON-LD for datasets is useful: https://developers.google.com/search/docs/data-types/dataset
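Google's dataset guide works as a practical checklist for the mapping. At the time of writing it lists `name` and `description` as the required `Dataset` properties and asks for a description of 50 to 5000 characters; treat those thresholds as assumptions to re-check against the current guide. A minimal Python sketch of such a pre-flight check (the function name is hypothetical):

```python
def google_dataset_problems(doc: dict) -> list[str]:
    """Return human-readable problems with a schema.org Dataset dict.

    The required properties and the 50-5000 character description range
    are assumptions taken from Google's dataset guide; re-check them there.
    """
    problems = []
    if doc.get("@type") != "Dataset":
        problems.append('"@type" should be "Dataset"')
    if not doc.get("name"):
        problems.append('missing required "name"')
    description = doc.get("description") or ""
    if not 50 <= len(description) <= 5000:
        problems.append(
            f'"description" is {len(description)} characters; expected 50-5000'
        )
    return problems


print(google_dataset_problems({"@type": "Dataset", "description": "too short"}))
```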

pvgenuchten commented 5 years ago

Up till now we have adopted the schema.org microdata approach, but embedded JSON-LD is certainly also an option. See also https://www.geocat.net/nl/how-to-improve-the-discoverability-of-geonetwork-records-by-search-engines

fxprunayre commented 5 years ago

It is now available as a JSON-LD formatter and embedded in the advanced view. Please test and report any needed improvements to the JSON-LD mapping. Closing this.

nmtoken commented 4 years ago

I saw in the 3.8.0 changelog (https://geonetwork-opensource.org/manuals/trunk/en/overview/change-log/version-3.8.0.html) that this was implemented, but in 3.10.3.0 I can't see any schema.org/JSON-LD content. Is that a bug or a configuration issue, or was the feature removed for some reason?

fxprunayre commented 4 years ago

> 3.10.3.0 I can't see any schema.org/json-ld content

Where? If you access a record landing page like https://vanilla.geocat.net/geonetwork/srv/api/records/da165110-88fd-11da-a88f-000d939bc5d8, the JSON-LD description should be embedded in the page.
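To confirm what a page actually embeds, it is enough to collect every `<script type="application/ld+json">` element from its HTML. A minimal sketch using only Python's standard `html.parser`; the class and function names are hypothetical:

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; buffer it all.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


sample = ('<html><head><script type="application/ld+json">'
          '{"@type": "Dataset", "name": "x"}</script></head></html>')
print(extract_jsonld(sample))
```

To check a live catalogue, feed it the decoded body of, for example, `urllib.request.urlopen(url).read().decode("utf-8")` for a landing page like the one above (assuming it is still online).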

nmtoken commented 4 years ago

OK, that works for me (not having the sitemap to find the link to the full HTML page was an issue, as per https://github.com/geonetwork/core-geonetwork/issues/4903).

I was looking in the full view of catalog.search, e.g. http://localhost:8080/cat/srv/eng/catalog.search#/metadata/1503649d-6fdc-163a-e054-002128a47908/formatters/xsl-view?root=div&view=advanced, not at the api/records URI http://localhost:8080/cat/srv/api/records/1503649d-6fdc-163a-e054-002128a47908.
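The difference is plausibly expected: everything after `#` in the catalog.search URL is a fragment, which browsers do not send to the server, so the server-rendered HTML for that URL cannot contain a particular record's JSON-LD, and the record view there is rendered client-side by the JavaScript UI. The api/records URL names the record in its path instead. A small sketch mapping one to the other; the helper is hypothetical and assumes only the two URL layouts shown in this thread:

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def landing_page_url(search_url: str) -> Optional[str]:
    """Map a catalog.search record URL to its api/records landing page.

    Assumes the layout seen in this thread:
      <base>/srv/<lang>/catalog.search#/metadata/<uuid>/...
      -> <base>/srv/api/records/<uuid>
    Returns None if the URL does not match that layout.
    """
    parts = urlsplit(search_url)
    # Only scheme, host and path (plus any query before '#') reach the server;
    # the record UUID lives in the fragment.
    path_match = re.match(r"^(.*/srv)/[^/]+/catalog\.search$", parts.path)
    fragment_match = re.match(r"^/metadata/([^/?]+)", parts.fragment)
    if not (path_match and fragment_match):
        return None
    new_path = f"{path_match.group(1)}/api/records/{fragment_match.group(1)}"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(landing_page_url(
    "http://localhost:8080/cat/srv/eng/catalog.search"
    "#/metadata/1503649d-6fdc-163a-e054-002128a47908/formatters/xsl-view"
    "?root=div&view=advanced"
))
```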