Open pvgenuchten opened 9 years ago
If we are thinking of crawlers, we will probably need some kind of list (RSS?) for schema.org-formatted data, right?
And link that list from the robots.txt.
@delawen, you mean a starting point for the crawl (bootstrap)? The GeoNetwork sitemap will do; make sure it supports pagination, though.
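To make the sitemap discoverable as a crawl bootstrap, it can be referenced from robots.txt. A minimal sketch, assuming a hypothetical GeoNetwork instance at example.org (the sitemap path is an assumption, not the actual GeoNetwork endpoint):

```
# Hypothetical robots.txt for a GeoNetwork instance.
# The Sitemap directive points crawlers at the (paginated) sitemap,
# which in turn lists the individual metadata record pages.
User-agent: *
Allow: /

Sitemap: https://example.org/geonetwork/sitemap.xml
```

Crawlers that honour the sitemaps protocol will fetch the sitemap from this directive without any further configuration.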
Please keep this initiative in mind: http://www.w3.org/wiki/WebSchemas/Datasets and http://blog.schema.org/2012/07/describing-datasets-with-schemaorg.html, to see whether we can use or contribute to some of their work. The discussion has moved to GitHub, for example https://github.com/schemaorg/schemaorg/issues/713, https://github.com/schemaorg/schemaorg/issues/688, https://github.com/schemaorg/schemaorg/issues/583, https://github.com/schemaorg/schemaorg/issues/113
Links of interest:
http://schema.org/docs/full.html
http://schema.org/DataCatalog: not sure if this is useful to describe the GeoNetwork instance, or just a very big dataset.
http://schema.org/Dataset
http://schema.org/DataDownload
http://schema.org/Map
Search engines use the http://schema.org vocabulary to analyse content that they crawl. To make search engines understand iso19139, a mapping to schema.org should be made available. For those iso19139 aspects not currently available in schema.org, we can suggest an extension of schema.org.
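To illustrate what such a mapping could produce, here is a sketch of a schema.org Dataset expressed as JSON-LD. All field values are made up for illustration; the actual properties chosen would depend on the iso19139-to-schema.org mapping agreed on (roughly: gmd:title to name, gmd:abstract to description, the transfer options to distribution):

```
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "name": "Example land cover dataset",
  "description": "Hypothetical record, as it might be derived from an iso19139 abstract.",
  "url": "https://example.org/geonetwork/metadata/abc-123",
  "distribution": {
    "@type": "DataDownload",
    "contentUrl": "https://example.org/downloads/landcover.zip",
    "encodingFormat": "application/zip"
  }
}
```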
This mapping can be implemented in GeoNetwork in 2 ways:
Option 1 may be the best option over time; however, it may have too much impact within the scope of the current testbed.
There are a number of ways to expose schema.org markup so it can be ingested by search engines.
The advantage of options 1 and 2 is that users browsing the web will see attractive content once they click a search result in a search engine. The advantage of option 3 is that web developers can use the API to develop applications too.
Content negotiation will guide web browsers to HTML representations and machines to JSON representations of the documents.
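A minimal sketch of that negotiation step, assuming the server inspects the Accept header of each request and serves either an HTML landing page or a JSON-LD document. The parsing is deliberately simplified (quality values are ignored), and the function name is just for illustration:

```python
def negotiate(accept_header: str) -> str:
    """Pick a representation ("html" or "json") from an Accept header.

    Browsers typically list text/html first, so they get the HTML page;
    crawlers and API clients asking for application/ld+json or
    application/json get the JSON-LD representation.
    """
    # Split the header on commas and drop parameters such as ";q=0.8".
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in accepted:
        if media_type in ("text/html", "application/xhtml+xml"):
            return "html"
        if media_type in ("application/ld+json", "application/json"):
            return "json"
    # Default to HTML so unknown clients still get a usable page.
    return "html"
```

For example, a typical browser header such as "text/html,application/xhtml+xml,*/*;q=0.8" resolves to "html", while "application/ld+json" resolves to "json".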