cioos-siooc / ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.
http://ckan.org/

look into options for harvesting metadata from GLOS #162

Open fostermh opened 2 years ago

fostermh commented 2 years ago

GLOS's new Seagull application is built on Esri products and thus uses Geoportal as its backend metadata server. Metadata records can be requested using the standard Geoportal REST API. For example:

https://seagull-geoportal.glos.org/geoportal/rest/metadata/search?start=10&num=1&searchText=sys.schema.key:iso19115-2

Note that in this example I am requesting the metadata in iso19115-2 format, and the record XML is under the hits/hits[]/_source/sys_xml_clob JSON path. The rest of the returned JSON is the same as if we had added 'f=pjson' to the query string. Requesting metadata in XML format ('f=xml') returns RSS feed XML, which is not what we are looking for.
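A rough sketch of how the search URL above could be built and the ISO XML pulled out of the returned JSON. The function names are hypothetical, and the response shape is assumed only from the hits/hits[]/_source/sys_xml_clob path described above; nothing here has been run against the live server.

```python
from urllib.parse import urlencode

# Search endpoint taken from the example URL in this issue.
SEARCH_URL = "https://seagull-geoportal.glos.org/geoportal/rest/metadata/search"


def build_search_url(start, num, schema_key="iso19115-2"):
    """Build a paged search URL filtered by schema key (hypothetical helper)."""
    params = {
        "start": start,
        "num": num,
        "searchText": f"sys.schema.key:{schema_key}",
    }
    # Keep ':' unescaped so the URL looks like the one in the issue.
    return f"{SEARCH_URL}?{urlencode(params, safe=':')}"


def extract_iso_xml(payload):
    """Return the sys_xml_clob string of every hit that has one.

    Assumes the JSON layout hits -> hits[] -> _source -> sys_xml_clob.
    """
    hits = payload.get("hits", {}).get("hits", [])
    return [
        hit["_source"]["sys_xml_clob"]
        for hit in hits
        if "sys_xml_clob" in hit.get("_source", {})
    ]
```

The payload passed to `extract_iso_xml` would be the decoded JSON body of the search response; hits without an XML clob are skipped rather than raising.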

I have tried every incarnation of date range search I can find, but none of them work on this instance of Geoportal. It seems that square brackets are not allowed, and none of the DATE casts seem to work.

fostermh commented 1 year ago

The GLOS CSW service is available here: https://seagull-geoportal.glos.org/geoportal/csw?request=GetCapabilities&service=CSW&version=3.0.0

fostermh commented 1 year ago

This appears to return the XML: https://seagull-geoportal.glos.org/geoportal/rest/metadata/item/e05de7ae36ee459395ba2d9b9e39e3dd/xml So if we can get a list of dataset IDs, we could pull the XML in a standard way using a WAF harvester.
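A small sketch of turning a list of dataset IDs into per-record XML URLs of the form shown above. The helper names are hypothetical, and reading each ID from an `_id` field on the search hits is an assumption about the response layout that has not been verified against this server.

```python
# URL pattern taken from the item example in this issue.
ITEM_URL = "https://seagull-geoportal.glos.org/geoportal/rest/metadata/item/{item_id}/xml"


def item_xml_url(item_id):
    """Return the XML URL for one dataset ID (hypothetical helper)."""
    return ITEM_URL.format(item_id=item_id)


def item_xml_urls(payload):
    """Build XML URLs for every hit in a search response.

    Assumption: each hit carries its identifier under "_id".
    """
    hits = payload.get("hits", {}).get("hits", [])
    return [item_xml_url(hit["_id"]) for hit in hits if "_id" in hit]
```

The resulting list of URLs is the kind of input a WAF-style harvester could iterate over, one record per URL.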

fostermh commented 1 year ago

Harvesting using CSW seems to be broken; I get 'Error gathering the identifiers from the CSW server [Document is XML.

fostermh commented 1 year ago

It looks like the following query will give us the records modified since the given date: https://seagull-geoportal.glos.org/geoportal/opensearch?q=&modified=2023-01-20/*&f=json
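For incremental harvesting, the query above could be generated from the date of the last successful run. This is only a sketch: the function name is hypothetical, and the open-ended `date/*` range syntax is copied from the example URL rather than confirmed from the API docs.

```python
from datetime import date
from urllib.parse import urlencode

# OpenSearch endpoint taken from the example URL in this issue.
OPENSEARCH_URL = "https://seagull-geoportal.glos.org/geoportal/opensearch"


def modified_since_url(since):
    """Build a query for records modified on or after `since` (a datetime.date)."""
    params = {
        "q": "",
        # Open-ended range: from the given date to "any", as in the example.
        "modified": f"{since.isoformat()}/*",
        "f": "json",
    }
    # Leave '/' and '*' unescaped so the range matches the example form.
    return f"{OPENSEARCH_URL}?{urlencode(params, safe='/*')}"
```

Calling `modified_since_url(date(2023, 1, 20))` reproduces the URL quoted above, so a harvester could store the last run date and pass it in on the next run.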

fostermh commented 1 year ago

Useful API docs: https://github.com/Esri/geoportal-server-catalog/blob/master/geoportal/doc/api.txt