datawagovau / pandora

Opening a CKAN of government data

Implement SLIP harvesting #10

Closed florianm closed 8 years ago

florianm commented 9 years ago

Implement WMS harvesting of SLIP endpoints following Keith's example

keithmoss commented 9 years ago
Dumb reverse proxy to Classic to work around the "pycsw doesn't do secured services" thing

http://gis.stackexchange.com/questions/103191/getting-error-after-trying-to-harvest-from-geoserver-into-pycsw

python bin/pycsw-admin.py -c post_xml -u http://localhost:8000/pycsw/csw.py -x /usr/lib/ckan/default/src/pycsw/post-slip-classic-rp.xml

post-slip-classic-rp.xml https://gist.github.com/keithmoss/888dcfef3a53ffead18e

florianm commented 9 years ago

Fixed the "clear harvest source" bug and submitted the fix as https://github.com/ckan/ckanext-harvest/pull/152

keithmoss commented 9 years ago

Would need to dedupe SLIP Classic WMS/WFS because they're separate services.

See https://www2.landgate.wa.gov.au/web/guest/57;jsessionid=39A845177E1155A4CAA09A7D0A33E450 and https://www2.landgate.wa.gov.au/web/guest/subscription1.

A couple of days' work for Florian to create a little custom harvester if there's no nice way of doing it via PyCSW.

florianm commented 9 years ago

Asked datacats to provide their own Docker image for pycsw, for simplicity: https://github.com/datacats/datacats/issues/301

keithmoss commented 9 years ago

This was the code I commented out to hack around the validation issues (just to get something working)

https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L474-L482

keithmoss commented 9 years ago

https://gist.github.com/keithmoss/745c286a8ee6c72da601

florianm commented 9 years ago

The simple fix for a preview, of course, is to show SLIP as one single dataset with a working preview. This requires Keith's magic reverse proxy, which authenticates with an existing account to access the public SLIP WMS.

e.g. on the home page (layout 1): http://catalogue-beta.data.wa.gov.au/ as "featured resource"

or as resource http://catalogue-beta.data.wa.gov.au/dataset/slip-classic/resource/fa1652e0-9782-40d1-b74e-342c367cc8a7

florianm commented 9 years ago

SLIP Classic harvesting works (Python script). Some fields need fine tuning. Will add WMS url as resource.

florianm commented 9 years ago

SLIP Classic harvesting now adds WMS URLs as resources. @keithm should this be the proxied URL, or this one: https://www2.landgate.wa.gov.au/ows/wmspublic ?

florianm commented 9 years ago

SLIP Future, ArcGIS REST endpoint harvester: https://github.com/GSA/ckanext-geodatagov

keithmoss commented 9 years ago

Questions

Harvest multiple endpoints

How much work would be required to harvest from multiple SLIP endpoints and dedupe so datasets present with multiple resources?

e.g. LGATE-001 - Cadastre (No Attributes) will be present in wmspublic, wmsCsCadastre, and wfsCsCadastre
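A minimal sketch of the deduplication asked about above, assuming each harvested layer can be reduced to an (endpoint, layer ID, title) record and that the layer ID (e.g. LGATE-001) is the same across endpoints. The function name and record shape are hypothetical, not taken from the actual harvester script.

```python
def group_by_layer_id(harvested):
    """Merge (endpoint, layer_id, title) records into one dataset per layer_id."""
    datasets = {}
    for endpoint, layer_id, title in harvested:
        # Create the dataset the first time a layer ID is seen.
        dataset = datasets.setdefault(
            layer_id, {"id": layer_id, "title": title, "resources": []}
        )
        # Each endpoint becomes one resource; skip exact repeats.
        if endpoint not in dataset["resources"]:
            dataset["resources"].append(endpoint)
    return datasets


# Hypothetical records for the example layer named above.
records = [
    ("wmspublic", "LGATE-001", "Cadastre (No Attributes)"),
    ("wmsCsCadastre", "LGATE-001", "Cadastre (No Attributes)"),
    ("wfsCsCadastre", "LGATE-001", "Cadastre (No Attributes)"),
]
datasets = group_by_layer_id(records)
```

With these records, `datasets` holds a single LGATE-001 entry carrying three resources, one per endpoint.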

Handling of secured WMS/WFS services

Consider how we handle the user experience of accessing a WMS link and getting prompted for auth.

Consider how WMS/WFS preview are done given auth requirement. Is preview desirable or required?

Decision

Permit manual editing of metadata

Decision

We'll run it once and never again. After that we can edit to our heart's content.

Required Functionality

Change owning organisation to SLIP

Otherwise it looks like Landgate is the custodian.

Change default harvested dataset description

Keith to provide words that briefly explain SLIP and cover access, having to sign up, et cetera.

Documentation

Desirable Functionality

Prettier dataset names, dynamically assign some metadata values

Remove extraneous information from SLIP harvested layer names.

e.g.

Public Transport Authority Services (Pta-007) (24-08-2015 12:50:09) should be Public Transport Authority Services
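A minimal sketch of that clean-up, assuming the extraneous parts always look like the example: a parenthesised layer ID (letters, hyphen, digits) and a parenthesised dd-mm-yyyy hh:mm:ss timestamp. Other parenthesised text, such as "(No Attributes)", is left alone. The function name is hypothetical.

```python
import re

# Matches "(Pta-007)"-style IDs and "(24-08-2015 12:50:09)"-style timestamps,
# together with any whitespace in front of them.
_EXTRANEOUS = re.compile(
    r"\s*\((?:[A-Za-z]+-\d+|\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\)"
)


def clean_layer_name(raw_title):
    """Remove the layer ID and timestamp groups from a harvested layer title."""
    return _EXTRANEOUS.sub("", raw_title).strip()


cleaned = clean_layer_name(
    "Public Transport Authority Services (Pta-007) (24-08-2015 12:50:09)"
)
```

Layers with "exceptional names" that do not follow this pattern would pass through unchanged and need handling by hand.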

Additionally:

Dynamically assign custodian metadata

Keith to supply agency acronym to full name mapping.

May need to be smart and default back to the acronym if not all mappings are available.
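The fallback could be sketched as below, assuming the agency acronym is the prefix of the layer ID (e.g. "Pta" in Pta-007). The two mapping entries are illustrative placeholders only; the real mapping is the one Keith is to supply.

```python
# Illustrative entries only; the full acronym-to-name mapping is still to come.
AGENCY_NAMES = {
    "PTA": "Public Transport Authority",
    "LGATE": "Landgate",
}


def custodian_for(layer_id, mapping=AGENCY_NAMES):
    """Return the full agency name for a layer ID, or the acronym if unmapped."""
    acronym = layer_id.split("-", 1)[0].strip().upper()
    # dict.get falls back to the acronym itself when no mapping exists.
    return mapping.get(acronym, acronym)
```

For example, `custodian_for("Pta-007")` gives the full name, while an unmapped ID such as `"XYZ-001"` falls back to `"XYZ"`.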

Dynamically attach data dictionaries.

http://slip.landgate.wa.gov.au/Pages/Data-Dictionary.aspx

If not: Do by hand?

keithmoss commented 9 years ago

List of data dictionaries

data-dictionaries-csv.txt

keithmoss commented 9 years ago

agency-name-mappings.xlsx

keithmoss commented 9 years ago

Description template:

This dataset has been sourced from Landgate's Shared Location Information Platform (SLIP) - the home for Western Australian government geospatial data. Many of the datasets in SLIP are free and publicly available to users who simply [sign up for a SLIP account](https://www2.landgate.wa.gov.au/web/guest/request-registration-type).

Find out more about SLIP at [http://slip.landgate.wa.gov.au/](http://slip.landgate.wa.gov.au/).

{UNIQUE ID HERE}
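One way the harvester might fill in the placeholder, assuming the "unique ID" is the layer ID (e.g. LGATE-001). The template string here is a shortened copy for illustration, and the function name is hypothetical.

```python
# Shortened copy of the description template; the placeholder is kept verbatim.
DESCRIPTION_TEMPLATE = (
    "This dataset has been sourced from Landgate's Shared Location "
    "Information Platform (SLIP).\n\n"
    "{UNIQUE ID HERE}"
)


def render_description(layer_id, template=DESCRIPTION_TEMPLATE):
    """Substitute the layer's ID for the template's placeholder."""
    # str.replace avoids having to escape any other braces in the text.
    return template.replace("{UNIQUE ID HERE}", layer_id)


description = render_description("LGATE-001")
```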

Suggest no landing page in the case of SLIP, as we already link to it.

florianm commented 9 years ago

WMS harvester script

TODO

https://www2.landgate.wa.gov.au/ows/wmspublic

https://www2.landgate.wa.gov.au/ows/wfspublic_4326

https://www2.landgate.wa.gov.au/ows/wfsCsAdmin_4283/wfs

https://www2.landgate.wa.gov.au/ows/wmsCsCadastre

https://www2.landgate.wa.gov.au/ows/wfsCsCadastre_4283

https://www2.landgate.wa.gov.au/ows/wmsCsMosaic

https://www2.landgate.wa.gov.au/ows/wfsCsTopo_4283
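A rough sketch of the first step for each WMS endpoint above: build a GetCapabilities URL and list the named layers from the response. It uses only the standard library and parses an invented WMS 1.1.1-style sample rather than calling the live (secured) services. The function names and sample document are assumptions, not the actual harvester script. A WMS 1.3.0 response is namespaced and would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def capabilities_url(endpoint):
    """Append the standard WMS GetCapabilities query to an endpoint URL."""
    return endpoint + "?" + urlencode({"service": "WMS", "request": "GetCapabilities"})


def named_layers(capabilities_xml):
    """Return (name, title) for every Layer element that has a Name child."""
    root = ET.fromstring(capabilities_xml)
    layers = []
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")  # direct child only; group layers lack one
        if name:
            layers.append((name, layer.findtext("Title", default="")))
    return layers


# Invented sample in the un-namespaced WMS 1.1.1 layout.
SAMPLE_CAPABILITIES = """<WMT_MS_Capabilities version="1.1.1">
  <Capability>
    <Layer>
      <Title>Example service</Title>
      <Layer>
        <Name>LGATE-001</Name>
        <Title>Cadastre (No Attributes)</Title>
      </Layer>
      <Layer>
        <Name>PTA-007</Name>
        <Title>Public Transport Authority Services</Title>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""

layers = named_layers(SAMPLE_CAPABILITIES)
url = capabilities_url("https://www2.landgate.wa.gov.au/ows/wmspublic")
```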

Current sandbox: http://waitwaitboom.alpha.data.wa.gov.au/

Current "clean demo": http://catalogue.alpha.data.wa.gov.au/

resource title: OGC Web Map Service

Learn how to access this resource URL in a GIS (e.g. QGIS or ArcGIS) with your SLIP credentials.

florianm commented 8 years ago

The WMS/WFS public and cadastre endpoints are harvested into alpha. The nmap preview is on by default; it zooms to the WMS layers and loads them automatically.

Harvesting to do: virtual mosaic (one layer), wfsCsAdmin (no corresponding WMS), wfsCsTopo (URL broken).

Documenting to do: once harvesting is finished, clean up the IPython notebook and publish it to alpha and GitHub.

florianm commented 8 years ago

SLIP classic is harvested and resources are deduped. Harvesting script is now ready to accept SLIP Future layers.

keithamoss commented 8 years ago

👍

florianm commented 8 years ago

SLIP classic harvesting is at 90% - there are some layers with exceptional names that are not picked up by the harvesting script. As discussed, let's leave SLIP classic at that stage until further funding is secured for the last 10%.

Suggesting to close, feel free to re-open.