ckan / ckanext-spatial

Geospatial extension for CKAN
http://docs.ckan.org/projects/ckanext-spatial

guess_resource_format may use protocol and function to better guess the resource type #262

Closed ccancellieri closed 2 years ago

ccancellieri commented 2 years ago

Issue

WMS resources are not being recognized because my service URLs do not follow the expected patterns.

As far as I can see, guess_resource_format() relies only on the 'url' to guess the type. This often fails for me because my URL does not contain fragments such as geoserver/wms or service=wms (even though I am using GeoServer as the WMS server).

Here is the current implementation:

    resource_types = {
        # OGC
        'wms': ('service=wms', 'geoserver/wms', 'mapserver/wmsserver', 'com.esri.wms.Esrimap', 'service/wms'),
        'wfs': ('service=wfs', 'geoserver/wfs', 'mapserver/wfsserver', 'com.esri.wfs.Esrimap'),
        'wcs': ('service=wcs', 'geoserver/wcs', 'imageserver/wcsserver', 'mapserver/wcsserver'),
        'sos': ('service=sos',),
        'csw': ('service=csw',),
        # ESRI
        'kml': ('mapserver/generatekml',),
        'arcims': ('com.esri.esrimap.esrimap',),
        'arcgis_rest': ('arcgis/rest/services',),
    }
    url = resource_locator.get('url').lower().strip()

    for resource_type, parts in resource_types.items():
        if any(part in url for part in parts):
            return resource_type
        ....

As the resource_types dict shows, each format is matched against quite specific (default) URL fragments.

For GeoServer, for example, the path-based check only matches if you keep the default web application path geoserver/wms, which is not realistic, especially in production environments.
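To illustrate the mismatch, here is a minimal sketch that reproduces only the WMS part of the URL check quoted above. The function name guess_from_url and both example URLs are hypothetical, and whatever the real function does after the loop (elided in the quote) is not modeled.

```python
# Subset of the WMS URL fragments from the quoted resource_types dict.
WMS_URL_PARTS = ('service=wms', 'geoserver/wms', 'mapserver/wmsserver', 'service/wms')


def guess_from_url(url):
    """Return 'wms' if the URL contains a known fragment, else None.

    Hypothetical helper: it mirrors only the loop shown in the issue.
    """
    url = url.lower().strip()
    if any(part in url for part in WMS_URL_PARTS):
        return 'wms'
    return None


# Default GeoServer context path: recognized.
default_url = 'https://example.org/geoserver/wms'
# Hypothetical production URL with a renamed context path: not recognized.
custom_url = 'https://maps.example.org/gs/wms'

print(guess_from_url(default_url))
print(guess_from_url(custom_url))
```

Both URLs could point at the same GeoServer instance, yet only the first one is detected as WMS.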

Proposal

While debugging, I noticed that the resource_locator carries several other pieces of information that could better drive the resource type detection (I'm referring to protocol and function).

See below an example:

'function':'FUNCTION'
'description':'DESCRIPTION'
'name':'LAYERNAME'
'protocol':'OGC:WMS-1.1.1-http-get-map'
'url':'https://....../wms'

I'm going to open a pull request that passes the whole resource_locator variable to the guess function so it can guess the format more reliably. The protocol alone would probably be enough to determine the type.
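A protocol-based guess could look roughly like the sketch below. This is only an assumption about how the idea might work, not the code of the pull request: the function name guess_from_protocol, the regular expression, and the example URL are all hypothetical.

```python
import re

# Hypothetical pattern: an OGC service name not surrounded by other letters,
# e.g. the "WMS" in "OGC:WMS-1.1.1-http-get-map".
_PROTOCOL_RE = re.compile(r'(?<![a-z])(wms|wfs|wcs|sos|csw)(?![a-z])')


def guess_from_protocol(resource_locator):
    """Return the service type named in the 'protocol' field, or None."""
    # The field may be missing or None, so normalize to an empty string.
    protocol = (resource_locator.get('protocol') or '').lower()
    match = _PROTOCOL_RE.search(protocol)
    return match.group(1) if match else None


# Same shape as the resource_locator example above; the URL is made up.
locator = {
    'name': 'LAYERNAME',
    'protocol': 'OGC:WMS-1.1.1-http-get-map',
    'url': 'https://maps.example.org/gs/wms',
}

print(guess_from_protocol(locator))
```

When the protocol gives no hint, the function returns None, so the existing URL-based matching could still be used as a fallback.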

I would also suggest passing the full object (optionally), in case specific implementations (overrides) need other information.

This should be an extension point, so that a plugin can be configured to parse custom protocols or other fields and set the format accordingly.
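One possible shape for such an extension point is sketched below in plain Python. Everything here is hypothetical (register_guesser, guess_format_from_locator, the 'ESRI:REST' protocol value used for illustration); it does not use or describe any existing CKAN or ckanext-spatial interface.

```python
# Hypothetical registry: each guesser receives the whole resource_locator
# dict and returns a format string, or None if it has no opinion.
_custom_guessers = []


def register_guesser(func):
    """Register a custom guesser; usable as a decorator."""
    _custom_guessers.append(func)
    return func


def guess_format_from_locator(resource_locator):
    """Try custom guessers first, then a simplified URL check."""
    for guesser in _custom_guessers:
        resource_format = guesser(resource_locator)
        if resource_format:
            return resource_format
    # Simplified stand-in for the existing URL-based matching.
    url = (resource_locator.get('url') or '').lower().strip()
    if 'service=wms' in url or 'geoserver/wms' in url:
        return 'wms'
    return None


@register_guesser
def esri_rest_guesser(resource_locator):
    # Illustrative site-specific rule keyed on the protocol field.
    protocol = resource_locator.get('protocol') or ''
    if protocol.upper().startswith('ESRI:REST'):
        return 'arcgis_rest'
    return None


print(guess_format_from_locator(
    {'protocol': 'ESRI:REST', 'url': 'https://example.org/maps/layer'}))
```

Because each guesser sees the full resource_locator, an override can use function, name, or description as well as protocol without changing the core function.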

Ref:

https://github.com/ckan/ckanext-spatial/blob/992b2753fc24d0abb12ced5cf5aaa3a853ca9ea4/ckanext/spatial/harvesters/base.py#L384

Specification

https://schemas.isotc211.org/19115/-1/cit/1.3.0/cit/#type_CI_OnlineResource_Type

https://geonetwork-opensource.org/manuals/3.10.x/en/annexes/standards/iso19139.html#protocol