geopython / OWSLib

OWSLib is a Python package for client programming with Open Geospatial Consortium (OGC) web service (hence OWS) interface standards, and their related content models.
https://owslib.readthedocs.io
BSD 3-Clause "New" or "Revised" License

CSW client cannot handle outputscheme #605

Open nicholascar opened 4 years ago

nicholascar commented 4 years ago

When I call getrecords2 like this:

csw.getrecords2(
    startposition=startposition,
    maxrecords=pagesize,
    outputschema=outputschema,
    esn='full',
    sortby=sortby
)

no value for outputschema other than http://www.isotc211.org/2005/gmd works. That is no good for us, since we are using 19115-1 (http://standards.iso.org/iso/19115/-3/mdb/1.0).

POSTing a raw request to the CSW server (via Python requests) with outputscheme='owl' works and gives me the server's own output schema, but OWSLib's CSW client, csw = CatalogueServiceWeb(url), can't split the returned XML into csw.records to allow iterating by csw.records.items().

I guess that internally the CSW client needs to split the XML based on the namespaces it knows, which come from ISO19115:2005 and nothing else, such as ISO19115-1:2014. Perhaps it can't find the location of the UUID in the record, since that has changed in -1:2014.
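The namespace mismatch guessed at above can be illustrated without OWSLib. The sketch below does not reproduce OWSLib's parsing code. It uses the standard library's xml.etree.ElementTree and an invented, heavily trimmed response (SAMPLE_RESPONSE is hypothetical) to show that a search bound to the ISO 19115:2005 gmd namespace finds nothing when the records use the mdb namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed GetRecords response holding one 19115-3 record.
SAMPLE_RESPONSE = """\
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
                        xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/1.0">
  <csw:SearchResults>
    <mdb:MD_Metadata/>
  </csw:SearchResults>
</csw:GetRecordsResponse>
"""

NAMESPACES = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/1.0",
}

root = ET.fromstring(SAMPLE_RESPONSE)

# A parser that only looks for gmd:MD_Metadata sees no records at all...
gmd_records = root.findall(".//gmd:MD_Metadata", NAMESPACES)
# ...while the same search in the mdb namespace finds the record.
mdb_records = root.findall(".//mdb:MD_Metadata", NAMESPACES)

print(len(gmd_records), len(mdb_records))
```

The element name is the same in both cases, so the only thing separating "no records" from "one record" is the namespace URI the search is bound to.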

A workaround is to split the 'raw' XML in csw.response yourself, like this:

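from lxml import etree  # assumed; the standard library's xml.etree.ElementTree also works here
namespaces = {'mdb': 'http://standards.iso.org/iso/19115/-3/mdb/1.0'}
root = etree.fromstring(csw.response)
records = root.findall('.//mdb:MD_Metadata', namespaces=namespaces)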
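For anyone who wants something closer to iterating over records.items(), the workaround can be extended roughly as follows. This is only a sketch under assumptions: split_records and SAMPLE_RESPONSE are hypothetical names, the sample stands in for csw.response, and the identifier path (mdb:metadataIdentifier/mcc:MD_Identifier/mcc:code/gco:CharacterString) and the mcc and gco namespace URIs are my reading of the 19115-3 schemas and should be checked against the records your server actually returns.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/1.0",
    # Assumed 19115-3 namespaces for the identifier elements.
    "mcc": "http://standards.iso.org/iso/19115/-3/mcc/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}

# Assumed location of the record identifier inside a 19115-3 record.
ID_PATH = "mdb:metadataIdentifier/mcc:MD_Identifier/mcc:code/gco:CharacterString"


def split_records(response_xml):
    """Return {identifier: record element} for each mdb:MD_Metadata found."""
    root = ET.fromstring(response_xml)
    records = {}
    found = root.findall(".//mdb:MD_Metadata", NAMESPACES)
    for position, record in enumerate(found):
        identifier = record.findtext(ID_PATH, default="", namespaces=NAMESPACES)
        # Fall back to a positional key when no identifier is present.
        records[identifier.strip() or "record-{}".format(position)] = record
    return records


# Hypothetical sample standing in for csw.response (bytes, as returned over HTTP).
SAMPLE_RESPONSE = b"""\
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/1.0"
    xmlns:mcc="http://standards.iso.org/iso/19115/-3/mcc/1.0"
    xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0">
  <csw:SearchResults>
    <mdb:MD_Metadata>
      <mdb:metadataIdentifier>
        <mcc:MD_Identifier>
          <mcc:code><gco:CharacterString>abc-123</gco:CharacterString></mcc:code>
        </mcc:MD_Identifier>
      </mdb:metadataIdentifier>
    </mdb:MD_Metadata>
    <mdb:MD_Metadata/>
  </csw:SearchResults>
</csw:GetRecordsResponse>
"""

records = split_records(SAMPLE_RESPONSE)
for identifier, record in records.items():
    print(identifier, record.tag)
```

The positional fallback key is there only so that a record without an identifier is not silently dropped.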

ccancellieri commented 2 years ago

The solution should be based on the GetCapabilities response. If there is interest, I can propose a PR for this quite easily...

To give an idea:

  1. I would remove, or at least never use, a statically defined namespace list, since the target server may not implement some or most of them.
  2. I would query the target server and either save the capabilities in the current state (see self.capabilities below) or return them to the client.
  3. I would provide a dynamically generated dict of the namespaces handled, per operation (e.g. GetRecordById, GetRecords, etc.).
  4. A client should be able to find out which output schema (and format, etc.) is available for that specific operation.
    def _get_output_schemas(self, operation):
        # Namespace prefixes declared on the GetCapabilities root element
        _cap_ns = self.capabilities.getroot().nsmap
        _ows_ns = _cap_ns.get('ows')
        if not _ows_ns:
            raise CswError('Bad getcapabilities response: OWS namespace not found ' + str(_cap_ns))
        # Use './/' so that the search is relative to the root element
        _op = self.capabilities.find(".//{{{}}}Operation[@name='{}']".format(_ows_ns, operation))
        _schemas = _op.find("{{{}}}Parameter[@name='outputSchema']".format(_ows_ns))
        # Build a set: a bare map() iterator would be exhausted by the first 'in' test
        _values = {v.text for v in _schemas.findall("{{{}}}Value".format(_ows_ns))}
        # Keep only the prefix -> URI pairs that the server advertises as output schemas
        output_schemas = {}
        for key, value in _schemas.nsmap.items():
            if value in _values:
                output_schemas[key] = value
        return output_schemas
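
To try the idea outside of OWSLib, here is a rough, self-contained sketch of point 3 (a dict built per operation). It is not a proposed patch: output_schemas_by_operation and SAMPLE_CAPABILITIES are hypothetical, the sample document is invented, and it uses the standard library's xml.etree.ElementTree rather than lxml, so it returns the advertised URIs instead of nsmap prefixes. It also assumes the OWS 1.0.0 layout used by CSW 2.0.2, where the ows:Value elements sit directly under ows:Parameter.

```python
import xml.etree.ElementTree as ET

OWS = "http://www.opengis.net/ows"  # OWS 1.0.0 namespace, as used by CSW 2.0.2

# Invented, heavily trimmed capabilities document for illustration only.
SAMPLE_CAPABILITIES = """\
<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
                  xmlns:ows="http://www.opengis.net/ows">
  <ows:OperationsMetadata>
    <ows:Operation name="GetRecords">
      <ows:Parameter name="outputSchema">
        <ows:Value>http://www.opengis.net/cat/csw/2.0.2</ows:Value>
        <ows:Value>http://standards.iso.org/iso/19115/-3/mdb/1.0</ows:Value>
      </ows:Parameter>
    </ows:Operation>
    <ows:Operation name="GetRecordById">
      <ows:Parameter name="outputSchema">
        <ows:Value>http://www.opengis.net/cat/csw/2.0.2</ows:Value>
      </ows:Parameter>
    </ows:Operation>
  </ows:OperationsMetadata>
</csw:Capabilities>
"""


def output_schemas_by_operation(capabilities_xml):
    """Map each advertised operation name to its list of outputSchema URIs."""
    root = ET.fromstring(capabilities_xml)
    ns = {"ows": OWS}
    result = {}
    for operation in root.iterfind(".//ows:Operation", ns):
        values = operation.findall(
            "ows:Parameter[@name='outputSchema']/ows:Value", ns
        )
        # Operations without an outputSchema parameter map to an empty list.
        result[operation.get("name")] = [v.text.strip() for v in values if v.text]
    return result


schemas = output_schemas_by_operation(SAMPLE_CAPABILITIES)
print(schemas["GetRecords"])
```

With a dict like this, a client could check whether the schema it wants is offered for an operation before sending the request, instead of relying on a static namespace list.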