geopython / pywps

PyWPS is an implementation of the Web Processing Service standard from the Open Geospatial Consortium. PyWPS is written in Python.
https://pywps.org
MIT License

Optional omitted Complex Input with default format is generated erroneously (?) #633

Open fmigneault opened 2 years ago

fmigneault commented 2 years ago

Description

@cehbrecht @tomkralidis @jachym I would like to better understand the procedure of handling inputs (how they get generated) for the following specific use case.

Given a process with the following input definitions:

[...]
<DataInputs>
  <Input minOccurs="0" maxOccurs="100">
    <ows:Identifier>dataset</ows:Identifier>
    <ows:Title>Dataset</ows:Title>
    <ows:Abstract>Enter a URL pointing to a NetCDF file (optional)</ows:Abstract>
    <ComplexData>
      <Default>
        <Format>
          <MimeType>application/x-netcdf</MimeType>
        </Format>
      </Default>
      <Supported>
        <Format>
          <MimeType>application/x-netcdf</MimeType>
        </Format>
      </Supported>
    </ComplexData>
  </Input>
  <Input minOccurs="0" maxOccurs="100">
    <ows:Identifier>dataset_opendap</ows:Identifier>
    <ows:Title>Remote OpenDAP Data URL</ows:Title>
    <ows:Abstract>Or provide a remote OpenDAP data URL, for example: http://my.opendap/thredds/dodsC/path/to/file.nc</ows:Abstract>
    <ows:Metadata xlink:href="https://www.iana.org/assignments/media-types/media-types.xhtml" xlink:title="application/x-ogc-dods" xlink:type="simple"/>
    <LiteralData>
      <ows:DataType ows:reference="urn:ogc:def:dataType:OGC:1.1:string">string</ows:DataType>
      <ows:AnyValue/>
    </LiteralData>
  </Input>
</DataInputs>
[...]

When I submit an execution that provides only the dataset_opendap input (a URL string), the process's _handler(self, request, response) method is called with the following request.inputs:

request.inputs = {
  'dataset': [<pywps.inout.inputs.ComplexInput object at 0x7f140d941a10>], 
  'dataset_opendap': deque([<pywps.inout.inputs.LiteralInput object at 0x7f140d956c90>], maxlen=100)
}

My Execute request XML does not contain dataset, so that input must be generated by default at some point after parsing.

<wps100:Execute xmlns:wps100="http://www.opengis.net/wps/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" service="WPS" version="1.0.0" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 http://schemas.opengis.net/wps/1.0.0/wpsExecute_request.xsd">
    <ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">ncdump</ows110:Identifier>
    <wps100:DataInputs>
        <wps100:Input>
            <ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">dataset_opendap</ows110:Identifier>
            <wps100:Data>
                <wps100:LiteralData>http://localhost8001/ows/proxy/thredds/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc</wps100:LiteralData>
            </wps100:Data>
        </wps100:Input>
    </wps100:DataInputs>
    <wps100:ResponseForm>
        <wps100:ResponseDocument storeExecuteResponse="true" status="true" lineage="true">
            <wps100:Output asReference="true">
                <ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">output</ows110:Identifier>
            </wps100:Output>
        </wps100:ResponseDocument>
    </wps100:ResponseForm>
</wps100:Execute>

I'm trying to understand why the dataset input even gets generated in request.inputs following parsing since it is omitted completely from the request. This input is causing me problems, because I need to do some post-processing to convert PyWPS inputs into my package definitions.

Is there a way to distinguish omitted inputs, so that I can discard them explicitly, from real inputs with submitted data? Is there a flag that would guarantee that this input holds only the default definition and does not contain any actual data?

I cannot rely on the data field to detect omitted inputs, because it gets filled with the "default format" definition for application/x-netcdf, and I cannot rule out that real submitted data would have the same contents:

{"mimeType": "application/x-netcdf", "encoding": null, "schema": null, "maximumMegabytes": null, "default": true}

The only (very hackish and unreliable) field I could use to detect inputs to drop is file, which contains a reference to {workdir}/input instead of {workdir}/input_{uuid}. Any better guidance would be greatly appreciated.
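As a stopgap, the signals described in this issue can be combined into a single check. The following is only a minimal sketch: looks_like_default_placeholder and FORMAT_KEYS are hypothetical names, not part of PyWPS, and the check relies solely on the attribute values reported in this issue (a dict-valued data flagged "default": True, and uuid set to None). It assumes that real submitted inputs have a uuid, which is inferred from the input_{uuid} file naming. It is exercised here against stand-in objects, not against real PyWPS inputs.

```python
from types import SimpleNamespace

# Keys observed in the default-format dictionary reported in this issue.
FORMAT_KEYS = {"mimeType", "encoding", "schema", "maximumMegabytes", "default"}


def looks_like_default_placeholder(inp):
    """Heuristic (assumption): True if `inp` carries only a default format definition."""
    data = getattr(inp, "data", None)
    # Signal 1: data is a dict shaped like a format description flagged as default.
    format_only = (
        isinstance(data, dict)
        and data.get("default") is True
        and set(data) <= FORMAT_KEYS
    )
    # Signal 2: no UUID was assigned to the input.
    no_uuid = getattr(inp, "uuid", None) is None
    return format_only and no_uuid


# Stand-ins mimicking the attributes of the generated and the submitted input.
placeholder = SimpleNamespace(
    data={"mimeType": "application/x-netcdf", "encoding": None, "schema": None,
          "maximumMegabytes": None, "default": True},
    uuid=None,
)
submitted = SimpleNamespace(data="http://my.opendap/thredds/dodsC/path/to/file.nc",
                            uuid="hypothetical-uuid")

inputs = {"dataset": [placeholder], "dataset_opendap": [submitted]}
# Keep only the inputs that do not look like generated placeholders.
kept = {
    key: [i for i in values if not looks_like_default_placeholder(i)]
    for key, values in inputs.items()
}
kept = {key: values for key, values in kept.items() if values}
```

This remains a heuristic. It would break if PyWPS changed how it fills data or uuid, which is why a dedicated flag would be preferable.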

Full contents of request.inputs["dataset"]:

{ComplexInput}  
    _data_format = {Format} 
        _encoding = {NoneType} None
        _extension = {NoneType} None
        _mime_type = {str} 'application/x-netcdf'
        _schema = {NoneType} None
        encoding = {str} ''
        extension = {str} ''
        json = {dict} {'mime_type': 'application/x-netcdf', 'encoding': '', 'schema': '', 'extension': ''}
        mime_type = {str} 'application/x-netcdf'
        schema = {str} ''
    _default = {dict} {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
    _default_type = {int} 3
    _iohandler = {DataHandler} 
        _data = {dict}  
        _file = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee/input'
        _ref = {weakref} 
        _stream = {NoneType} None
        base64 = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
        data = {dict} {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
        file = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee/input'
        mem = {NoneType} None
        post_data = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
        prop = {str} 'data'
        size = {int} 0
        stream = {StringIO} <_io.StringIO object at 0x7f140d8aa690>
        url = {str} 'file:///tmp/weaver-hybrid/pywps_process_pw40isee/input'
    _supported_formats = {tuple} 
    _workdir = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee'
    abstract = {str} 'Enter a URL pointing to a NetCDF file (optional)'
    as_reference = {bool} False
    base64 = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
    data = {dict} {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
    data_format = {Format} 
    data_set = {bool} True
    extension = {str} ''
    file = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee/input'
    identifier = {str} 'dataset'
    inpt = {dict} {}
    json = {dict} {'identifier': 'dataset', 'title': 'Dataset', 'abstract': 'Enter a URL pointing to a NetCDF file (optional)', 'keywords': [], 'metadata': [], 'type': 'complex', 'data_format': {'mime_type': 'application/x-netcdf', 'encoding': '', 'schema': '', 'extension': ''}, 'asreference': False, 'supported_formats': [{'mime_type': 'application/x-netcdf', 'encoding': '', 'schema': '', 'extension': ''}], 'workdir': '/tmp/weaver-hybrid/pywps_process_pw40isee', 'mode': 0, 'min_occurs': 0, 'max_occurs': 100, 'translations': None, 'data': "", 'mimetype': 'application/x-netcdf'}
    keywords = {list} []
    max_occurs = {int} 100
    metadata = {list} []
    method = {str} ''
    min_occurs = {int} 0
    post_data = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
    prop = {str} 'data'
    size = {int} 0
    source_type = {int} 3
    stream = {StringIO}  
    supported_formats = {tuple} 
    title = {str} 'Dataset'
    translations = {NoneType} None
    url = {str} 'file:///tmp/weaver-hybrid/pywps_process_pw40isee/input'
    uuid = {NoneType} None
    valid_mode = {int} 0
    workdir = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee'
   

Environment

Steps to Reproduce

Using this process: https://github.com/bird-house/hummingbird/blob/master/hummingbird/processes/wps_ncdump.py

It is executed indirectly by Weaver using this definition: https://github.com/crim-ca/weaver/blob/4.1.0/weaver/processes/wps_package.py#L758

fmigneault commented 2 years ago

More detail: the dataset input gets generated here: https://github.com/geopython/pywps/blob/793ab34bc9aab976243b8a7252e64429c6e65f4f/pywps/app/Service.py#L115-L120

At that point, the following values are defined:

request_inputs = None
inpt._default = {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
inpt._default_type = SOURCE_TYPE.DATA
inpt.data_set = False

The method parameter wps_request.inputs contains the following:

[
  {
    'identifier': 'dataset_opendap', 
    'data': 'http://localhost8001/ows/proxy/thredds/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc', 
    'uom': '', 
    'datatype': ''
  }
]

It looks like inpt._set_default_value() should not be called in this case, because what eventually gets set is not a default value but a default format definition.
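To illustrate the behavior I would expect, here is a rough sketch of the input-collection step with a guard for format-only defaults. It does not reproduce the code in Service.py. collect_inputs, is_format_only_default, and the default attribute on the stand-in objects are hypothetical and used only for illustration. The submitted inputs are modeled as a list of dictionaries with an 'identifier' key, as shown above.

```python
from types import SimpleNamespace

# Keys observed in the default-format dictionary reported in this issue.
FORMAT_KEYS = {"mimeType", "encoding", "schema", "maximumMegabytes", "default"}


def is_format_only_default(default):
    """True if `default` describes a format rather than an actual value (assumption)."""
    return (
        isinstance(default, dict)
        and "mimeType" in default
        and set(default) <= FORMAT_KEYS
    )


def collect_inputs(declared_inputs, submitted):
    """Map identifiers to input data, skipping omitted inputs without a real default."""
    submitted_by_id = {}
    for item in submitted:
        submitted_by_id.setdefault(item["identifier"], []).append(item)

    collected = {}
    for inpt in declared_inputs:
        if inpt.identifier in submitted_by_id:
            # The client sent this input: use what was submitted.
            collected[inpt.identifier] = submitted_by_id[inpt.identifier]
        elif inpt.default is not None and not is_format_only_default(inpt.default):
            # The input was omitted but has a real default value: apply it.
            collected[inpt.identifier] = [
                {"identifier": inpt.identifier, "data": inpt.default}
            ]
        # Otherwise the input was omitted and only has a default format: leave it out.
    return collected


declared = [
    SimpleNamespace(
        identifier="dataset",
        default={"mimeType": "application/x-netcdf", "encoding": None,
                 "schema": None, "maximumMegabytes": None, "default": True},
    ),
    SimpleNamespace(identifier="dataset_opendap", default=None),
]
submitted = [
    {"identifier": "dataset_opendap",
     "data": "http://my.opendap/thredds/dodsC/path/to/file.nc",
     "uom": "", "datatype": ""},
]

result = collect_inputs(declared, submitted)
```

With these stand-ins, result contains only dataset_opendap, so the handler would never see a dataset entry that the client did not submit.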