geopython / pywps

PyWPS is an implementation of the Web Processing Service standard from the Open Geospatial Consortium. PyWPS is written in Python.
https://pywps.org
MIT License
178 stars 117 forks

Validators cause unwanted fetch of file URL #526

Open fmigneault opened 4 years ago

fmigneault commented 4 years ago

Description

When a WPS execution receives an input as a reference URL (for example, a remote JSON file, as in the references below), the format validator actually downloads the file. This happens because the validator reads the file property, which resolves to UrlHandler.file, which in turn performs the request and writes the chunks locally.

https://github.com/geopython/pywps/blob/d05483d75e753b3cda303f5c0bb778a0f9465393/pywps/validator/complexvalidator.py#L126

https://github.com/geopython/pywps/blob/d05483d75e753b3cda303f5c0bb778a0f9465393/pywps/validator/complexvalidator.py#L151

In UrlHandler.file : https://github.com/geopython/pywps/blob/d05483d75e753b3cda303f5c0bb778a0f9465393/pywps/inout/basic.py#L461-L467
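To make the side effect concrete, here is a minimal sketch of a lazily evaluated file property. It is not the PyWPS code: LazyUrlHandler and the downloader callable are hypothetical names used for illustration, and caching the result after the first access is an assumption of this sketch.

```python
class LazyUrlHandler:
    """Hypothetical stand-in for a URL-backed input (not the PyWPS class)."""

    def __init__(self, url, downloader):
        self.url = url
        # `downloader` is an assumed callable: takes a URL, returns a local path.
        self._downloader = downloader
        self._file = None

    @property
    def file(self):
        # Merely reading this attribute triggers the download.
        if self._file is None:
            self._file = self._downloader(self.url)
        return self._file
```

Any code that reads handler.file, including a validator that only wants the file name, pays the full download cost on first access.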

This may be fine for typical data-processing executions, because most processes need the file locally at some point. However, it shouldn't happen during validation when MODE < STRICT, since the download is not needed to check the extension in the file name (which is what MODE.SIMPLE attempts to do). Fetching is justified when MODE >= STRICT because the full contents are validated, but it is unnecessary for simple MIME-type checks.
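One possible shape for the requested behavior is sketched below. It is only an illustration under stated assumptions: the Mode enum is a stand-in for the validation modes named above, and validate_reference, fetch, and check_contents are hypothetical names, not PyWPS API. Below STRICT, the MIME type is guessed from the URL path alone; the fetch only happens at STRICT and above.

```python
import mimetypes
from enum import IntEnum
from urllib.parse import urlparse


class Mode(IntEnum):
    """Hypothetical stand-in for the validation modes discussed here."""
    NONE = 0
    SIMPLE = 1
    STRICT = 2
    VERYSTRICT = 3


def guess_mime_from_url(url):
    # Only the path part of the URL is inspected; no request is made.
    path = urlparse(url).path
    mime, _encoding = mimetypes.guess_type(path, strict=False)
    return mime


def validate_reference(url, expected_mime, mode, fetch, check_contents):
    """Validate a referenced input, fetching it only when the mode requires it.

    `fetch` and `check_contents` are assumed callables supplied by the caller.
    """
    if mode == Mode.NONE:
        return True
    if mode < Mode.STRICT:
        # Extension-based check: the remote file is never downloaded.
        return guess_mime_from_url(url) == expected_mime
    # STRICT and above inspect the contents, so fetching is legitimate.
    return check_contents(fetch(url))
```

With this split, a parent process can run a SIMPLE check on a large remote file and hand the URL to a child process without downloading it first.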

My general use case is that I need to reference https://<somewhere> files and pass them down to further remote processes. Therefore, I want to validate that the file type is correct, but not fetch the files right away (the child process will do so). As a file could be quite big, fetching it twice (once in the parent process and once in the child) just for basic validation is wasteful. Since the file is not required locally until I explicitly access the file property, the validator shouldn't fetch it unless it needs to.

Environment

Steps to Reproduce

Additional Information

This is part of the requirements for developing an OGC EMS, which dispatches execution to remote ADES instances. https://github.com/crim-ca/weaver