bird-house / birdy

Birdy provides a command-line tool to work with Web Processing Services.
http://birdy.readthedocs.io/en/latest/
Apache License 2.0

Sending files to WPS service #214

Closed sehHeiden closed 1 year ago

sehHeiden commented 1 year ago

Description

I want to send some files to a local WPS service. I use PyWPS. To me it seems like the first file is interpreted as a URL.

Environment

Steps to Reproduce

The process is configured in PyWPS with:

class LeastCostPath(Process):
    def __init__(self):
        inputs = [ComplexInput('costs', 'Cost Raster', supported_formats=[Format('image/tiff'), ]),
                  ComplexInput('start', 'Starting Point',
                               supported_formats=[Format('application/gpkg'), Format('application/json'), ]),
                  ComplexInput('end', 'Ending Point',
                               supported_formats=[Format('application/gpkg'), Format('application/json'), ])]
        outputs = [ComplexOutput('out', 'Referenced Output',
                                 supported_formats=[
                                     Format('application/json')
                                 ])]

        super(LeastCostPath, self).__init__(
            self._handler,
            identifier='lcp',
            title='Process least cost path',
            abstract='Returns a GeoJSON '
                     'with the least cost path from the cost raster.',
            inputs=inputs,
            outputs=outputs,
            store_supported=True,
            status_supported=True
        )

    def _handler(self, request, response):
        input_cost_raster = request.inputs['layer'][0].file
        input_start = request.inputs['startingPoint'][0].file
        input_end_points = request.inputs['endingPoint'][0].file

        lcp = find_least_cost_path(input_cost_raster, 0, False, input_start, input_end_points)

        response.outputs['out'].output_format = Format(FORMATS['JSON'])
        response.outputs['out'].data = lcp.to_json(indent=2)
        return response

I want to use a raster file (TIFF) and two point files (GeoPackage or GeoJSON) as vector inputs to estimate the least cost path.

from birdy import WPSClient
from pathlib import Path

pywps = WPSClient('http://localhost:5000/wps')

cost_raster = Path(r".\..\..\results\weights\result_res_100_all_touched_True.tif")
start_features = Path(r"..\..\results\test_points\start_point.gpkg")
end_features = Path(r"..\..\results\test_points\end_point.gpkg")

pywps.lcp(costs=cost_raster,
          start=start_features,
          end=end_features).get(asobj=True)[0]

Additional Information

The traceback is:

Traceback (most recent call last):
  File "...\lib\site-packages\IPython\core\interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-19d0a6f355d4>", line 1, in <module>
    runfile(...\\src\\wps\\birdy_test.py', wdir='...\\src\\wps')
  File "...\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "...\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "...\src\wps\birdy_test.py", line 13, in <module>
    pywps.lcp(costs=cost_raster,
  File "<...\lib\site-packages\birdy\client\base.py-3>", line 5, in lcp
  File "...\lib\site-packages\birdy\client\base.py", line 368, in _execute
    wps_response = self._wps.execute(
  File "...\lib\site-packages\owslib\wps.py", line 361, in execute
    response = execution.submitRequest(request)
  File "...\lib\site-packages\owslib\wps.py", line 933, in submitRequest
    response = reader.readFromUrl(
  File "...\lib\site-packages\owslib\wps.py", line 604, in readFromUrl
    return self._readFromUrl(url, data, self.timeout, method, username=username, password=password,
  File "...\lib\site-packages\owslib\wps.py", line 515, in _readFromUrl
    u = openURL(url, data, method='Post',
  File "...\lib\site-packages\owslib\util.py", line 211, in openURL
    raise ServiceException(req.text)
owslib.util.ServiceException: <?xml version="1.0" encoding="UTF-8"?>
<!-- PyWPS 4.5.2 -->
<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd" version="1.0.0">
  <ows:Exception exceptionCode="FileURLNotSupported" locator="" >
      <ows:ExceptionText>File URL not supported as input.</ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>

huard commented 1 year ago

Did you configure the server to support input paths?

[server]
allowedinputpaths=/

sehHeiden commented 1 year ago

No, I had not added a config file, but I have now.

Now the flask server starts with:

service = pywps.Service([Buffer(), MyBuffer(), Centroids(), LeastCostPath()], ['pywps.cfg', ])
app = flask.Flask(__name__)
app.route('/wps', methods=['GET', 'POST'])(lambda: service)

if __name__ == '__main__':
    app.run()

The full file can be found here. I did not know what to set, so I listed several alternatives:

allowedinputpaths=/tmp:/var/lib/pywps/downloads:/var/lib/pywps/public:.:/

The full file can be found here. Adding allowedinputpaths did not change the error, probably because it still happens on the OWSLib side. The data sent to the local PyWPS server is:

b'<wps100:Execute xmlns:wps100="http://www.opengis.net/wps/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" service="WPS" version="1.0.0" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 http://schemas.opengis.net/wps/1.0.0/wpsExecute_request.xsd"><ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">lcp</ows110:Identifier><wps100:DataInputs><wps100:Input><ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">costs</ows110:Identifier><wps100:Reference xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:///../../results/weights/result_res_100_all_touched_True.tif" mimeType="image/tiff"/></wps100:Input><wps100:Input><ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">start</ows110:Identifier><wps100:Reference xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:///../../results/test_points/start_point.gpkg" mimeType="application/gpkg"/></wps100:Input><wps100:Input><ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">end</ows110:Identifier><wps100:Reference xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:///../../results/test_points/end_point.gpkg" mimeType="application/gpkg"/></wps100:Input></wps100:DataInputs><wps100:ResponseForm><wps100:ResponseDocument storeExecuteResponse="false" status="false" lineage="false"><wps100:Output asReference="true"><ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">out</ows110:Identifier></wps100:Output></wps100:ResponseDocument></wps100:ResponseForm></wps100:Execute>'

I used birdy to test the WPS with this.

What does opengis.net/wps have to do with my local PyWPS server? Thanks!

huard commented 1 year ago

Just use / as the allowed path, at least for testing.

Another potential problem is that your links are relative, e.g. xlink:href="file:///../../results/test_points/end_point.gpkg". For the server, this makes no sense, as the server does not know from which directory the request was made.
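
A minimal sketch of how a client could make such paths absolute before handing them to birdy, assuming the files live on the same machine as the server. It uses only pathlib; the relative path is the one from the report, written with forward slashes, and to_absolute is a hypothetical helper name, not part of birdy or OWSLib.

```python
from pathlib import Path


def to_absolute(path_like):
    """Return an absolute version of a (possibly relative) local path.

    resolve() anchors the path at the client's current working directory
    and collapses any '..' components.
    """
    return Path(path_like).resolve()


relative = Path("../../results/test_points/end_point.gpkg")
absolute = to_absolute(relative)

print(absolute.is_absolute())
print(absolute.as_uri())  # a file:// URI with no '..' left in it
```

The resolved path depends on where the client script is started, so it is worth printing it once to confirm it points at the intended file.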

sehHeiden commented 1 year ago

Okay. I changed allowedinputpaths in the config. I also changed some of the input for birdy so that it creates absolute paths. The xlink now looks like:

xlink:href="file:///home/username/path_to_file/file_name.tif"

The error message is still:

<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd" version="1.0.0">
  <ows:Exception exceptionCode="FileURLNotSupported" locator="" >
      <ows:ExceptionText>File URL not supported as input.</ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>

Edit: Is there a maximum length for the links?

Edit 2: Is there anything in the Python requirements that I need to install? I tried to figure out which part of the log output is relevant here:

2022-12-21 21:39:30,919 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-21 21:39:30,920 INFO sqlalchemy.engine.Engine INSERT INTO pywps_requests (uuid, pid, operation, version, time_start, time_end, identifier, message, percent_done, status) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2022-12-21 21:39:30,920 INFO sqlalchemy.engine.Engine [cached since 0.07398s ago] ('91ffdb4e-816f-11ed-b07e-c8e2659bfcc6', 1966, 'execute', '1.0.0', '2022-12-21 21:39:30.918956', None, 'lcp', None, None, None)
2022-12-21 21:39:30,920 INFO sqlalchemy.engine.Engine COMMIT
2022-12-21 21:39:30,923 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-21 21:39:30,924 INFO sqlalchemy.engine.Engine SELECT count(*) AS count_1 
FROM (SELECT pywps_requests.uuid AS pywps_requests_uuid, pywps_requests.pid AS pywps_requests_pid, pywps_requests.operation AS pywps_requests_operation, pywps_requests.version AS pywps_requests_version, pywps_requests.time_start AS pywps_requests_time_start, pywps_requests.time_end AS pywps_requests_time_end, pywps_requests.identifier AS pywps_requests_identifier, pywps_requests.message AS pywps_requests_message, pywps_requests.percent_done AS pywps_requests_percent_done, pywps_requests.status AS pywps_requests_status 
FROM pywps_requests 
WHERE pywps_requests.uuid = ?) AS anon_1
2022-12-21 21:39:30,924 INFO sqlalchemy.engine.Engine [cached since 0.07388s ago] ('91ffdb4e-816f-11ed-b07e-c8e2659bfcc6',)
2022-12-21 21:39:30,925 INFO sqlalchemy.engine.Engine SELECT pywps_requests.uuid AS pywps_requests_uuid, pywps_requests.pid AS pywps_requests_pid, pywps_requests.operation AS pywps_requests_operation, pywps_requests.version AS pywps_requests_version, pywps_requests.time_start AS pywps_requests_time_start, pywps_requests.time_end AS pywps_requests_time_end, pywps_requests.identifier AS pywps_requests_identifier, pywps_requests.message AS pywps_requests_message, pywps_requests.percent_done AS pywps_requests_percent_done, pywps_requests.status AS pywps_requests_status 
FROM pywps_requests 
WHERE pywps_requests.uuid = ?
2022-12-21 21:39:30,926 INFO sqlalchemy.engine.Engine [cached since 0.0741s ago] ('91ffdb4e-816f-11ed-b07e-c8e2659bfcc6',)
2022-12-21 21:39:30,927 INFO sqlalchemy.engine.Engine UPDATE pywps_requests SET time_end=?, message=?, percent_done=?, status=? WHERE pywps_requests.uuid = ?
2022-12-21 21:39:30,927 INFO sqlalchemy.engine.Engine [cached since 0.07404s ago] ('2022-12-21 21:39:30.926544', 'Request rejected due to exception', 100.0, 5, '91ffdb4e-816f-11ed-b07e-c8e2659bfcc6')
2022-12-21 21:39:30,927 INFO sqlalchemy.engine.Engine COMMIT
127.0.0.1 - - [21/Dec/2022 21:39:30] "GET /wps?service=WPS&request=DescribeProcess&version=1.0.0&identifier=all HTTP/1.1" 200 -
127.0.0.1 - - [21/Dec/2022 21:39:30] "POST /wps HTTP/1.1" 400 -

huard commented 1 year ago

I'm not sure about the limit, but I'm quite certain your link would not exceed it. I doubt the issue is due to a missing dependency. I'm confident it's a simple thing.

Could you post your input files so I can run a demo on my end?

huard commented 1 year ago

In

        input_cost_raster = request.inputs['layer'][0].file
        input_start = request.inputs['startingPoint'][0].file
        input_end_points = request.inputs['endingPoint'][0].file

the input keys don't match the input definitions. You should have instead

        input_cost_raster = request.inputs['costs'][0].file
        input_start = request.inputs['start'][0].file
        input_end_points = request.inputs['end'][0].file

Also, remove the line response.outputs['out'].output_format = Format(FORMATS['JSON'])
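
The mismatch can be illustrated with plain Python, without PyWPS. This is only a sketch: undeclared_keys is a hypothetical helper, and the lists are copied by hand from the process definition and the handler shown above.

```python
def undeclared_keys(declared, used):
    """Return the keys read in the handler that were never declared as inputs."""
    return sorted(set(used) - set(declared))


# Identifiers passed to ComplexInput(...) in __init__.
declared_identifiers = ["costs", "start", "end"]

# Keys looked up in request.inputs[...] inside _handler.
handler_keys_before = ["layer", "startingPoint", "endingPoint"]
handler_keys_after = ["costs", "start", "end"]

print(undeclared_keys(declared_identifiers, handler_keys_before))
print(undeclared_keys(declared_identifiers, handler_keys_after))
```

Any key reported by the first call would presumably fail as a missing dictionary entry once the request reaches the handler.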

huard commented 1 year ago

What version of owslib do you have? With the changes above, I've been able to run your example by plugging your process definition into the emu pywps server (commenting out the actual least cost function call).

huard commented 1 year ago

I can reproduce your issue using flask to launch the server instead of wsgi.

huard commented 1 year ago

I've been able to fix it by changing your code to service = pywps.Service([LeastCostPath()], ['/home/david/src/emu/tests/test.cfg', ]), where test.cfg is

[server]
allowedinputpaths=/
language = en-US,fr-CA,de-DE

[logging]
level=DEBUG
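
A sketch of how such a minimal config could be generated and checked from Python, assuming the file is plain INI syntax that configparser can read. The temporary location and the file name pywps.cfg are placeholders; the keys and values are the ones from the config above.

```python
import configparser
import tempfile
from pathlib import Path

# Build the same minimal configuration as in test.cfg above.
config = configparser.ConfigParser()
config["server"] = {
    "allowedinputpaths": "/",
    "language": "en-US,fr-CA,de-DE",
}
config["logging"] = {"level": "DEBUG"}

# Write it to a throwaway directory (placeholder location).
cfg_path = Path(tempfile.mkdtemp()) / "pywps.cfg"
with open(cfg_path, "w") as handle:
    config.write(handle)

# Read it back to confirm the values survived the round trip.
check = configparser.ConfigParser()
check.read(cfg_path)
print(check["server"]["allowedinputpaths"])
```

The resulting path would then be passed in the list given as the second argument to pywps.Service, as in the call above.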

sehHeiden commented 1 year ago

Thanks for showing me the bugs with the input files. That helped me find some more (I had not opened the files so far). The new config file especially helped me get further. Adding the line language = en-US,fr-CA,de-DE was not enough, but when I removed all other lines that you did not include in your file, things began to work like magic.

There is a single problem left. You asked me to remove this line from the server: response.outputs['out'].output_format = Format(FORMATS['JSON']). That puzzles me, because I copied it from here. But the current problem is the next line: response.outputs['out'].data = lcp.to_json(indent=2). I only rewrote it slightly: I do not write the JSON string with dumps as in the example, but with the method to_json. The problem is how to deal with it on the client side:

result = pywps.lcp(costs=cost_raster,
                   start=start_features,
                   end=end_features)
print(result.get(asobj=True)[0]) 

result.get(asobj=True) throws the error: TypeError: a bytes-like object is required, not 'str'. Therefore, I returned a bytes object on the server side, but that produces the very same error. It turns out that the line out.write(content) in the owslib.wps method Output.writeToDisk throws this error, and content is of type str. The data I want is on the client side; it is opened and read, but because writeToDisk fails, I cannot get further.

As an alternative I tried metalink.download with: download.get(result.get(asobj=False), path='.', segmented=False). But that also breaks, because the referenced file result.get(asobj=False) does not have a suffix. Hence, I am wondering whether the method I used to return the data is wrong or outdated, or whether owslib does something strange here. The version of owslib I installed is 0.27.2.
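
The error message itself is generic Python behaviour and can be reproduced without owslib. The sketch below assumes the failing write targets a file opened in binary mode; it does not reproduce the actual writeToDisk code, and the JSON payload is a made-up placeholder.

```python
import os
import tempfile

content = '{"type": "FeatureCollection", "features": []}'  # placeholder payload (str)

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)

# Writing a str to a file opened in binary mode raises TypeError.
try:
    with open(path, "wb") as out:
        out.write(content)
except TypeError as err:
    message = str(err)
    print(message)

# Encoding the str to bytes first makes the same write succeed.
with open(path, "wb") as out:
    out.write(content.encode("utf-8"))

with open(path, "rb") as handle:
    roundtrip = handle.read().decode("utf-8")
os.remove(path)
```

This shows why changing the type returned on the server side does not necessarily help: what matters is the type of the object the client ends up passing to write().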

huard commented 1 year ago

The pywps-flask repo is inactive, I would not rely on it too much.

I don't know what lcp.to_json does exactly, so it's hard for me to tell what the problem is. You could take a look at wps_pandas.py in emu/processes for a process returning a json output.

huard commented 1 year ago

I can reproduce your problem with your flask app, but it works fine with the wsgi app. Is your flask app able to serve static files?

huard commented 1 year ago

What I mean is that your flask app has no route for outputs.

sehHeiden commented 1 year ago

My background is more in the processing of geodata. It's the first time I have tried to create any web service.

Three things:

1) wsgi came up often in the discussion, so I wanted to try it out by following the PyWPS documentation.

I did not succeed with apache2, because sudo a2enmod wsgi reports that a2enmod is not found. For gunicorn, I used gunicorn instead of gunicorn3.

Because I did not want to try pywps-flask, I used my own wsgi file.

I wanted to use gunicorn with: gunicorn -b 127.0.0.1:8081 --workers $((2*`nproc --all`)) --log-syslog --pythonpath ./ wsgi.pywps_app:application, which throws ModuleNotFoundError: No module named 'wsgi'. ls -lha returns e.g.:

-rw-r--r-- 1 user group  152 Dez 22 18:51 pywps.wsgi
lrwxrwxrwx 1 user group   12 Dez 22 19:40 pywps_app.py -> ./pywps.wsgi

The documentation here is somewhat old. Do you have a better tip? Which wsgi server should I choose, and how do I configure it?

2) As an alternative I tried to add the download route in flask with:

@app.route('/outputs/'+'<path:filename>')
def outputfile(filename):
    target_file = join('outputs', filename)
    if isfile(target_file):
        file_ext = splitext(target_file)[1]
        with open(target_file, mode='rb') as f:
            file_bytes = f.read()
        mime_type = None
        if 'json' in file_ext:
            mime_type = 'text/json'
        return flask.Response(file_bytes, content_type=mime_type)
    else:
        flask.abort(404)

I took this from pywps-flask for lack of a better example.

This does NOT change the error message.

The server-side logs look okay:

2022-12-22 19:58:16,102 INFO sqlalchemy.engine.Engine [cached since 78.58s ago] ('2022-12-22 19:58:16.101652', 'PyWPS Process Process least cost path finished', 4, '6890a89a-822a-11ed-b328-c8e2659bfcc6')
2022-12-22 19:58:16,102 INFO sqlalchemy.engine.Engine COMMIT
2022-12-22 19:58:16,104 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-22 19:58:16,105 INFO sqlalchemy.engine.Engine SELECT pywps_stored_requests.uuid AS pywps_stored_requests_uuid, pywps_stored_requests.request AS pywps_stored_requests_request 
FROM pywps_stored_requests
 LIMIT ? OFFSET ?
2022-12-22 19:58:16,105 INFO sqlalchemy.engine.Engine [generated in 0.00022s] (1, 0)
127.0.0.1 - - [22/Dec/2022 19:58:16] "POST /wps HTTP/1.1" 200 -

3) Why do I need the outputs route when I return a string, not a file? In case I still need it: why does it need a file name?
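
One possible reading of point 3, based on the Execute request posted earlier in this thread: the client asked for the output with asReference="true", which in WPS 1.0.0 requests a link to a stored result rather than the value inline, so something has to serve that link. The sketch below only parses a trimmed-down copy of that request with the standard library; the trimming is mine and it assumes nothing about PyWPS internals.

```python
import xml.etree.ElementTree as ET

NS = {
    "wps": "http://www.opengis.net/wps/1.0.0",
    "ows": "http://www.opengis.net/ows/1.1",
}

# Trimmed-down version of the Execute request posted earlier in the thread.
request_xml = """
<wps100:Execute xmlns:wps100="http://www.opengis.net/wps/1.0.0"
                xmlns:ows110="http://www.opengis.net/ows/1.1"
                service="WPS" version="1.0.0">
  <ows110:Identifier>lcp</ows110:Identifier>
  <wps100:ResponseForm>
    <wps100:ResponseDocument storeExecuteResponse="false" status="false" lineage="false">
      <wps100:Output asReference="true">
        <ows110:Identifier>out</ows110:Identifier>
      </wps100:Output>
    </wps100:ResponseDocument>
  </wps100:ResponseForm>
</wps100:Execute>
"""

root = ET.fromstring(request_xml)

# Collect the identifiers of all outputs requested by reference.
by_reference = [
    output.find("ows:Identifier", NS).text
    for output in root.iterfind(".//wps:Output", NS)
    if output.get("asReference") == "true"
]
print(by_reference)
```

If the output is requested by reference, the string set on the server side would be written to a file under the output path, which would explain why a file name and a download route are involved.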

sehHeiden commented 1 year ago

I am currently trying wsgi with gunicorn. I had to uninstall the distro version and use the version from conda (and log out and in again). Originally gunicorn did not know the packages from my conda env.

I use this wsgi config:

from pywps import Service

from src.wps.processes.least_cost_path import LeastCostPath

application = Service([LeastCostPath()], ['./pywps.cfg', ])

and I call gunicorn with: gunicorn -b 127.0.0.1:8081 --workers=2 --log-syslog --pythonpath ./../.. pywps_app:application

On the client side I get this error with birdy:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/e2d6fb5e-8248-11ed-8859-c8e2659bfcc6/input' thrown by retrieveData in owslib.wps.py.

when calling result.get(asobj=True).

The most likely mistake I have made so far is in the nginx config:

As I understood the configuration description, I have to create the file /etc/nginx/sites_available/default with:

server {
     listen 80 default_server;
     listen [::]:80 default_server;
     server_name _;

     #better to redirect / to wps application
     location / {
     return 301 /wps;
     }

     location /wps {
             # with try_files active there will be problems
             #try_files $uri $uri/ =404;

             proxy_set_header Host $host;
             proxy_redirect          off;
             proxy_set_header        X-NginX-Proxy true;
             proxy_set_header X-Real-IP $remote_addr;
             proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
             proxy_pass http://127.0.0.1:8081;
             }

}

huard commented 1 year ago

I'm not a server guy myself, so I'm not sure I can help you. I use the birdhouse cookie-cutter to create new servers. Everything is ready to go.

sehHeiden commented 1 year ago

I have one and a half solutions so far.

flask works with metalink download.get, and wsgi and flask almost work with the config:

[server]
allowedinputpaths=/
outputpath=src/wps/tmp
language=en-US,fr-CA,De-DE

[logging]
level=DEBUG

The result is saved under ./src/wps/tmp/.../input.json, but I am getting an error that '/tmp/.../input.json' does not exist, which is odd.

sehHeiden commented 1 year ago

Okay, the following setting works with flask only:

[server]
allowedinputpaths = /
outputpath = /tmp
url = http://localhost:5000/wps
outputurl = http://localhost:5000/tmp
language = en-US,fr-CA,De-DE

[logging]
level=DEBUG
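
A simplified sketch of how outputpath and outputurl relate in this config: a file stored under outputpath is expected to be reachable at the same relative location under outputurl. This is an assumption for illustration and not PyWPS's actual code; output_url is a hypothetical helper and some-uuid stands in for the per-request directory.

```python
from pathlib import PurePosixPath


def output_url(file_path, outputpath, outputurl):
    """Map a file stored under outputpath to its expected download URL.

    Raises ValueError if file_path is not located under outputpath.
    """
    relative = PurePosixPath(file_path).relative_to(outputpath)
    return outputurl.rstrip("/") + "/" + relative.as_posix()


url = output_url(
    "/tmp/some-uuid/input.json",   # placeholder for a stored result
    "/tmp",                        # outputpath from the config above
    "http://localhost:5000/tmp",   # outputurl from the config above
)
print(url)
```

Under this reading, the web server has to answer requests below outputurl with the files below outputpath, which is what the download route in the flask app does.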

I updated (os.path -> pathlib) and fixed the download route:

@app.route('/tmp/'+'<path:filename>')
def outputfile(filename: str):
    print(filename)
    target_file = Path('/tmp') / filename
    if target_file.is_file():
        file_ext = target_file.suffix
        with open(target_file, mode='rb') as f:
            file_bytes = f.read()
        mime_type = None
        if 'json' in file_ext:
            mime_type = 'text/json'
        return flask.Response(file_bytes, content_type=mime_type)
    else:
        flask.abort(404)

The result is good enough for me, but it would be nice to know why the same config did not work with gunicorn. The only difference in my setup is the port: I use 5000 for flask and 8081 for gunicorn. My private summary is that the error messages almost never showed the real problems, which were in the config and the download route.