unclear testing for query references

nbrinckm commented 4 years ago

Description

Hi there, I am trying to create a service that works on a GeoJSON file. The basic idea is to split a city (with all its buildings) into parts of equal size (in terms of the number of buildings), such that the buildings within each cluster are close to each other.

(It basically comes from a task to create surveys for students to check taxonomies.)

I can run the code itself in a process. My main problem is the testing. As my test data set is the whole city of Chia, Colombia, I run into trouble with the maximum request size.

For the input data I am already able to set the config (as it is a global object) in the pywps configuration module. For the output, I can ask the pywps server to give back a reference.

What I am really having trouble with is querying this reference in the test case:

import os
from pywps import Service, configuration
from pywps.tests import client_for, assert_response_success

import time

from .common import get_output, WPS, OWS, WpsClient
from babybird.processes.wps_split_buildings import SplitBuildings

import geopandas
import requests

# Some of the test code is from here:
# https://github.com/bird-house/emu/blob/master/tests/test_wps_poly_centroid.py

def test_wps_building_splitter():

    current_dir = os.path.dirname(os.path.abspath(__file__))
    data_file = os.path.join(current_dir, 'buildings.json')
    with open(data_file, 'r') as infile:
        data_file_str = infile.read()
    n_parts = 4

    service = Service(processes=[SplitBuildings()])
    print(dir(service))
    client = client_for(service)

    process_identifier = 'splitbuildings'

    configuration.CONFIG.set('server', 'maxrequestsize', '10gb')

    output_element = WPS.Output(
        OWS.Identifier('splittedbuildings'),
    )
    output_element.attrib['asReference'] = 'true'

    response_document_element = WPS.ResponseDocument(
        output_element
    )
    response_document_element.attrib['lineage'] = 'true'
    response_document_element.attrib['status'] = 'true'

    response_form_element = WPS.ResponseForm(response_document_element)

    request_doc = WPS.Execute(
        OWS.Identifier(process_identifier),
        WPS.DataInputs(
            WPS.Input(
                OWS.Identifier('buildings'),
                WPS.Data(WPS.ComplexData(data_file_str))
            ),
            WPS.Input(
                OWS.Identifier('count'),
                WPS.Data(WPS.LiteralData(str(n_parts)))  # must be a string
            )
        ),
        response_form_element,
        version='1.0.0'
    )

    resp = client.post_xml(doc=request_doc)
    assert_response_success(resp)
    outputs = get_output(resp.xml)
    assert 'splittedbuildings' in outputs.keys()

    url_to_fetch = outputs['splittedbuildings']
    print(url_to_fetch)

    output_data = client.get(url_to_fetch)
    print(output_data)

The service itself is like this:

from pywps import Process, ComplexInput, LiteralInput, LiteralOutput, UOM, ComplexOutput
from pywps.app.Common import Metadata
from pywps import FORMATS

import geopandas

import logging
LOGGER = logging.getLogger("PYWPS")

class SplitBuildings(Process):
    """A process to split buildings in parts."""
    def __init__(self):
        inputs = [
            ComplexInput(
                "buildings", 
                "The buildings to split", 
                abstract="the buildings to split.",
                supported_formats=[
                    FORMATS.JSON,
                ]
            ),
            LiteralInput(
                "count",
                "The count of parts",
                abstract="The count of parts that we want to get.",
                data_type="integer",
            )
        ]
        outputs = [
            ComplexOutput(
                "splittedbuildings",
                "The splitted buildings",
                abstract="The buildings with an area index.",
                supported_formats=[
                    FORMATS.JSON,
                ]
            )
        ]

        super(SplitBuildings, self).__init__(
            self._handler,
            identifier="splitbuildings",
            title="Split the buildings",
            abstract="Split buildings into parts (adding an area index).",
            keywords=['json', 'buildings'],
            metadata=[
                Metadata('PyWPS', 'https://pywps.org/'),
                Metadata('Birdhouse', 'http://bird-house.github.io/'),
                Metadata('PyWPS Demo', 'https://pywps-demo.readthedocs.io/en/latest/'),
            ],
            version='1.0',
            inputs=inputs,
            outputs=outputs,
            store_supported=True,
            status_supported=True
        )

    @staticmethod
    def _handler(request, response):
        geojson_input_file = request.inputs['buildings'][0].file
        n_parts = request.inputs['count'][0].data
        data = geopandas.read_file(geojson_input_file, driver="GeoJSON")

        # some more processing...
        data['areaindex'] = n_parts

        data.to_file('outputfile.geojson', 'GeoJSON')

        response.outputs['splittedbuildings'].file = 'outputfile.geojson'
        return response

When I try to query the URL, it doesn't work. (I guess this may partly be because the application doesn't run on that port; however, I haven't seen any documentation on which port it runs on or how to change the URL.)

As I wrote, I don't know how to actually get the result back in the test case so that I can check the data after processing. The test cases I have found so far (in the emu repo, for example) are all satisfied with processing literal strings or with a successful execution of the WPS process; none of them queries the reference URLs. Please help me understand what I have to do here.

Environment

Cookiecutter version: 5351c2fc8649454ec9986a01cd78d0c233c0d1ba
Python version: 3.6.8
Operating System: Ubuntu 18.10

Steps to Reproduce

Cloned the cookiecutter-birdhouse repo
Created a virtual environment and activated it
Installed the dependencies from the requirements*.txt files
Ran make bake (and continued in the babybird folder)
Installed the dependencies there from the requirements*.txt files
Installed geopandas
Wrote the two files mentioned above
Ran make test

Additional Information

huard commented 4 years ago

Hi Nils,

I think the issue is that the test server is not a file server, so it does not serve the output files. However, they should be somewhere on your disk. Note that our test config usually includes

[server]
allowedinputpaths=/

which might help in your case.
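
Since the output files end up on disk, one way to check them in a test could be to translate the returned reference URL back into a local path. The following is only a sketch under assumptions: reference_to_local_path is a hypothetical helper (not part of PyWPS), and it assumes the stored file sits under the directory configured as outputpath at the same relative location it has under outputurl (both in the [server] section of the PyWPS config).

```python
import os
from urllib.parse import unquote, urlparse


def reference_to_local_path(reference_url, output_url, output_path):
    """Map a stored-output URL onto the directory it was written to.

    Hypothetical helper: assumes files live under output_path at the same
    relative location they have under output_url.
    """
    ref = urlparse(reference_url)

    # A file:// reference already points at the local filesystem.
    if ref.scheme == 'file':
        return unquote(ref.path)

    base = urlparse(output_url)
    base_path = base.path.rstrip('/') + '/'
    if ref.netloc != base.netloc or not ref.path.startswith(base_path):
        raise ValueError('reference is not below the configured output URL')

    # Keep only the part of the URL path below the output URL.
    relative = unquote(ref.path[len(base_path):])
    return os.path.join(output_path, *relative.split('/'))
```

In a test, the two settings could be read with configuration.get_config_value('server', 'outputurl') and configuration.get_config_value('server', 'outputpath'), and the resulting path passed to geopandas.read_file.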

Also, you might want to take a look at owslib to make WPS queries. I've recently added support to retrieve files from the local filesystem for exactly this purpose: https://github.com/geopython/OWSLib/issues/680

HTH

nbrinckm commented 4 years ago

So the testserver has no store functionality? :-(

I realized that the file is then created in the main project folder (next to the Makefile). I don't think this is good behaviour, and I don't like the idea of relying on it. But thank you for your help anyway.

I also tried to run the test against the live server using owslib:

import os
import unittest

import geopandas as gpd  # needed for gpd.read_file below
import owslib.wps

INPUT_FILE = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    'tests',
    'buildings.json'
)

URL_WPS = 'http://localhost:5000/wps'

# identifier
IDENTIFIER_PROCESS = 'splitbuildings'
IDENTIFIER_INPUT_BUILDINGS = 'buildings'
IDENTIFIER_INPUT_COUNT = 'count'
IDENTIFIER_OUTPUT = 'splittedbuildings'

COLUMN_AREA_INDEX = 'areaindex'

class TestLiveServer(unittest.TestCase):
    def test_building_splitter(self):
        wps = owslib.wps.WebProcessingService(URL_WPS, verbose=True)

        with open(INPUT_FILE, 'r') as infile:
            input_buildings = infile.read()

        execution = wps.execute(IDENTIFIER_PROCESS,
            inputs=[
                (IDENTIFIER_INPUT_BUILDINGS, owslib.wps.ComplexDataInput(
                    value=input_buildings,
                    mimeType='application/json'
                )),
                (IDENTIFIER_INPUT_COUNT, '4'),
            ],
            output=[
                (IDENTIFIER_OUTPUT, True),
            ]
        )

        wps.monitorExecution(execution)

        outfile = gpd.read_file(execution.processOutputs[0].reference)
        self.assertTrue(COLUMN_AREA_INDEX in outfile.columns)

if __name__ == '__main__':
    unittest.main()

(In this case the file is outside of the tests folder, but next to the makefile for the babybird.)

I changed the default.cfg file so that it allows request sizes of 10 GB (just to be sure).

After installing and running the babybird application, I get 400 status codes for the POST requests (so even for the execute request).

huard commented 4 years ago

You should save process results in self.workdir.
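
A minimal sketch of what that could look like, under assumptions: the handler becomes an instance method (instead of a static method) so that it can see self.workdir, which PyWPS is expected to set before the handler runs. To keep the sketch free of dependencies, WorkdirHandlerMixin is a hypothetical stand-in for the handler part of SplitBuildings, and plain json replaces the geopandas processing.

```python
import json
import os


class WorkdirHandlerMixin:
    """Hypothetical stand-in for the handler part of SplitBuildings."""

    def _handler(self, request, response):
        # Instance method, so self.workdir is reachable here.
        n_parts = request.inputs['count'][0].data
        with open(request.inputs['buildings'][0].file) as infile:
            data = json.load(infile)

        # Placeholder for the real processing (geopandas in the original).
        for feature in data.get('features', []):
            properties = feature.get('properties') or {}
            properties['areaindex'] = n_parts
            feature['properties'] = properties

        # Write inside the working directory, not the current directory.
        out_path = os.path.join(self.workdir, 'outputfile.geojson')
        with open(out_path, 'w') as outfile:
            json.dump(data, outfile)

        response.outputs['splittedbuildings'].file = out_path
        return response
```

With the real process class, the constructor would keep passing self._handler to Process.__init__; only the @staticmethod decorator and the hard-coded relative file name go away.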

nbrinckm commented 4 years ago

Thx, with the workdir I get rid of the JSON file in the main project folder (which was created by the tests before).

But to make it completely clear for me: at the moment there is no way, within the test case, to fetch the referenced files via the URLs returned by the process?

nbrinckm commented 4 years ago

OK, the test via owslib works after setting the maxrequestsize option.

Still it would be great to handle the reference output in the bird-house-style tests.

huard commented 4 years ago

Agreed. Maybe we could bundle a tiny file server... @cehbrecht Is this something you have considered already?

cehbrecht commented 4 years ago

Agreed. Maybe we could bundle a tiny file server... @cehbrecht Is this something you have considered already?

@huard I have not thought about it. But because we are using werkzeug we can probably easily configure a data file service which gets (optionally) started by the command line: https://github.com/bird-house/emu/blob/5811119f870fab71f8df5a44e725ac11c94864fa/emu/cli.py#L83

But this means we need a running wps ... current pywps tests don't need this.
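
For tests that should not depend on a running WPS, a tiny file server could also be started from inside the test itself. The sketch below is not the werkzeug approach mentioned above. It swaps in Python's standard http.server (Python 3.7 or newer is assumed for ThreadingHTTPServer and the directory argument), and serve_directory is a hypothetical helper name.

```python
import functools
import threading
from contextlib import contextmanager
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


@contextmanager
def serve_directory(directory, host='127.0.0.1', port=0):
    """Serve `directory` over HTTP in a background thread.

    Yields the base URL; port=0 lets the OS pick a free port.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield 'http://{}:{}'.format(host, server.server_address[1])
    finally:
        # Stop the serve_forever loop and release the socket.
        server.shutdown()
        server.server_close()
        thread.join()
```

A test could wrap the execute request in "with serve_directory(output_dir) as base_url:" and point the outputurl setting at base_url, so that the returned reference can be fetched with an ordinary HTTP client while the block is active.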
