bird-house / flyingpigeon

WPS processes for climate model data, indices and extreme events
http://flyingpigeon.readthedocs.io/en/latest/
Apache License 2.0
19 stars 15 forks source link

Wrong app initialization in wsgi.py #240

Closed cehbrecht closed 6 years ago

cehbrecht commented 6 years ago

This bug is reported by CRIM:

I recently started working on the PAVICS project, and I have a bug to submit you, but I’m not sure where is the best place/project to submit it.

We have a common bug with the WPS services built on pywps, such as Malleefowl, Flyingpigeon and Hummingbird.

On each WPS request we perform on those services, an additional file descriptor gets opened on the services’ log file, for example, inside the Flyingpigeon Docker container, it’s the file /opt/birdhouse/var/log/pywps/flyingpigeon.log.

Consequently, all log output gets duplicated by the number of times the gunicorn worker had previously processed requests… filling up our hard drives very rapidly.

My quick workaround is to add “--max-requests 1” to the gunicorn command line, so the workers are restarted after each requests.

But after longer investigation, I found out that the root cause of the problem is in wsgi.py of those services.

For example, Flyingpigeon is currently written like this:

import os
from pywps.app.Service import Service

from .processes import processes

def application(environ, start_response):
    app = create_app()
    return app(environ, start_response)

def create_app(cfgfiles=None):
    config_files = [os.path.join(os.path.dirname(__file__), 'default.cfg')]
    if cfgfiles:
        config_files.extend(cfgfiles)
    if 'PYWPS_CFG' in os.environ:
        config_files.append(os.environ['PYWPS_CFG'])
    service = Service(processes=processes, cfgfiles=config_files)
    return service

Unfortunately, create_app() is called on every request handled by the worker. Rewriting the wsgi.pylike this fixes the problem:

import os
from pywps.app.Service import Service

from .processes import processes

def create_app(cfgfiles=None):
    config_files = [os.path.join(os.path.dirname(__file__), 'default.cfg')]
    if cfgfiles:
        config_files.extend(cfgfiles)
    if 'PYWPS_CFG' in os.environ:
        config_files.append(os.environ['PYWPS_CFG'])
    service = Service(processes=processes, cfgfiles=config_files)
    return service

application = create_app()

Thanks!

cehbrecht commented 6 years ago

This bug needs to be fixed in all WPS birds. We can keep the main bug report in flyingpigeon.

cehbrecht commented 6 years ago

Would be nice to have a common wsgi.py app in pywps which can dynamically load the configured processes. See https://github.com/geopython/pywps/issues/118

cehbrecht commented 6 years ago

@ldperron I have fixed this now in the birds used by PAVICS. Other birds will follow. Still there is a duplication of log entries ... maybe due to the number of gunicorn workers. You can change in addition the log-level maybe just to WARN to reduce the log files. The pywps log files have no auto-rotation ... needs to be added on the unix system.

You can merge or cherry-pick the changes.

ldperron commented 6 years ago

Thanks! We will stay with the workaround for now, but I will make sure to remove the workaround when we upgrade any of the birds.

ldperron commented 6 years ago

Finally, the workaround using --max-requests 1 was a very bad idea. It totally breaks async execution of WPS processes.