geopython / pywps

PyWPS is an implementation of the Web Processing Service standard from the Open Geospatial Consortium. PyWPS is written in Python.
https://pywps.org
MIT License

Async processes accumulate until failure with sqlite in-memory #495

Open danwild opened 4 years ago

danwild commented 4 years ago

Description

This relates to running processes in async mode, with logging.database=sqlite:///:memory:

When an async process completes (i.e. pywps_requests.percent_done: 100.0, pywps_requests.status: 4), the running_count reported by dblog never seems to reflect this.

So if I set parallelprocesses=5, I can execute 5 successful jobs. However, each job increments this running count, which is never decremented on completion, meaning I can only run 5 jobs before all I get is a 'PyWPS Process GetSubset accepted' response for a process that never runs.

This issue only seems to happen when using in-memory sqlite (i.e. does not occur when supplying my own sqlite db string).
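To illustrate the underlying SQLite behaviour (a minimal sketch with the plain sqlite3 module, outside PyWPS; the table name mirrors pywps_requests but the schema here is simplified for illustration): every connection to `:memory:` gets its own private database, so a status update written through a forked worker's connection can never reach the connection that counts running processes.

```python
import sqlite3

# Each connection to ":memory:" creates a brand-new private database.
parent = sqlite3.connect(":memory:")
parent.execute("CREATE TABLE pywps_requests (uuid TEXT, status INTEGER)")
parent.execute("INSERT INTO pywps_requests VALUES ('abc', 2)")  # 2 = running (illustrative)
parent.commit()

# A second connection (think: the forked async worker) sees a
# *different*, empty database -- it cannot update the parent's row.
worker = sqlite3.connect(":memory:")
tables = worker.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)  # [] -- no pywps_requests table visible to the worker
```

So the completion update the worker writes is lost, and the parent's running count only ever grows.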

Environment

Steps to Reproduce

Additional Information

This is being run in a docker container from a macOS host; I don't think that should affect this 🤔

elemoine commented 4 years ago

See https://github.com/geopython/pywps/issues/245 as well.

danwild commented 4 years ago

Thanks @elemoine, I missed that one! So I guess I'm confirming that this definitely is still an issue.

If the fix is to avoid in-memory sqlite:

* it should not be the default setting

* should be noted in the docs somewhere

As a side note, IMO logging.database is a bit misleading, since this is not really specific to logging; it should probably be server.database. My 2c anyway.

cehbrecht commented 4 years ago

@huard @davidcaron I suppose the watchdog daemon in PR #497 will not work with sqlite memory database. Can two services connect to the same memory db? So, the watchdog might not solve this issue.

davidcaron commented 4 years ago

> @huard @davidcaron I suppose the watchdog daemon in PR #497 will not work with sqlite memory database. Can two services connect to the same memory db? So, the watchdog might not solve this issue.

No, we will still have the same issue that is explained in #245. As far as I know, there is no way for the two in-memory databases (the main one and the forked one) to be synchronized.

Also, I agree with @danwild:

> If the fix is to avoid in-memory sqlite:
>
> * it should not be the default setting
>
> * should be noted in the docs somewhere
>
> As a side note, IMO logging.database is a bit misleading, since this is not really specific to logging; it should probably be server.database. My 2c anyway.

jachym commented 4 years ago

You are right; it was originally used just for logging, but it has since become more of a service database.

The watchdog pull request should hopefully address some of the issues (?)

ruester commented 4 years ago

I can also confirm this behaviour with pywps version 4.2.6. If the waiting queue gets full, the processing of requests stops and the queue stays full forever. I was able to fix it by configuring database=sqlite:///temp.db in [logging]. Issue https://github.com/geopython/pywps/issues/245 does not seem to be fixed yet...

gschwind commented 2 years ago

Hello,

I do not know how mode=memory and cache=shared may affect this issue [1].
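For what it's worth, a quick sketch of what cache=shared buys you (plain sqlite3, outside PyWPS): connections in the *same* process that open the shared-cache URI do see one common in-memory database, but the cache lives in that process's address space, so it would not help a forked async worker, whose writes after fork stay in its own copy.

```python
import sqlite3

# Two connections in the same process, both using the shared-cache
# in-memory URI, really do share one database...
a = sqlite3.connect("file::memory:?cache=shared", uri=True)
b = sqlite3.connect("file::memory:?cache=shared", uri=True)

a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")
a.commit()

rows = b.execute("SELECT x FROM t").fetchall()
print(rows)  # [(1,)] -- b sees a's committed write
# ...but this sharing does not cross process boundaries: a forked
# worker inherits a copy-on-write snapshot, and its later writes
# never propagate back to the parent.
```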

On the other hand, I would just state in the documentation that this mode is not supported and recommend using a file backed by tmpfs.
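A tmpfs-backed file could look like this in the pywps configuration (a sketch; the /dev/shm path is an assumption for a typical Linux host, not something PyWPS mandates):

```ini
[logging]
level=INFO
# A database file on tmpfs (e.g. /dev/shm on Linux) keeps near-memory
# speed while remaining visible to forked worker processes.
database=sqlite:////dev/shm/pywps.db
```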

Another way to handle it is to have a standalone WPS daemon that holds the database for all processes, but that looks like a much more complicated solution to implement. I can imagine a daemon with Apache as a proxy, where the daemon handles sub-processes properly. That way we get rid of WSGI or equivalent and run the daemon in pure Python.

Best regards.

[1] https://sqlite.org/inmemorydb.html