geopython / pywps

PyWPS is an implementation of the Web Processing Service standard from the Open Geospatial Consortium. PyWPS is written in Python.
https://pywps.org
MIT License

Make sure PyWPS objects are serializable #658

Open huard opened 2 years ago

huard commented 2 years ago

Description

Parallelisation libraries, like dask, send processes from the scheduler to workers by serializing and deserializing objects over the network. It seems that some PyWPS objects are not serializable. The issues I've found so far are:

I propose to start by writing tests that try to pickle PyWPS objects, submit a PR, and pursue the discussion over there.
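
As a starting point for such tests, here is a minimal sketch of a pickle round-trip check. The classes `PlainInput` and `InputWithLock` are hypothetical stand-ins, not PyWPS classes; real tests would build actual PyWPS objects instead.

```python
import pickle
import threading


def assert_picklable(obj):
    """Round-trip obj through pickle and return the restored copy."""
    restored = pickle.loads(pickle.dumps(obj))
    assert type(restored) is type(obj)
    return restored


def is_picklable(obj):
    """Return True if obj survives a pickle round trip, False otherwise."""
    try:
        assert_picklable(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


class PlainInput:
    """Hypothetical stand-in for an object that only holds plain data."""

    def __init__(self, identifier, data):
        self.identifier = identifier
        self.data = data


class InputWithLock(PlainInput):
    """Hypothetical stand-in holding a resource that pickle rejects."""

    def __init__(self, identifier, data):
        super().__init__(identifier, data)
        self._lock = threading.Lock()  # locks cannot be pickled
```

Running `is_picklable` over one instance of each PyWPS class would give a quick list of the objects that need attention; an object holding a lock, an open file, or a database connection is the kind of thing expected to fail.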


huard commented 2 years ago

While investigating this, I realized that Process.json returns a dict, while WPSRequest.json returns a string. The former has a from_json method, while in the latter, json is a property with getter and setter methods.

Is this something that should be uniform across the code?
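
For illustration, one possible uniform convention is sketched below: `json` is always a read-only property returning a dict, `from_json` is always a classmethod that rebuilds the object, and string encoding is a separate step. The names `JsonSerializable`, `ExampleProcess`, `ExampleRequest`, `_fields` and `dumps` are assumptions made up for this sketch, not the PyWPS API.

```python
import json


class JsonSerializable:
    """Hypothetical mixin: `json` returns a dict, `from_json` rebuilds the object."""

    _fields = ()  # names of the attributes included in the JSON form

    @property
    def json(self):
        # Always a plain dict, never an encoded string.
        return {name: getattr(self, name) for name in self._fields}

    @classmethod
    def from_json(cls, data):
        # Rebuild an instance from the dict produced by `json`.
        obj = cls.__new__(cls)
        for name in cls._fields:
            setattr(obj, name, data[name])
        return obj

    def dumps(self):
        # Encoding to a string is kept separate from building the dict.
        return json.dumps(self.json)


class ExampleProcess(JsonSerializable):
    _fields = ("identifier", "title")

    def __init__(self, identifier, title):
        self.identifier = identifier
        self.title = title


class ExampleRequest(JsonSerializable):
    _fields = ("operation", "version")

    def __init__(self, operation, version):
        self.operation = operation
        self.version = version
```

With this convention, `ExampleProcess.from_json(json.loads(p.dumps()))` and `ExampleRequest.from_json(r.json)` work the same way for both kinds of object.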

huard commented 2 years ago

Another, more serious issue is that Process._run_process, the method actually running the process handler, triggers Process.launch_next_process, which runs Service.prepare_process_for_execution. So individual processes need a reference to the overall service, which complicates the serialization of Processes. I'm not sure I can solve this one without falling into a refactoring nightmare. Ideas?
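
One common pattern for this kind of back-reference, shown here only as a sketch, is to drop the reference when the object is pickled and re-attach it on the receiving side. `ExampleService`, `ExampleProcess` and `register` are hypothetical names; whether PyWPS can work this way depends on how much of the service the process needs at run time.

```python
import pickle


class ExampleService:
    """Hypothetical stand-in for the service that owns the processes."""

    def __init__(self):
        self.processes = {}

    def register(self, process):
        # Attach (or re-attach) the back-reference to the service.
        process.service = self
        self.processes[process.identifier] = process


class ExampleProcess:
    """Hypothetical process holding a back-reference to its service."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.service = None

    def __getstate__(self):
        # Pickle everything except the service back-reference.
        state = self.__dict__.copy()
        state["service"] = None
        return state
```

A worker would then call `pickle.loads(...)` to get a process whose `service` is `None` and, if it needs one, call `register` on its own local service instance.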

gschwind commented 1 year ago

Hello huard,

> While investigating this, I realized that Process.json returns a dict, while WPSRequest.json returns a string. The former has a from_json method, while in the latter, json is a property with getter and setter methods.
>
> Is this something that should be uniform across the code?

I also noticed the different behavior of the json properties across the code, and I addressed the issue in some of my refactoring, such as [1].

I think this should be fixed.

[1] https://github.com/gschwind/PyWPS/commit/db2738732e25787fd02f6e921671752a93d7c866

gschwind commented 1 year ago

Hello huard,

> Another, more serious issue is that Process._run_process, the method actually running the process handler, triggers Process.launch_next_process, which runs Service.prepare_process_for_execution. So individual processes need a reference to the overall service, which complicates the serialization of Processes. I'm not sure I can solve this one without falling into a refactoring nightmare. Ideas?

I also agree that this is quite an issue, but refactoring it is very difficult at the moment.

Best regards.

gschwind commented 1 year ago

Hello,

Moreover, JSON serialization is used in different contexts with very different meanings: the serialized form may end up as JSON output for a JSON request, may be used within XML templates, or may be used to store data in the database.

We should clarify the intent of JSON serialization and stick to it.