finos / perspective

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
https://perspective.finos.org/
Apache License 2.0
8.6k stars 1.19k forks source link

`Server` instance not pickle-able, `PerspectiveManager` was pickle-able #2853

Open timkpaine opened 2 weeks ago

timkpaine commented 2 weeks ago

Feature Request

Description of Problem:

Many web server setups use a process pool and send across a pickled instance of the primary web server, and some (such as ray serve) ship an instantiated web server not just to another process, but potentially to a completely different machine.

Table and View have never been pickle-able, nor is there any expectation that they be so, but PerspectiveManager used to be pickle-able, and its replacement, Server is not. Using cloudpickle (but same results for builtin pickle):

Works (perspective<3)

from cloudpickle import dumps
from perspective import PerspectiveManager
dumps(PerspectiveManager())

Doesn't work (perspective>=3)

from cloudpickle import dumps
from perspective import Server
dumps(Server())
Traceback (most recent call last):
  File "/Users/timkpaine/Developer/tmp.py", line 4, in <module>
    dumps(Server())
  File "/opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'builtins.PySyncServer' object

Potential Solutions:

The workaround is to defer server construction until e.g. the first request (as we do here), but this is a bit clunky. It would be good if a Server instance with no tables was pickle-able, or if we could establish a better pattern

texodus commented 5 days ago

Discussed offline. This only coincidentally worked in 2.x, there was no guarantee of this behavior, testing to assert it, etc. However, despite the fact that calling this constructor is trivial, I can see how this may be convenient for common Python architectures, especially if you're not overly familiar with what projects like cloudpickle do under-the-hood. There are some unresolved issues about how this would work though, namely what the semantics of cloning a Server should actually be and how to enforce that if its only valid for certain states with minimal maintenance overhead. This would also need to be unique to Server, at least adding pickle capability to e.g. Client is an entirely new challenge.

I've also tested the change locally just to verify, and it does resolve the cloudpickle issue in practice, but I'm unclear as to why. It appears cloudpickle is designed to fallback to the constructor when the class is unpickle-able, which would in theory work fine for this use case, but then fails due to what appears to be pyo3 binding weirdness (?). I would like to understand if this is an actual feature expectation, or a workaround for a quirk in pyo3/cloudpickle integration.