davidhewitt / pythonize

Arbitrary Python objects #32

adriangb opened this issue 2 years ago (status: Open)

adriangb commented 2 years ago

Maybe this is totally off, but would it even be possible to support arbitrary Python objects? For example, if I had a Python function like this:

from typing import Callable, Literal, TypedDict

class Input(TypedDict):
  type: Literal["http"]
  status_code: int
  method: Literal["GET", "PUT"]
  callback: Callable[[], None]

class Output(TypedDict):
  message: str
  callback: Callable[[], None]

def process(inp: Input) -> Output:
  assert inp["type"] == "http"
  status_code = inp["status_code"]
  assert 99 < status_code < 600
  method = inp["method"]
  assert method in ("GET", "PUT")
  msg = f"{method} {status_code}"
  return Output(message=msg, callback=inp["callback"])

This crate is super convenient for parsing and validating type, status_code and method, since you can write a declarative serde model for them:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "UPPERCASE")]
enum Method {
  Get,
  Put,
}

#[derive(Serialize, Deserialize)]
struct HTTPMessage {
  status_code: u32,
  method: Method,
}

#[derive(Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "lowercase")]
enum Message {
  Http(HTTPMessage),
}

But as far as I can tell there is no way to say "make the callback key a Py<PyAny>". In other words, something like:

#[derive(Serialize, Deserialize)]
struct HTTPMessage {
  status_code: u32,
  method: Method,
  callback: Py<PyAny>,
}

Is that right?

davidhewitt commented 1 year ago

Hello, sorry for the very slow response. You're correct; at the moment this isn't possible.

I wonder if Pythonize and Depythonize traits could solve this problem. They could have blanket implementations for T: Serialize (or Deserialize) and then Python types which aren't serde-compatible could be handled separately. Other than that, I'm not aware of a way that we could hook into serde to achieve this.
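
Very roughly, and purely as a sketch (nothing like this exists in the crate today; the trait name and the older pythonize(py, &T) -> PyObject signature are just for illustration), it might look like:

use pyo3::prelude::*;
use serde::Serialize;

// Hypothetical trait, not part of pythonize today.
trait PythonizeExt {
    fn to_python(&self, py: Python<'_>) -> PyResult<PyObject>;
}

// Blanket impl: anything serde can serialize goes through the existing machinery.
impl<T: Serialize> PythonizeExt for T {
    fn to_python(&self, py: Python<'_>) -> PyResult<PyObject> {
        pythonize::pythonize(py, self)
            .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string()))
    }
}

// Types serde can't see (e.g. Py<PyAny>) would then need their own impls,
// which is where coherence with the blanket impl becomes the awkward part.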

(This problem is very related to #1 I think.)

apendleton commented 1 year ago

To deal with this in a current project, I've been using a proxy class that wraps an object and makes it look like a dict to depythonize. It looks something like this:

import collections.abc


class DictProxy(collections.abc.Mapping):
    """Wrap an arbitrary object so its public attributes can be read like dict keys."""

    _inner = None
    _keys = None

    def __init__(self, inner, aliases=None):
        self._aliases = aliases or {}
        self._inner = inner
        self._keys = [k for k in dir(inner) if not k.startswith("_")]
        for alias in self._aliases:
            if alias not in self._keys:
                self._keys.append(alias)

    def __getitem__(self, key):
        if key in self._aliases:
            alias = self._aliases[key]
            if type(alias) is str:
                return getattr(self._inner, self._aliases[key])
            else:
                return alias(self._inner)
        elif key in self._keys:
            return getattr(self._inner, key)
        else:
            raise KeyError(key)

    def __iter__(self):
        yield from self._keys

    def __len__(self):
        return len(self._keys)

    def __contains__(self, key):
        return key in self._keys

    def keys(self):
        return list(self._keys)
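
For completeness, a rough sketch of the Rust side that would consume such a proxy (illustrative only: it assumes the older depythonize(&PyAny) signature, and the handle function and trimmed-down HTTPMessage here are made up, not taken from my actual project):

use pyo3::prelude::*;
use pythonize::depythonize;
use serde::Deserialize;

// Mirrors (a subset of) the serde model from earlier in the thread.
#[derive(Deserialize, Debug)]
struct HTTPMessage {
    status_code: u32,
}

#[pyfunction]
fn handle(obj: &PyAny) -> PyResult<String> {
    // `obj` is expected to be a DictProxy wrapping an arbitrary Python object;
    // depythonize only sees a Mapping and fills the serde model from it.
    let msg: HTTPMessage = depythonize(obj)
        .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string()))?;
    Ok(format!("status: {}", msg.status_code))
}

On the Python side this would then be called as handle(DictProxy(some_object)).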

I only need to go in the Python -> Rust direction, but I think a similar approach could work in the other direction as well. One semi-major frustration at the moment, though: as currently implemented, the deserializer for mappings calls both .keys() and .values() on the incoming map, and eagerly evaluates the returned iterators, which means every field on the object gets accessed, including @property fields (which get evaluated), even if those fields aren't actually needed in the deserialization. At least in my application, this is resulting in some unnecessary slow/expensive calls, and I haven't yet figured out a way around it.

It doesn't seem like there's any reason it has to work that way, but that's how it works now. I think if object deserialization were to be explicitly supported, some kind of laziness would be important.

jonathan-s commented 1 year ago

Given that pythonize doesn't yet support arbitrary Python objects, it is not as powerful as pickle, which also means it doesn't share pickle's security concerns. If you do add support for arbitrary Python objects, it's worth keeping a function that only handles the restricted set, since that restriction is a security feature in itself.

Stargateur commented 4 months ago

I'm using dill (or pickle) to serialize arbitrary Python objects. With a little code you can write a serde with-module for it, and it works nicely.
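
A minimal sketch of that idea (assuming an older pyo3 API where py.import and PyBytes::new take these forms; the py_pickle module name and the HTTPMessage struct are made up for illustration, and dill could be swapped in for pickle):

use pyo3::prelude::*;

// Hypothetical serde "with" module that round-trips a Python object via pickle.
mod py_pickle {
    use pyo3::prelude::*;
    use pyo3::types::PyBytes;
    use serde::de::Error as _;
    use serde::ser::Error as _;
    use serde::{Deserialize, Deserializer, Serialize, Serializer};

    pub fn serialize<S: Serializer>(obj: &PyObject, s: S) -> Result<S::Ok, S::Error> {
        Python::with_gil(|py| {
            let pickle = py.import("pickle").map_err(S::Error::custom)?;
            // pickle.dumps(obj) -> bytes, extracted into a Vec<u8>.
            let bytes: Vec<u8> = pickle
                .call_method1("dumps", (obj.clone_ref(py),))
                .and_then(|b| b.extract())
                .map_err(S::Error::custom)?;
            bytes.serialize(s)
        })
    }

    pub fn deserialize<'de, D: Deserializer<'de>>(d: D) -> Result<PyObject, D::Error> {
        let bytes = Vec::<u8>::deserialize(d)?;
        Python::with_gil(|py| {
            let pickle = py.import("pickle").map_err(D::Error::custom)?;
            // pickle.loads(bytes) -> the original object.
            let obj = pickle
                .call_method1("loads", (PyBytes::new(py, &bytes).to_object(py),))
                .map_err(D::Error::custom)?;
            Ok(obj.to_object(py))
        })
    }
}

#[derive(serde::Serialize, serde::Deserialize)]
struct HTTPMessage {
    status_code: u32,
    // Any Python object; stored as pickle bytes in the serialized form.
    #[serde(with = "py_pickle")]
    callback: PyObject,
}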