davidhewitt / pythonize

MIT License
198 stars 28 forks source link

Add depythonize_on_error #38

Open LilyFoote opened 1 year ago

LilyFoote commented 1 year ago

I'm opening this PR more as a place to start a discussion for my use-case and the changes I needed to make to support it. I'm hoping it's possible to turn this into a feature that's more broadly useful to other users.

My use case is I'm building a profiler/debugger called Kolo (https://kolo.app/). As part of this we have a Python library which inspects Python frame objects and serializes information about them (such as local variables) as json. In this I make use of a custom JSONEncoder to serialize otherwise unserializable data (as their repr).

I have been reimplementing this library in Rust for performance, and json encoding is a measurable slowdown in my profiling, so I'm trying to use pythonize and serde_json to avoid calling Python's json.dumps.

With the changes in this branch, and the following implementation of a dump_json function my tests all pass:

use pyo3::intern;
use pyo3::prelude::*;
use pythonize::depythonize_on_error;
use serde_json::Value;

fn on_error(data: &PyAny) -> &PyAny {
    let py = data.py();
    match data.repr() {
        Ok(repr) => repr,
        Err(_) => intern!(py, "SerializationError"),
    }
}

pub fn dump_json(data: &PyAny) -> Result<Value, PyErr> {
    let data: Value = depythonize_on_error(data, &Some(on_error))?;
    Ok(data)
}

Is a feature allowing some fallback behaviour on errors desirable for this library? Or is there a better way to achieve what I'm trying to do?

davidhewitt commented 1 year ago

Sorry for having just seen this, been a long week for me. Thanks for opening the PR! The idea is the on_error callback gives a substitution which can then be deserialized instead?

I wonder if we should call this default like in json.dumps?

I'll need to have a play with the implementation to understand if it's the right approach. Will try to do this in the next few days.

LilyFoote commented 1 year ago

Sorry for having just seen this, been a long week for me.

No worries!

The idea is the on_error callback gives a substitution which can then be deserialized instead?

Pretty much - in my use case I use the repr of the thing that failed to serialize, plus a further fallback if the repr itself is broken.

I wonder if we should call this default like in json.dumps?

I considered that. I'm not sure if that's going to be helpful or confusing, since serde has its own concept of "default", which is different. (https://serde.rs/container-attrs.html#default)

I'll need to have a play with the implementation to understand if it's the right approach. Will try to do this in the next few days.

Awesome!