PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0
12k stars 741 forks source link

ENH: boiler plate code for `PyPickle` to support Enums and multithreaded pools #4465

Open attack68 opened 1 month ago

attack68 commented 1 month ago

As context, I rarely use pickle directly in Python, but apparently I have been using it indirectly in the below code:

from multiprocessing import Pool
func = partial(other_func, **kwargs)
p = Pool(NUM_CPUS)
results = p.map(func, some_list)

When my project was pure Python this went undetected. When I started porting some of my classes to Rust with PyO3 all of my tests passed except ones involving the above lines. Why? Becuase my [pyclass] objects were not picklable, apparently.

Thus, for someone not at all au fait with pickling, this lead me on a rather blind hunt. However, I came across a very useful issue on this board, and implemented that code.

Later, my [pyclass] become more complex and fields used to construct them also require enum also labelled as [pyclass]. A simple example which works directly in Rust as MyEnum::Variant1 and Python as MyEnum.Variant1 is:

#[pyclass(module = "mymodule.rs")]
#[derive(Copy, Clone, Debug, Eq, PartialEq, Serialize, Deserialize)]
pub enum MyEnum {
    Variant1,
    Variant2,
    Variant3,
}

My method for pickling broke becuase simple enum do not have, or naturally need, a constructor method. To solve this I have used the following structure:

#[pymethods]
impl MyEnum {
    // Pickling
    #[new]
    fn new_py(variant: u8) -> PyResult<MyEnum> {
        match ad {
            0_u8 => Ok(MyEnum::Variant1),
            1_u8 => Ok(MyEnum::Variant2),
            2_u8 => Ok(MyEnum::Variant3),
            _ => Err(PyValueError::new_err("unreachable code on MyEnum pickle."))
        }
    }
    fn __setstate__(&mut self, state: Bound<'_, PyBytes>) -> PyResult<()> {
        *self = deserialize(state.as_bytes()).unwrap();
        Ok(())
    }
    fn __getstate__<'py>(&self, py: Python<'py>) -> PyResult<Bound<'py, PyBytes>> {
        Ok(PyBytes::new_bound(py, &serialize(&self).unwrap()))
    }
    fn __getnewargs__<'py>(&self) -> PyResult<(u8,)> {
        match self {
            MyEnum::Variant1 => Ok((0_u8,)),
            MyEnum::Variant2 => Ok((1_u8,)),
            MyEnum::Variant3 => Ok((2_u8,)),
        }
    }
}

I don't really want this code in my project. Its complicated (for me) and excessive when applied across all my [pyclass], and it is only needed for that pool multithreading. Issue raised to provoke any discussion if any of this can be abstracted away...

ngoldbaum commented 1 month ago

I think having some kind of support for generating pickle protocol boilerplate makes sense (via a macro?). There are obviously types where that's not possible so pyclass can't always do it, but for most plain old data types I think it's reasonable for users to opt into to PyO3 to doing it automatically.

attack68 commented 4 weeks ago

I think even the smallest amount of auto-implements will be helpful if there is a doc section on what it does and how one can modify/alter it to make it work for their specific case. Currently searching "pickle" in Pyo3 docs gets no results. 'Tis possible I might be able to help out with the docs if there is progress made on the auto-implement.

davidhewitt commented 4 weeks ago

I would definitely be positive on making progress on supporting pickle; it is one of our oldest open issues (#100). As I see it, there are two main challenges associated with it:

In general, I'm more optimistic for __getnewargs__ and __new__. I'm very open to both solutions; mostly this just needs research and hasn't been highest priority item for me yet.