PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0
12.32k stars 760 forks source link

`val` ndarray object cannot be converted to `Sequence` #2615

Closed ritchie46 closed 2 years ago

ritchie46 commented 2 years ago

In updating to pyo3/numpy 0.17 upstream in polars we've hit the case where we have a function that accepts a Vec<T> where T: FromPyObject to do the conversion from any sequence passed from python.

This function is opaque to use, in that anything that can be converted to a Vec<> is fine. Such as lists, tuples, but also numpy arrays.

Now with updating it seems that this design is not possible anymore. Is this intentional?

What would be the recommended function signature for such generic sequence?

adamreichold commented 2 years ago

There certainly was no intentional change in the numpy crate, but I am also not sure which code path is involved here to convert a numpy::PyArray1 into a Vec automatically. My current best guess is that this is actually the pyo3 crate using that one-dimensional NumPy arrays are Python sequences and that this might be related to https://github.com/PyO3/pyo3/pull/2477?

In any case, could you point out the specific code in question or better post a minimal reproducer of the pattern that worked in 0.16 but fails in 0.17?

adamreichold commented 2 years ago

I think a workaround would be to accept an enum that derives FromPyObject and accepts either Vec<T> or &'py PyArray1<T> as demonstrated here.

If https://github.com/PyO3/pyo3/pull/2477 is indeed the underlying issue, we might want to relax the Vec<T>: FromPyObject impl though to avoid the above churn for affected downstream projects.

EDIT: More specifically, the extract_sequence helper function defined here.

adamreichold commented 2 years ago

cc @aganders3 @davidhewitt

ritchie46 commented 2 years ago

In any case, could you point out the specific code in question or better post a minimal reproducer of the pattern that worked in 0.16 but fails in 0.17?

Sure, this worked in 0.16.

#[pyclass]
struct Foo {}

#[pymethods]
impl Foo {

    #[staticmethod]
    fn from_string_container(val: Vec<String>) -> Foo {
        Foo{}
    }
}
davidhewitt commented 2 years ago

Yes, this sounds like a PyO3 bug so will transfer it over.

@ritchie46 can you list the types which are having issues with the Vec<T> conversion which you need to have working? I can make sure to add regression tests and ensure they are all fixed in a 0.17.2 release.

ritchie46 commented 2 years ago

We have a few conversion from python to our types. So I assume it is T: FromPyObject. Further encountered types were String types.

Here are a few examples of types that failed. I think the common denominator is the FromPyObject trait conversion.

ObjectValue (an opaque python object)

#[derive(Clone, Debug)]
#[repr(transparent)]
pub struct ObjectValue {
    pub inner: PyObject,
}

impl From<PyObject> for ObjectValue {
    fn from(p: PyObject) -> Self {
        Self { inner: p }
    }
}

impl<'a> FromPyObject<'a> for ObjectValue {
    fn extract(ob: &'a PyAny) -> PyResult<Self> {
        Python::with_gil(|py| {
            Ok(ObjectValue {
                inner: ob.to_object(py),
            })
        })
    }
}

AnyValue (An enum over our allowed values )

impl<'s> FromPyObject<'s> for Wrap<AnyValue<'s>> {
    fn extract(ob: &'s PyAny) -> PyResult<Self> {
      ... hidden implementation
    }
}
aganders3 commented 2 years ago

It sounds like it's the outer (sequence) type that's causing the issue, not the inner (item) type. If so I think it is indeed a victim of the breaking change introduced by #2477. 😕

Unfortunately np.ndarray is a case where it would pass the old check that used PySequence_Check but not the new check that requires isinstance(np.ndarray, collections.abc.Sequence) == True.

If the PySequence_Check is sufficient for this use perhaps we can replace the downcast in rust-numpy or polars to provide the old behavior with try_from_unchecked (unsafe, of course).

Sorry if I'm off the mark here but as a fan of both rust-numpy and polars I'm motivated to find a good fix for this and will do my best to help.

adamreichold commented 2 years ago

If the PySequence_Check is sufficient for this use perhaps we can replace the downcast in rust-numpy or polars to provide the old behavior with try_from_unchecked (unsafe, of course).

I think the code that needs to be relaxed is

EDIT: More specifically, the extract_sequence helper function defined here.

as this - passing np.ndarray from Python where a Rust extension expected Vec - used to work with plain PyO3 without involving code in rust-numpy or polars.

I guess we could implement extract_sequence without the help of PySequence. Or we could manually call PySequence_Check and then PySequence::try_from_unchecked.

aganders3 commented 2 years ago

I don't want to over-focus on np.ndarray because I think this bug report was more general, but taking that specific example I think downcasting it to a PySequence is actually wrong. For example np.ndarray has no index() method. That's not a problem specifically for extract_sequence but it would be for other uses.

To your point though I suppose there could be a more relaxed version of extract_sequence that would work for this and other use cases - maybe via PyIterator instead? It looks like that's what sequence::extract_sequence ultimately requires right now.

adamreichold commented 2 years ago

@davidhewitt I think it would personally prefer to manually call PySequence_Check and then use PySequence::try_from_unchecked as I think missing methods would only yield errors, not unsafety.

Switching the implementation to PyIterator would cost us the ability to call with_capacity to preallocate the Vec which seems an unnecessary loss considering the relevant types here do implement enough of the sequence protocol.

Billyangnz commented 2 years ago

This issue seems to not have been fixed yet. I am getting TypeError: argument 'oned': 'ndarray' object cannot be converted to 'Sequence' running test.py with the following files:

src/lib.rs

use pyo3::prelude::*;

#[pyclass(module = "rust_module")]
#[derive(Default, Debug, serde::Serialize, serde::Deserialize, Clone)]
pub struct DummyClass{
    name: String,
    oned: [f64;3],
    twod: [[f64;3];3],
}

#[pymethods]
impl DummyClass {
    #[new]
    pub fn pynew(
        name: String,
        oned: [f64;3],
        twod: [[f64;3];3],
    ) -> PyResult<Self>{
        Ok(Self { name, oned, twod})
    }

    fn __call__(&self) -> PyResult<String>{
        Ok("Success".into())
    }
}

#[pymodule]
fn pyo3_pure(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<DummyClass>()?;
    Ok(())
}

Cargo.toml

[package]
name = "reproduce-extract"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
pyo3 = { version = "0.17.2", features = ["serde", "abi3-py37", "extension-module"] }
numpy = { version = "^0.17.2" }
serde = { version = "^1.0", features = ["derive"]}

[lib]
name = "pyo3_pure"
crate-type = ["cdylib"]

pyproject.toml

[build-system]
requires = ["maturin>=0.13,<0.14"]
build-backend = "maturin"

[tool.maturin]
bindings = "pyo3"

[project]
# The name pyo3_pure instead of pyo3-pure is intentional,
# it's used as a crate name resolution regression test
name = "pyo3_pure"
version = "0.1.0"

dependencies = [
    "numpy"
]

test.py

from pyo3_pure import DummyClass
import numpy as np

# test = DummyClass("hello",[0,0,0],[[0,0,0],[0,0,0],[0,0,0]])
test = DummyClass("hello",np.array([0,0,0]),np.array([[0,0,0],[0,0,0],[0,0,0]]))

print(test())
adamreichold commented 2 years ago

This issue seems to not have been fixed yet. I am getting TypeError: argument 'oned': 'ndarray' object cannot be converted to 'Sequence' running test.py with the following files:

This is a slightly different code path affected by the same underlying issue. The OP reported failures to extract Vec<T> whereas this is a failure to extract [T; N] which we failed to consider we authoring the initial fix. Could you open a separate issue here so we can track that properly? Thanks!

jjerphan commented 1 year ago

Hi all,

First, thank you for working on PyO3 ! 🙌

Currently we are facing a similar problem in https://github.com/pola-rs/polars/pull/6531 when trying to upgrade Py03 to 0.18.0.

We are planning in the mean time to introduce a constructor for this case.

Do you think this is a good approach? Do you have any recommendations?

Thanks!

adamreichold commented 1 year ago

@jjerphan Py03 0.18 does contain the fix and we do test for this, c.f. https://github.com/PyO3/pyo3/blob/v0.18.0/pytests/tests/test_sequence.py#L28

Can you add more details of what is failing and how?