Closed ritchie46 closed 2 years ago
There certainly was no intentional change in the numpy
crate, but I am also not sure which code path is involved here to convert a numpy::PyArray1
into a Vec
automatically. My current best guess is that this is actually the pyo3
crate using that one-dimensional NumPy arrays are Python sequences and that this might be related to https://github.com/PyO3/pyo3/pull/2477?
In any case, could you point out the specific code in question or better post a minimal reproducer of the pattern that worked in 0.16 but fails in 0.17?
I think a workaround would be to accept an enum that derives FromPyObject
and accepts either Vec<T>
or &'py PyArray1<T>
as demonstrated here.
If https://github.com/PyO3/pyo3/pull/2477 is indeed the underlying issue, we might want to relax the Vec<T>: FromPyObject
impl though to avoid the above churn for affected downstream projects.
EDIT: More specifically, the extract_sequence
helper function defined here.
cc @aganders3 @davidhewitt
In any case, could you point out the specific code in question or better post a minimal reproducer of the pattern that worked in 0.16 but fails in 0.17?
Sure, this worked in 0.16
.
#[pyclass]
struct Foo {}
#[pymethods]
impl Foo {
#[staticmethod]
fn from_string_container(val: Vec<String>) -> Foo {
Foo{}
}
}
Yes, this sounds like a PyO3 bug so will transfer it over.
@ritchie46 can you list the types which are having issues with the Vec<T>
conversion which you need to have working? I can make sure to add regression tests and ensure they are all fixed in a 0.17.2 release.
We have a few conversion from python to our types. So I assume it is T: FromPyObject
. Further encountered types were String
types.
Here are a few examples of types that failed. I think the common denominator is the FromPyObject
trait conversion.
#[derive(Clone, Debug)]
#[repr(transparent)]
pub struct ObjectValue {
pub inner: PyObject,
}
impl From<PyObject> for ObjectValue {
fn from(p: PyObject) -> Self {
Self { inner: p }
}
}
impl<'a> FromPyObject<'a> for ObjectValue {
fn extract(ob: &'a PyAny) -> PyResult<Self> {
Python::with_gil(|py| {
Ok(ObjectValue {
inner: ob.to_object(py),
})
})
}
}
impl<'s> FromPyObject<'s> for Wrap<AnyValue<'s>> {
fn extract(ob: &'s PyAny) -> PyResult<Self> {
... hidden implementation
}
}
It sounds like it's the outer (sequence) type that's causing the issue, not the inner (item) type. If so I think it is indeed a victim of the breaking change introduced by #2477. 😕
Unfortunately np.ndarray
is a case where it would pass the old check that used PySequence_Check
but not the new check that requires isinstance(np.ndarray, collections.abc.Sequence) == True
.
If the PySequence_Check
is sufficient for this use perhaps we can replace the downcast in rust-numpy or polars to provide the old behavior with try_from_unchecked
(unsafe, of course).
Sorry if I'm off the mark here but as a fan of both rust-numpy and polars I'm motivated to find a good fix for this and will do my best to help.
If the PySequence_Check is sufficient for this use perhaps we can replace the downcast in rust-numpy or polars to provide the old behavior with try_from_unchecked (unsafe, of course).
I think the code that needs to be relaxed is
EDIT: More specifically, the extract_sequence helper function defined here.
as this - passing np.ndarray
from Python where a Rust extension expected Vec
- used to work with plain PyO3 without involving code in rust-numpy or polars.
I guess we could implement extract_sequence
without the help of PySequence
. Or we could manually call PySequence_Check
and then PySequence::try_from_unchecked
.
I don't want to over-focus on np.ndarray
because I think this bug report was more general, but taking that specific example I think downcasting it to a PySequence
is actually wrong. For example np.ndarray
has no index()
method. That's not a problem specifically for extract_sequence
but it would be for other uses.
To your point though I suppose there could be a more relaxed version of extract_sequence
that would work for this and other use cases - maybe via PyIterator
instead? It looks like that's what sequence::extract_sequence
ultimately requires right now.
@davidhewitt I think it would personally prefer to manually call PySequence_Check
and then use PySequence::try_from_unchecked
as I think missing methods would only yield errors, not unsafety.
Switching the implementation to PyIterator
would cost us the ability to call with_capacity
to preallocate the Vec
which seems an unnecessary loss considering the relevant types here do implement enough of the sequence protocol.
This issue seems to not have been fixed yet. I am getting TypeError: argument 'oned': 'ndarray' object cannot be converted to 'Sequence' running test.py with the following files:
src/lib.rs
use pyo3::prelude::*;
#[pyclass(module = "rust_module")]
#[derive(Default, Debug, serde::Serialize, serde::Deserialize, Clone)]
pub struct DummyClass{
name: String,
oned: [f64;3],
twod: [[f64;3];3],
}
#[pymethods]
impl DummyClass {
#[new]
pub fn pynew(
name: String,
oned: [f64;3],
twod: [[f64;3];3],
) -> PyResult<Self>{
Ok(Self { name, oned, twod})
}
fn __call__(&self) -> PyResult<String>{
Ok("Success".into())
}
}
#[pymodule]
fn pyo3_pure(_py: Python, m: &PyModule) -> PyResult<()> {
m.add_class::<DummyClass>()?;
Ok(())
}
Cargo.toml
[package]
name = "reproduce-extract"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
pyo3 = { version = "0.17.2", features = ["serde", "abi3-py37", "extension-module"] }
numpy = { version = "^0.17.2" }
serde = { version = "^1.0", features = ["derive"]}
[lib]
name = "pyo3_pure"
crate-type = ["cdylib"]
pyproject.toml
[build-system]
requires = ["maturin>=0.13,<0.14"]
build-backend = "maturin"
[tool.maturin]
bindings = "pyo3"
[project]
# The name pyo3_pure instead of pyo3-pure is intentional,
# it's used as a crate name resolution regression test
name = "pyo3_pure"
version = "0.1.0"
dependencies = [
"numpy"
]
test.py
from pyo3_pure import DummyClass
import numpy as np
# test = DummyClass("hello",[0,0,0],[[0,0,0],[0,0,0],[0,0,0]])
test = DummyClass("hello",np.array([0,0,0]),np.array([[0,0,0],[0,0,0],[0,0,0]]))
print(test())
This issue seems to not have been fixed yet. I am getting TypeError: argument 'oned': 'ndarray' object cannot be converted to 'Sequence' running test.py with the following files:
This is a slightly different code path affected by the same underlying issue. The OP reported failures to extract Vec<T>
whereas this is a failure to extract [T; N]
which we failed to consider we authoring the initial fix. Could you open a separate issue here so we can track that properly? Thanks!
Hi all,
First, thank you for working on PyO3 ! 🙌
Currently we are facing a similar problem in https://github.com/pola-rs/polars/pull/6531 when trying to upgrade Py03 to 0.18.0.
We are planning in the mean time to introduce a constructor for this case.
Do you think this is a good approach? Do you have any recommendations?
Thanks!
@jjerphan Py03 0.18 does contain the fix and we do test for this, c.f. https://github.com/PyO3/pyo3/blob/v0.18.0/pytests/tests/test_sequence.py#L28
Can you add more details of what is failing and how?
In updating to pyo3/numpy 0.17 upstream in polars we've hit the case where we have a function that accepts a
Vec<T>
whereT: FromPyObject
to do the conversion from any sequence passed from python.This function is opaque to use, in that anything that can be converted to a
Vec<>
is fine. Such as lists, tuples, but also numpy arrays.Now with updating it seems that this design is not possible anymore. Is this intentional?
What would be the recommended function signature for such generic sequence?