PyO3 / rust-numpy

PyO3-based Rust bindings of the NumPy C-API
BSD 2-Clause "Simplified" License
1.07k stars 98 forks source link

PyReadonlyArray does not become readonly numpy array #408

Closed jungerm2 closed 5 months ago

jungerm2 commented 6 months ago

I have a pyclass that contains an Option<Array2>, and I would like to be able to get/set it from python. The macros #[pyo3(get, set)] do not work for this so I ended up writing a getter/setter like so:

    #[getter]
    pub fn mask<'py>(&'py self, py: Python<'py>) -> Result<Option<&PyArray2<bool>>> {
        let arr = self.mask.as_ref().map(|a| a.clone().into_pyarray(py));
        Ok(arr)
    }

    #[setter(mask)]
    pub fn mask_setter(&mut self, arr:  Option<&PyArray2<bool>>) -> Result<()> {
        self.mask = arr.map(|a| unsafe {a.as_array()}.to_owned());
        Ok(())
    }

Which works fine, but there's a subtle bug. If a user does something that would usually call __setitem__, such as:

obj.maks[0, 0] = True

Then this assignment silently fails because obj.mask return a copy of the data. As I understand it, it's not possible to return a mutable array as both to_pyarray and into_pyarray consume/clone the data. Please correct me if this is wrong. Thus, the next best thing is to return a readonly array so that it'll throw an error if the user does the above.

However, returning a PyReadonlyArray get's converted to a normal numpy array and the user can still erroneously set data. Here's my workaround, which calls numpy's setflags such that the returned array is not writable.

    #[getter(mask)]
    pub fn mask_getter<'py>(&'py self, py: Python<'py>) -> Result<Option<Py<PyAny>>> {
        self.mask.as_ref().map(|a| -> Result<Py<PyAny>> {
            let py_arr = a.to_pyarray(py).to_owned().into_py(py);
            py_arr.getattr(py, "setflags")?.call1(py, (false, None::<bool>, None::<bool>))?;
            Ok(py_arr)
        }).transpose()
    }

This works fine, but seems like it should be a part of the library? Like if I write to_pyarray().readonly() the resulting array in python should be readonly too?

adamreichold commented 6 months ago

Then this assignment silently fails because obj.mask return a copy of the data. As I understand it, it's not possible to return a mutable array as both to_pyarray and into_pyarray consume/clone the data. Please correct me if this is wrong. Thus, the next best thing is to return a readonly array so that it'll throw an error if the user does the above.

If you want to mutably share arrays with Python code, I think you would need to store Py<PyArray2<bool>> in your #[pyclass] so that #[get] can just return a Py<PyArray2<bool>> by cloning, i.e. by incrementing the reference count. Of course, this implies that when you want to have Rust code modify this array, you need derive an ArrayView into the NumPy array.

This works fine, but seems like it should be a part of the library? Like if I write to_pyarray().readonly() the resulting array in python should be readonly too?

I think this is a slightly unhappy naming clash, PyReadonlyArray and PyReadwriteArray are short-lived wrapper types which implement our Rust-side borrow checking to avoid Rust code creating unsound references into the interior of NumPy arrays, c.f. https://docs.rs/numpy/latest/numpy/borrow/index.html. They are not intended to modify the underlying NumPy array, but rather to avoid the unsafe block you use in your setter by providing a safe path towards as_array.

jungerm2 commented 6 months ago

Thanks for the quick reply, and for clearing up this confusion.

I'll consider embedding a Py<PyArray2<bool>> in my class instead, but I'm not yet sure what consequences that will imply. I intend to be able to instantiate this struct from within rust, without any python dependencies, as part of a package. So maybe this is a non starter for me...

As far as the unsafe code, is there a way to avoid that here? I just noticed there's a to_owned_array method, but it's doing exactly like above. I'm not sure I understand why PyArray2 -> ArrayBase would need to ever be unsafe, they both hold size info, so presumably there's bound checks and all?

adamreichold commented 6 months ago

I intend to be able to instantiate this struct from within rust, without any python dependencies, as part of a package. So maybe this is a non starter for me...

If it is a #[pyclass] unconditionally, you will always depend on at least libpython3.so. Otherwise, i.e. if the type can be used without the #[pyclass] annotation, this would tie you to depending on Python with a working NumPy installation. You might need to have a separate layer for deep integration with Python with is built from the pure Rust data structure.

As far as the unsafe code, is there a way to avoid that here? I just noticed there's a to_owned_array method, but it's doing exactly like above. I'm not sure I understand why PyArray2 -> ArrayBase would need to ever be unsafe, they both hold size info, so presumably there's bound checks and all?

There is more to memory safety than bounds checking. Most importantly, since ArrayView basically hands out shared and mutable references to the array elements, the usual Rust rules apply, i.e. many shared references but not mutable one or a single mutable reference. We have dynamic borrow checking which ensures that e.g. you only have one live ArrayViewMut in your extension for every NumPy array.

In this particular case, to_owned_array might do the same, but it can be safe because the short-lived references it used never escape the function so that there can be no aliasing issues, i.e. I would suggest to use it instead of the incantation.

jungerm2 commented 5 months ago

It looks like I'll need to give my project layout some more thought...

I'll go ahead and close this issue as the original readonly question was answered. Thanks again!