Closed 124C41p closed 9 months ago
I added some smaller comments inline, but the main point I am wonder now that I have look at the pull request is whether the converted case should be a plain
ndarray::Array
or rather a NumPy array as well. This would have the benefit of a more uniform API and would allow the result to be passed to functions expecting a NumPy array. I am not sure if the cost of creating a NumPy array versus anndarray::Array
is prohibitive though.I think spelled out this would look something like
pub struct PyArrayLike<'py, T, D>(PyReadonlyArray<'py, T, D>); impl Deref for PyArrayLike<'py, T, D> { type Target = PyReadonlyArray<'py, T, D>; ... } impl FromPyObject<'py> for PyArrayLike<'py, T, D> { // ... as before, just constructing `PyArray` ... }
What do you think?
I like this idea, and I just published a commit implementing it. There might be one downside however: If the array is created by the np.asarray
fallback, we only get a PyReadonlyArray
, but we should actually own a PyArray
. If the user calls .to_owned_array()
afterwards, we are creating a copy of a copy. Maybe it would be better to define it that way?
pub struct PyArrayLike<T, D>(Cow<PyArray<T, D>>)
If the array is created by the np.asarray fallback, we only get a PyReadonlyArray, but we should actually own a PyArray.
PyReadonlyArray<'py, T, D>
is just a wrapper around a &'py PyArray<T, D>
and derefs into that. A bare reference is as much ownership as Rust code can get for Python types. So the call to to_owned_array
will do the same whether applied to PyReadonlyArray
or &PyArray
.
(Note that even constructors like PyArray::from_vec
return only &PyArray
, not an owned PyArray
in the Rust sense since this type always lives on the Python heap.)
Just out of curiosity: You mentioned
ob.downcast::<PyArray<T, D>>()?.readonly()
being faster thanob.extract::<PyReadonlyArray<T, D>>()
. But is there any downside? Is it less safe? Is there any situation where you would prefer the second over the first?
Functionally they are equivalent. There is one performance paper cut that if you do not use the error returned by ob.extract::<PyReadonlyArray<T, D>>()
, e.g. in
if let Ok(array) = ob.extract<PyReadonlyArray<T, D>>() {
return Ok(Self(array));
}
then this is measurably slower than ob.downcast::<PyArray<T, D>>()?.readonly()
because extract
always returns Result<_, PyErr>
which means the PyDowncastError
returned by downcast
(which extract
uses internally) needs to be wrapped up in a PyErr
, only to be thrown away again.
This is a general PyO3 issue which have not been able to resolve thus far. (FromPyObject
will probably need an associated type for the error, but combining multiple FromPyObject
implementations into new ones still needs to be possible. No rocket science, but a lot of code churn over the whole ecosystem to make it happen.)
You can also see rust-numpy
trying to work around this here: https://github.com/PyO3/rust-numpy/blob/3843fa94c5b9e0b035a0b1cb8a70c164ef153a21/src/array.rs#L135-L137
The review request was a misclick, sorry.
@124C41p Thank you for this PR, and @adamreichold thank you for your encouraging tutoring.
I have one question about the design: is this really need to be struct? I mean, if we don't need C: Coerce
information after extracting, maybe it's better to just have a function, say, extract_arraylike(obj, allow_typecast)
.
@kngwyu To my understanding having a struct is necessary for conveniently using it inside the header of a pyfunction. Also, since the struct has only two fields, one being a phantom type, it should get optimized away by the compiler anyway. So extracting a PyArrayLike
should not be more expensive (in the fast path) than extracting a PyReadonlyArray
.
I see, you're right that we definitely need them to use as function arguments.
BTW, I checked asarray and _array_fromobject_generic implementations, finding that it just tries some faster paths (e.g., obj
is already an array) and then calls PyArray_CheckFromAny_int, which is an internal version of PyArray_CheckFromAny. So, I think that using PyArray_CheckFromAny
is better because it is an exposed C-API, if we can handle faster paths correctly.
So now the problem is whether we can handle faster paths simply enough. The code was a bit complex to read for me, but it looks like only handles the cases where the given object is already a numpy array or an instance of an array subclass. Maybe it's worth considering to use it?
BTW, I checked asarray and _array_fromobject_generic implementations, finding that it just tries some faster paths (e.g.,
obj
is already an array) and then calls PyArray_CheckFromAny_int, which is an internal version of PyArray_CheckFromAny. So, I think that usingPyArray_CheckFromAny
is better because it is an exposed C-API, if we can handle faster paths correctly. So now the problem is whether we can handle faster paths simply enough. The code was a bit complex to read for me, but it looks like only handles the cases where the given object is already a numpy array or an instance of an array subclass. Maybe it's worth considering to use it?
I agree in principle and also started looking into which FFI code paths this ends up eventually. But as you say, it is somewhat convoluted and not at all obvious how to call PyArray_CheckFromAny
. Therefore, I would suggest finishing this using the asarray
fallback and then working on the FFI integration as an optimization follow-up. (Basically, this needs docs and examples to ready for review, but I have not had the time to write that yet.)
I agree, but let me confirm one thing...
not at all obvious how to call PyArray_CheckFromAny
We already have it, right? Do you mean that it's not obvious how to use it?
We already have it, right? Do you mean that it's not obvious how to use it?
Yes, I do not refer to the signature, just how to use it correctly. Which code paths to handle ourselves. Which parameters to pass and so on.
@kngwyu I added docs and examples and think this is good to go now. Could you have another look? Thanks!
Summary
Extracts a read only reference if the correct numpy array type is given. Tries to convert the input into the correct type otherwise.
Resolves #382
Disclaimer
This implementation seems to work as intended and covers my personal use case. However, I am not sure how well it performs (you certainly want it to be not significantly slower than calling
np.asarray(...)
on Python side, and then extractingPyReadonlyArray<...>
on Rust side). Also, there are several tasks remaining like documentation and proper error handling.Do you want to take over at this point for finalizing the branch in your own style?