Closed: adamreichold closed this issue 2 years ago
We should rename it PyArray::uninit and keep it safe, but change its return type to PyArray<MaybeUninit<A>, D> and add an additional assume_init method for turning this into PyArray<A, D>.
Ah, it probably is not that simple. In contrast to ndarray, we do not drop the elements ourselves but rather numpy's C code does that. Hence, &PyArray<MaybeUninit<PyObject>, D> would still lead numpy to decrement reference counts via those garbage pointers, and I fear we actually have to yield MaybeUninit<&PyArray<T, D>> instead, which will make that API significantly less ergonomic...
@davidhewitt I am having some trouble trying to figure out the correct API here. The above does not work, as just returning MaybeUninit<&PyArray<T, D>> does not really change anything (the shared reference is Copy after all).
I think the main issue is that as soon as FromPyPointer::from_owned_ptr is called to register the newly created PyArray instance with the release pool, its reference count will be decremented, which will eventually call NumPy's array_dealloc on the uninitialized data.
So it seems I need some type MaybeOwned<'py, T> where T: FromPyPointer that

- wraps a Python<'py> token and a raw pointer *mut T (in this case the return value of PyArray_New),
- provides raw access to T, most likely via *mut T as MaybeUninit::as_mut_ptr does,
- has a method fn assume_init(self) -> &'py T which will register the pointer with the release pool only after initialization.

Does something like this already exist in PyO3? Is there a different idiom I should use to delay registration with the release pool?
(Of course this all means the underlying NumPy array is leaked if the initialization is never completed, but at least this would be safe.)
Or maybe I want to return MaybeUninit<Py<PyArray>> to avoid registration?
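A std-only sketch of what such a MaybeOwned<'py, T> helper might look like; the type and its methods are hypothetical, with PhantomData standing in for the Python<'py> token and the release-pool registration reduced to a comment:

```rust
use std::marker::PhantomData;

// Hypothetical shape of the MaybeOwned<'py, T> helper described above,
// modelled with plain raw pointers instead of PyO3 types. Not an
// existing PyO3 API.
pub struct MaybeOwned<'py, T> {
    ptr: *mut T,
    _py: PhantomData<&'py ()>, // stands in for the Python<'py> token
}

impl<'py, T> MaybeOwned<'py, T> {
    pub fn new(ptr: *mut T) -> Self {
        MaybeOwned { ptr, _py: PhantomData }
    }

    // Raw access for element-wise initialization, in the spirit of
    // MaybeUninit::as_mut_ptr.
    pub fn as_mut_ptr(&self) -> *mut T {
        self.ptr
    }

    // SAFETY: the caller promises `ptr` now points at fully initialized
    // data. A real implementation would register the pointer with the
    // release pool here; if assume_init is never called, the allocation
    // leaks (but safely).
    pub unsafe fn assume_init(self) -> &'py T {
        &*self.ptr
    }
}
```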
EDIT: This message has an incorrect timestamp (maybe 1 hour delayed?) not sure what GitHub did here.
This seems like a tricky one... because presumably we do want to free the underlying allocation on drop, so we do need numpy's destructor to run? To clarify, we're in a thorny situation where numpy will free the underlying array elements on drop, so we need to be sure they're valid to drop before calling the numpy destructor. I guess this is only a problem with Drop structs and also PyObject arrays. Does filling the array with null pointers help?
I'm not aware of prior art, so we might need to get creative.
> Does filling the array with null pointers help?
Yes, NumPy seems to use XDECREF internally, so null pointers would be safe. On the other hand, filling everything with zeros is exactly the overhead using an uninitialized array is supposed to avoid.
@kngwyu Since we do not have a proper API to do mostly safe element-wise initialization either, I would propose to remove the creation of uninitialized PyArray from the public API. We can still use it internally as an unsafe method, and external users can call PyArray::zeros or use ndarray's API and then turn the result into a PyArray. Do you think this would be a way going forward? Do you know of users of this crate which depend on creating uninitialized PyArray directly?
Hmm not sure why the github timestamp for my previous comment is in the future? Never seen that before!
Sounds like something like PyArray<MaybeUninit<PyObject>, D> might still be a viable way to go, with the method still being unsafe with clear documentation that the user is responsible for initializing the full array, else crashes happen?
> might still be a viable way to go
The problem is that we do not really have API to enable the user to properly initialize that array, i.e. get access to raw data pointers and such. Additionally, that PyArray instance is a bomb that we do not offer reasonable help to defuse.
> The problem is that we do not really have API to enable the user to properly initialize that array, i.e. get access to raw data pointers and such. Additionally, that PyArray instance is a bomb that we do not offer reasonable help to defuse.
Agreed, we would need to add .set_item() and/or ways to fill in rows from a slice.
Regarding it being a "bomb" - true, but only in the PyObject case I guess. For integer / float types there's no such danger on drop, just danger of uninitialised reads.
Either way, if the method to create it is unsafe then it still offers users the flexibility to achieve what they want, with explicit acknowledgement that they are opting into dangerous territory.
Removing PyArray::new and making unsafe PyArray::uninit seems like a decent compromise?
> Regarding it being a "bomb" - true, but only in the PyObject case I guess. For integer / float types there's no such danger on drop, just danger of uninitialised reads.
I think I found a useful idiom in #216 for keeping the uninitialized array alive, I just do
let ref_ = mem::ManuallyDrop::new(array.to_object(py));
before initialization to elevate its reference count and do
let _ = mem::ManuallyDrop::into_inner(ref_);
afterwards to release the spurious reference. (I added a bit of convoluted test which does seem to verify that this works as intended.)
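The refcount-elevation idiom can be modelled with std types alone, using Rc as a stand-in for the Python reference count on the array object (the demo function and the counts are illustrative only):

```rust
use std::mem::ManuallyDrop;
use std::rc::Rc;

// Std-only model of the idiom: bump a reference count before the risky
// initialization section and only release it afterwards. Rc stands in
// for the Python reference count on the array object.
fn demo() -> usize {
    let array = Rc::new(vec![0u8; 4]);

    // Elevate the reference count; wrapping the clone in ManuallyDrop
    // suppresses its Drop, so even a panic during initialization cannot
    // trigger a (possibly unsound) destructor run.
    let guard = ManuallyDrop::new(Rc::clone(&array));
    let during = Rc::strong_count(&array); // 2 while initialization runs

    // ... element-wise initialization would happen here ...

    // Release the spurious reference once initialization is complete.
    drop(ManuallyDrop::into_inner(guard));
    assert_eq!(Rc::strong_count(&array), 1);

    during
}
```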
> Removing PyArray::new and making unsafe PyArray::uninit seems like a decent compromise?
I wonder whether keeping the signature as-is (i.e. leaving out the MaybeUninit but marking it unsafe) would be simpler, as we might end up recreating a lot of the access API for the MaybeUninit<A> where A: Element case?
In any case, I think we should first make sure #216 is in good shape before starting on the public API, as it forces us to figure this out internally already.
Looking at my test case again, I wonder whether we should just make "Clone must not panic" part of Element's contract, as in the case of Py<T>, cloning is just bumping the reference count which should never panic, shouldn't it? This would leave only the iterator-based methods requiring the elevated reference counts, as the iterator itself could panic.
(Assuming Copy for all data types except DataType::Object should probably also be made explicit in the trait documentation.)
> (Assuming Copy for all data types except DataType::Object should probably also be made explicit in the trait documentation.)
Could also add a const IS_COPYABLE member to the trait
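A hypothetical sketch of that suggestion, with an Element-like trait reduced to just the proposed constant; none of these names are the actual rust-numpy definitions:

```rust
// Sketch of the suggested IS_COPYABLE associated const on an
// Element-like trait. All names here are illustrative, not the real
// rust-numpy Element trait.
trait Element {
    const IS_COPYABLE: bool;
}

impl Element for f64 {
    const IS_COPYABLE: bool = true; // plain numeric data, no Drop glue
}

struct PyObjectLike; // stands in for Py<PyAny> object elements

impl Element for PyObjectLike {
    const IS_COPYABLE: bool = false; // cloning must bump a refcount
}

// Callers could then branch at compile time on the element kind, e.g.
// to decide whether the elevated-refcount dance is needed at all.
fn needs_elevated_refcount<T: Element>() -> bool {
    !T::IS_COPYABLE
}
```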
> Could also add a const IS_COPYABLE member to the trait
I don't think DATA_TYPE != DataType::Object but not trivially copyable makes sense, as NumPy itself will not "drop" the elements or "clone" the elements in that case, i.e. it is not us imposing that constraint but already NumPy.
Ah right, I see 👍
As discussed in https://github.com/PyO3/rust-numpy/pull/216#issuecomment-962460983, PyArray::new creates an uninitialized array and should therefore be marked unsafe as it is the caller's responsibility to not access (and for non-copy element types not even drop) the array until all elements are initialized. However, we also do not seem to offer a good API for doing that element-wise initialization post facto. I think instead of just marking the method unsafe, we should rename it PyArray::uninit and keep it safe but change its return type to PyArray<MaybeUninit<A>, D> and add an additional assume_init method for turning this into PyArray<A, D>. This would be similar to the approach taken by ndarray.
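For illustration, the uninit-then-assume_init pattern this proposal borrows from ndarray can be sketched with std types alone; the demo function and the element values are made up for this example, no PyO3 involved:

```rust
use std::mem::MaybeUninit;

// Std-only sketch of the pattern the proposed PyArray::uninit /
// assume_init pair would follow: hand out uninitialized storage first,
// then have the caller vouch for full initialization.
fn demo() -> Vec<u32> {
    // Step 1: obtain uninitialized slots (no zero-fill overhead).
    let mut buf: Vec<MaybeUninit<u32>> = Vec::with_capacity(4);
    buf.resize_with(4, MaybeUninit::uninit);

    // Step 2: initialize every element exactly once.
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(i as u32 * 10);
    }

    // Step 3: assert full initialization.
    // SAFETY: every element was written in the loop above.
    buf.into_iter()
        .map(|slot| unsafe { slot.assume_init() })
        .collect()
}
```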