Open vadimkantorov opened 4 years ago
I guess for proper ref-counting like semantics (so that NumPy doesn't call the deleter too early in presence of other array views) something like weakref would be needed: https://stackoverflow.com/questions/37988849/safer-way-to-expose-a-c-allocated-memory-buffer-using-numpy-ctypes, but not completely sure.
Zero-copy borrowing from numpy is not a difficult issue; it does not have to include weakref or capsule. I have some examples here: https://github.com/dmlc/dlpack/blob/master/apps/from_numpy/main.py.
I think for the case of zero-copy into numpy, if the original array doesn't give up ownership of the data buffer, we do need to make sure that numpy doesn't release the buffer. I thought this would be something that the OWNDATA flag in numpy arrays already deals with (judging from the name), though I haven't looked into the details yet.
Yeah. It shouldn't release the buffer and shouldn't call deleter either if there're some other existing arrays (it should also ideally work when torch.from_numpy is called on such a NumPy array)
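For reference, here is a small sketch of what OWNDATA reports for views and for arrays built on a borrowed buffer. As far as I understand, the flag only says whether this particular array will free the memory itself; the buffer is kept alive through the `base` reference. The variable names are just for illustration.

```python
import numpy as np

# An array that allocated its own memory owns the data.
owner = np.arange(6, dtype=np.float64)

# A slice is a view: it borrows the memory and keeps a reference to the owner.
view = owner[2:5]

# An array built on top of an external buffer (here a bytearray) borrows as well.
external = bytearray(8)
borrowed = np.frombuffer(external, dtype=np.uint8)

print(owner.flags.owndata)     # True
print(view.flags.owndata)      # False
print(view.base is owner)      # True
print(borrowed.flags.owndata)  # False
```

So OWNDATA being False stops NumPy from freeing the buffer, but it does not by itself tell NumPy when a foreign deleter should run.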
A quick heads-up: we prototyped a simple pure-Python library that allows zero-copy exchange between a DLPack-compatible array API and numpy ndarray: https://github.com/jwfromm/numpy_dlpack. The lifetime and ownership are properly taken care of, if we didn’t miss anything.
Do you guys think we should contribute the implementation to this repo?
Thanks for sharing @junrushao1994.
> Do you guys think we should contribute the implementation to this repo?
I'm not sure that will be helpful in the long run, or if it's worth spending time reviewing if all the corner cases are correct (from a quick scan of your code, I'd say there'll be a few things it doesn't handle). We just need to finish https://github.com/numpy/numpy/pull/19083, which implements DLPack support in NumPy itself.
Thank you @rgommers! Yeah I believe https://github.com/numpy/numpy/pull/19083 is definitely a nicer way to allow numpy to interact with DLPack natively, and of course in the long run we should go all in with the numpy native approach this PR brings :-)
Alternatively, this repo could potentially be a pure python-based example of exchanging data with any numpy-like arrays using DLPack in a non-intrusive way.
Here is my proposal:
- Move dlpack.py to python/dlpack/dlpack.py, so that it could be shared across the codebase
- Move from_numpy.py and to_numpy.py to python/dlpack/, so that they could help when numpy's dlpack interface doesn't exist
- Check whether numpy provides the __dlpack__ or from_dlpack APIs. If so, go with the numpy native APIs instead; otherwise, fall back to this non-intrusive approach

Hmm. I now see that this ctypes example is committed! Good news. One difference from my https://github.com/vadimkantorov/pydlpack/blob/master/dlpack.py#L107 is that my array_interface creation from a DLPack tensor also arranged for the wrapped dl_managed_tensor.deleter to be called when the numpy array needs to be destroyed. This piece seems to be missing from to_numpy.py?
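To make the deleter point concrete, here is a minimal sketch of one way to tie a deleter to the buffer's lifetime rather than to a single array. It is not the code from to_numpy.py or pydlpack: `BorrowedBuffer` is a hypothetical owner class, the ctypes array stands in for memory owned by a DLManagedTensor, and the lambda stands in for calling its deleter. The idea is that NumPy keeps a reference to the object exposing `__array_interface__`, so the finalizer should only run once the last array or view over the buffer is gone.

```python
import ctypes
import gc
import weakref

import numpy as np


class BorrowedBuffer:
    """Hypothetical owner: holds the memory and runs a deleter when collected."""

    def __init__(self, n, deleter):
        # Stand-in for memory that a DLManagedTensor would own.
        self._buf = (ctypes.c_double * n)()
        self._n = n
        # Runs once, when this owner object itself is garbage collected.
        weakref.finalize(self, deleter)

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": (self._n,),
            "typestr": np.dtype(np.float64).str,
            "data": (ctypes.addressof(self._buf), False),  # (pointer, read-only flag)
        }


released = []
owner = BorrowedBuffer(4, lambda: released.append("deleter called"))

arr = np.asarray(owner)   # zero-copy: arr borrows the owner's memory
view = arr[1:3]           # a second array over the same buffer

del owner, arr
gc.collect()
called_while_view_alive = bool(released)   # the view still needs the buffer

view[:] = 1.0             # still safe to touch the memory here
del view
gc.collect()
called_after_last_view = bool(released)    # nothing references the buffer now
```

On CPython I would expect the deleter to stay uncalled while `view` exists and to run right after the last array goes away, which is the ref-counting-like behaviour discussed earlier in the thread.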
Am seeing this dlpack mention in the NumPy 1.22.0 release notes:
> Add NEP 47-compatible dlpack support
>
> Add a ndarray.__dlpack__() method which returns a dlpack C structure wrapped in a PyCapsule. Also add a np._from_dlpack(obj) function, where obj supports __dlpack__(), and returns an ndarray. (gh-19083)
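For anyone landing here, a rough sketch of how the "use native support if present, otherwise fall back" idea from the proposal could look with these APIs. `pick_from_dlpack` and its `fallback` argument are made-up names for illustration; the sketch assumes that newer NumPy releases expose a public `np.from_dlpack` while 1.22 only has the private `np._from_dlpack` quoted above.

```python
import numpy as np


def pick_from_dlpack(fallback=None):
    """Return NumPy's own from_dlpack if present, else the given fallback (hypothetical hook)."""
    native = getattr(np, "from_dlpack", None) or getattr(np, "_from_dlpack", None)
    if native is not None and hasattr(np.ndarray, "__dlpack__"):
        return native
    if fallback is None:
        raise RuntimeError("this NumPy has no DLPack support and no fallback was given")
    return fallback


from_dlpack = pick_from_dlpack()

x = np.arange(6, dtype=np.float32)
y = from_dlpack(x)                 # any object with __dlpack__ works, an ndarray included
shares = np.shares_memory(x, y)    # checks that no copy was made
```

A ctypes-based converter such as the one in this repo could be passed as `fallback` for older NumPy versions.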
Given NumPy now supports this, should we close?
I made an experimental wrapper: https://github.com/vadimkantorov/pydlpack/blob/master/dlpack.py#L107
The most difficult part is managing memory / capsules. Currently it's sort of move-semantics (and deallocation is done in C). I'm sure you'd be able to do it better.
It would be a nice illustration in addition to the existing example of borrowing from NumPy.
A more complete use case of mine: https://github.com/vadimkantorov/readaudio