dmlc / dlpack

common in-memory tensor structure
https://dmlc.github.io/dlpack/latest
Apache License 2.0
901 stars, 133 forks

Import DLPack tensors directly into NumPy (without going via PyTorch or TF) #55

Open vadimkantorov opened 4 years ago

vadimkantorov commented 4 years ago

I made an experimental wrapper: https://github.com/vadimkantorov/pydlpack/blob/master/dlpack.py#L107

The most difficult part is managing memory and capsules. Currently it uses something like move semantics (and deallocation is done in C). I'm sure you'd be able to do it better.

It would make a nice illustration alongside the existing example of borrowing from NumPy.

A more complete use case of mine: https://github.com/vadimkantorov/readaudio

vadimkantorov commented 4 years ago

I guess that for proper ref-counting-like semantics (so that NumPy doesn't call the deleter too early while other array views still exist), something like weakref would be needed: https://stackoverflow.com/questions/37988849/safer-way-to-expose-a-c-allocated-memory-buffer-using-numpy-ctypes. I'm not completely sure, though.
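To make the weakref idea concrete, here is a minimal sketch, assuming CPython reference counting and using a ctypes array as a stand-in for C-allocated memory. The names `fake_deleter` and `wrap_foreign_buffer` are hypothetical and are not part of DLPack or NumPy. The finalizer is attached to the base array, and views keep that base alive through `.base`, so the deleter runs only after the last view is gone.

```python
import ctypes
import weakref

import numpy as np

deleter_calls = []


def fake_deleter(buffer):
    # Hypothetical stand-in for invoking the C-side deleter of the foreign allocation.
    deleter_calls.append(len(buffer))


def wrap_foreign_buffer(buffer, dtype=np.float32):
    # Zero-copy: the array borrows the buffer's memory and does not own it.
    arr = np.frombuffer(buffer, dtype=dtype)
    # Run the deleter when `arr` itself is garbage collected.
    # The callback must not hold a strong reference to `arr`.
    weakref.finalize(arr, fake_deleter, buffer)
    return arr


buffer = (ctypes.c_float * 8)(*range(8))
base = wrap_foreign_buffer(buffer)
view = base[2:6]

# Dropping the name `base` does not free the array: `view.base` still refers to it.
del base
calls_while_view_alive = list(deleter_calls)

# Once the last view is gone, the base array is collected and the finalizer fires.
del view
calls_after_view_gone = list(deleter_calls)
```

This relies on the garbage collector reclaiming the base array promptly; on interpreters without reference counting the deleter may run later than shown.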

junrushao commented 4 years ago

Zero-copy borrowing from NumPy is not difficult; it does not need weakref or a capsule. I have some examples here: https://github.com/dmlc/dlpack/blob/master/apps/from_numpy/main.py.

szha commented 4 years ago

I think for the case of zero-copy into NumPy, if the original array doesn't give up ownership of the data buffer, we do need to make sure that NumPy doesn't release the buffer. I thought this would be something that the OWNDATA flag in NumPy arrays already deals with (judging from the name), though I haven't looked into the details yet.

vadimkantorov commented 4 years ago

Yeah. It shouldn't release the buffer, and it shouldn't call the deleter either if some other arrays still exist (ideally it should also work when torch.from_numpy is called on such a NumPy array).
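For illustration, the OWNDATA flag can be inspected on a small example. This is only a sketch, with a ctypes array standing in for a buffer owned by someone else; the variable names are made up. It shows that a borrowing array reports OWNDATA as false and keeps its exporter reachable through `.base`. The flag only tells NumPy not to free the memory itself; it does not make NumPy call any foreign deleter.

```python
import ctypes

import numpy as np

# Memory owned by another object (here a ctypes array, standing in for a foreign tensor).
owner = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)

# Zero-copy view over that memory: NumPy borrows it and must not free it.
borrowed = np.frombuffer(owner, dtype=np.float32)

# An ordinary array allocated by NumPy, for comparison.
owned = np.zeros(4, dtype=np.float32)

print(borrowed.flags.owndata)  # False: NumPy will not free this buffer
print(owned.flags.owndata)     # True: NumPy allocated and will free this buffer

# Writes go straight through to the original buffer, confirming no copy was made.
borrowed[0] = 10.0
```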

junrushao commented 3 years ago

A quick heads-up: we prototyped a simple pure-Python library that allows zero-copy exchange between DLPack-compatible array APIs and NumPy ndarrays: https://github.com/jwfromm/numpy_dlpack. Lifetime and ownership are properly taken care of, unless we missed something.

Do you guys think we should contribute the implementation to this repo?

rgommers commented 3 years ago

Thanks for sharing @junrushao1994.

> Do you guys think we should contribute the implementation to this repo?

I'm not sure that will be helpful in the long run, or whether it's worth spending time reviewing that all the corner cases are handled correctly (from a quick scan of your code, I'd say there'll be a few things it doesn't handle). We just need to finish https://github.com/numpy/numpy/pull/19083, which implements DLPack support in NumPy itself.

junrushao commented 3 years ago

Thank you @rgommers! Yeah I believe https://github.com/numpy/numpy/pull/19083 is definitely a nicer way to allow numpy to interact with DLPack natively, and of course in the long run we should go all in with the numpy native approach this PR brings :-)

Alternatively, this repo could potentially be a pure python-based example of exchanging data with any numpy-like arrays using DLPack in a non-intrusive way.

Here is my proposal:

vadimkantorov commented 1 year ago

Hmm. I now see that this ctypes example has been committed! Good news. One difference from my https://github.com/vadimkantorov/pydlpack/blob/master/dlpack.py#L107 is that my creation of the array_interface from a DLPack tensor included a call to the wrapped dl_managed_tensor.deleter when the NumPy array needs to be destroyed. This piece seems to be missing from to_numpy.py?
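As a rough sketch of what "calling the wrapped deleter" means at the ctypes level: the struct layouts below mirror `dlpack.h`, while `release` and `recording_deleter` are hypothetical names used only for this illustration. It is not a description of what to_numpy.py or pydlpack actually do.

```python
import ctypes


class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int), ("device_id", ctypes.c_int32)]


class DLDataType(ctypes.Structure):
    _fields_ = [
        ("code", ctypes.c_uint8),
        ("bits", ctypes.c_uint8),
        ("lanes", ctypes.c_uint16),
    ]


class DLTensor(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", DLDevice),
        ("ndim", ctypes.c_int32),
        ("dtype", DLDataType),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("strides", ctypes.POINTER(ctypes.c_int64)),
        ("byte_offset", ctypes.c_uint64),
    ]


class DLManagedTensor(ctypes.Structure):
    pass  # fields are assigned below because the deleter type refers to this struct


DeleterFunc = ctypes.CFUNCTYPE(None, ctypes.POINTER(DLManagedTensor))
DLManagedTensor._fields_ = [
    ("dl_tensor", DLTensor),
    ("manager_ctx", ctypes.c_void_p),
    ("deleter", DeleterFunc),
]

deleter_calls = []


@DeleterFunc
def recording_deleter(managed_ptr):
    # Hypothetical producer-side deleter; a real one would free the tensor's resources.
    deleter_calls.append("called")


def release(managed):
    # Consumer side: invoke the producer's deleter if one is set (NULL pointers are falsy).
    if managed.deleter:
        managed.deleter(ctypes.byref(managed))


managed = DLManagedTensor()
managed.deleter = recording_deleter
release(managed)

without_deleter = DLManagedTensor()  # deleter field is NULL, so release() does nothing
release(without_deleter)
```

In a real consumer, `release` would have to be tied to the lifetime of the NumPy array (for example through a finalizer or a capsule destructor) so that it runs exactly once, after the last array referring to the data is gone.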

jakirkham commented 1 year ago

Am seeing this dlpack mention in the NumPy 1.22.0 release notes:

Add NEP 47-compatible dlpack support

Add a ndarray.__dlpack__() method which returns a dlpack C structure wrapped in a PyCapsule. Also add a np._from_dlpack(obj) function, where obj supports __dlpack__(), and returns an ndarray.

(gh-19083)

Given NumPy now supports this, should we close?
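For reference, a minimal sketch of the native round trip described in the release note. It assumes NumPy 1.22 or newer: 1.22 exposes the function as `np._from_dlpack`, and later releases provide the public `np.from_dlpack`, so the snippet falls back between the two.

```python
import numpy as np

# Prefer the public name; fall back to the private one shipped in NumPy 1.22.
from_dlpack = getattr(np, "from_dlpack", None) or getattr(np, "_from_dlpack")

source = np.arange(6, dtype=np.float32)

# Any object implementing __dlpack__() can be passed; here another ndarray is the producer.
imported = from_dlpack(source)

# The import is zero-copy, so a write to the source is visible through the imported array.
source[0] = 42.0
print(imported[0])
print(np.shares_memory(source, imported))
```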