dmlc / dlpack

common in-memory tensor structure
https://dmlc.github.io/dlpack/latest
Apache License 2.0
901 stars 133 forks source link

Specify DLPack helper C APIs #74

Open leofang opened 3 years ago

leofang commented 3 years ago

In the recent discussions scattered everywhere, it appears that some functions should better be just implemented by DLPack so that the downstream libraries do not have to reinvent the wheels. The possibilities include

  1. tensor deleter (see the discussion starting https://github.com/dmlc/dlpack/issues/51#issuecomment-849794854)
  2. query if CPU-accessible (see https://github.com/dmlc/dlpack/pull/71#issuecomment-849784565)
  3. C APIs corresponding to __dlpack__ and __dlpack_device__ in the Python Array API standard for handing streams (#65)
  4. C API that could be exposed as a new Python attribute __dlpack_info__ for returning API and ABI versions (and potentially more, see #34, #72)

cc: @tqchen @rgommers @seberg @eric-wieser @kkraus14 @jakirkham @hameerabbasi @vadimkantorov @oleksandr-pavlyk @szha @veritas9872

leofang commented 3 years ago

cc: @kmaehashi @emcastillo @dalcinl (for vis)

leofang commented 3 years ago

Hi guys, I am bringing some steroid to stimulate progress toward closing this issue!

  1. C API that could be exposed as a new Python attribute __dlpack_info__ for returning API and ABI versions (and potentially more, see Future ABI compatibility  #34, Add ABI version. #72)

The need to retrieve API/ABI versions has been brought up repeatedly in several occasions that I am aware of, including in a recent weekly Array API call. Furthermore, @emcastillo has kindly demonstrated a toy case to show that #71 can be a backward incompatible change:

import torch
import torch.utils.dlpack
import cupy

# Use managed memory
cupy.cuda.set_allocator(cupy.cuda.MemoryPool(cupy.cuda.malloc_managed).malloc)

def get_torch_managed(size):
    return torch.utils.dlpack.from_dlpack(cupy.empty(size).toDlpack())

a = get_torch_managed((3, 2, 1))

The code here uses CuPy to allocate CUDA managed memory and exposes it to PyTorch. The problem is that before #71 landed in DLPack v0.6 managed memory could be disguised as normal device memory by library developers in order to facilitate the exchange. However, if the Producer (CuPy here) supports kDLCUDAManaged from #71 but the Consumer (PyTorch here) does not, the above code snippet breaks during hand-shaking as the Consumer does not recognize it.

tl;dr: We really need to be able to query the supported API/ABI version.

seberg commented 3 years ago

tl;dr: We really need to be able to query the supported API/ABI version.

In the example wouldn't you also need to request the API version (as in "Please give me the struct, with a max version of X.Y")? The exporter (which supports the newer version) must know that the importer is stuck on the older one.

As helpful as signalling the exported version is, it does not replace a Python side (i.e. __dlpack__ spec) for requesting an older version, that would be necessary to do more than raise an informative error that the exported version is too new.

leofang commented 3 years ago

Sorry @seberg I dropped the ball...

as in "Please give me the struct, with a max version of X.Y"

I am not sure if I understand it correctly. I assume you meant if the Consumer only supports v0.5, the Producer (on v0.6) should be able to export a v0.5 struct. But, wouldn't this lead to a huge maintenance burden to keep track of all versions of DLPack as time goes by?

I would incline to have an error raised here, but currently the errors raised in each library for such a scenario is cryptic; as of now the informative error ("the exported version is too new") you said is not possible to generate due to lack of API version-query...

seberg commented 3 years ago

Yeah, that is what I thought. I am not sure that in this case the "usupported (new?) device" error would be worse than "incompatible version", but then you could argue that adding devices could just be a minor version bump anyway.

Even without requesting, an exporter could choose to export an old version for easier compatibility (if it has feature parity for the use-case). I guess requesting a version could tape over some transitions, but not sure that it is important.