dmlc / dlpack

common in-memory tensor structure
https://dmlc.github.io/dlpack/latest
Apache License 2.0
905 stars 133 forks source link

[NOTICE] ABI Update For adding Version to DLPack #110

Open tqchen opened 2 years ago

tqchen commented 2 years ago

Dear DLPack community:

After quite a bit of discussions and coordinations, we are planning to do a ABI breaking change to add versioning and read only field to DLPack. DLPack has been a minimum stable ABI for exchanging Array data and we would like for it to continue stay that way.

In the meantime, we would like to have opportunities to carefully evolve DLPack, of course in still carefully considered manner. After long discussions, we have decided to make the following change.

We also propose to change Data API exchange protocol, to allow new versions of DLPack to return capsule with name "vdltensor"(instead of the old "dltensor"), this would .

The change is still ABI compatible, as the new data structure ABI comes with the new class DLManagedTensorVersioned. The data api however would involve an ABI update and update of all DLPack importer/exporters.

Such a move certainly impact a lot of packages and we would like to plan it carefully, as a result, we would like to have at least one month of notice to let everyone chime in, also see if we have enough volunteers to help update the data api exchanges in various packages.

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
};

/*!
 * \brief The DLPack and DLPack ABI versions of the tensor.
 */
typedef struct {
  /*! \brief DLPack version. */
  uint32_t dlpack;
  /*! \brief DLPack ABI version. */
  uint32_t abi;
} DLPackVersion;

struct DLManagedTensorVersioned {
    DLPackVersion version;
    void * manager_ctx;
    void (*deleter)(struct DLManagedTensorVersioned * self);
    uint64_t flags;
    DLTensor dl_tensor;
}

typedef DLPACK_BIT_MASK_READ_ONLY  1
tqchen commented 2 years ago

cc @leofang @rgommers @oleksandr-pavlyk @tirthasheshpatel @seberg @kkraus14 @wjakob

rgommers commented 2 years ago

Thanks @tqchen, great to see this move forward.

also see if we have enough volunteers to help update the data api exchanges in various packages.

DataAPI change: PyCapsule - rename to vdltensor, so there is proper error messages because of name mismatch

I can open a PR with a proposed update to the spec once this notice period closes, and coordinate getting the implementations updated. IIRC that's a two-step process, first update the from_dlpack side to understand both old and new name, and then update the exporters to switch to the new name some time later. I'll check for the details once we're actually doing this.

leofang commented 2 years ago

I've raised for awareness in today's array API meeting (because I forgot two weeks ago...) and asked for help to spread the words.

tqchen commented 1 year ago

PR that implements the change https://github.com/dmlc/dlpack/pull/113

wjakob commented 1 year ago

It looks great! The issue+PR make the memory layout of the data structures perfectly clear, but it may also be worth addressing the Python bits. For example, tensor types in array frameworks currently implement a __dlpack__ method that returns a capsule with a "dltensor" identifier. If some packages now switch over to "vdltensor", it will introduce breakage that may be considered problematic in a transition to versioned DLpack (For example, suppose that CuPy and PyTorch adopt the new interface with a half-year delay and cannot talk to each other in the meantime -- that would be bad!). My suggestion would be that versioned tensors are returned by a new __vdlpack__ method implemented in addition to the existing __dlpack_. That way, the old interface can be phased out once every tool can deal with the versioned variant.

wjakob commented 1 year ago

My other request is that the PR adds some explanation of what to do when loading a DLManagedTensorVersioned that has a mis-matched DLPack or ABI version (when comparing the versions of the header file available in the tensor provider and tensor consumer). As the spec evolves, this might become a very common situation. What is allowed? What is disallowed?

tqchen commented 1 year ago

Thanks @wjakob , would love to see what others think about __vdlpack__ function, this would primarily be data-api decision.

cc @leofang @rgommers @oleksandr-pavlyk @tirthasheshpatel @seberg @kkraus14

seberg commented 1 year ago

Can we move the discussion about Python somewhere else? I have to think about it a bit more, and I think it would also make sense to think about it a bit more from the C perspective. The reason is what Keith allured to: Currently Python is ahead of C in defining the exchange protocol. For example, it has a protocol around "streams" (I figured out what I don't quite love about it, is that we pass stream without the device). But if we have a clear picture for an exchange protocol in C, that should inform the Python one.

Generally, I am not in favor of a new name, I don't see what it adds compared to renaming the capsule only. But passing version= or even the allocated struct are other things that could be considered.

leofang commented 1 year ago

Agreed with Sebastian, it'd be better if we focus on the C perspective here (we are talking about ABI breaks, after all), and have the Python discussion in a separate issue. I also think the existing name is fine, I thought this was actually discussed somewhere in this repo, I need to look in up.

tqchen commented 1 year ago

Happy to move the python discussion