[DISCUSS][RFC] DLPack Versioning and ABI Update

tqchen commented 2 years ago

This is a thread to bring awareness and discuss the recent set of proposed change of adding versioning information to DLPack.

Summary

Up until now the DLPack C struct itself does not carry ABI version information, and we always maintain backward compatibility. Up until now there is no ABI breaking changes.

The overall stability is one reason why frameworks adopt DLPack in the first place and we would like to continue to do moving forward. In the meantime, it is indeed helpful to clarify the DLPack ABI version in the data structure itself. contains a draft proposal for the change. We can still

In short, we are looking into the possibility of attaching version information to the DLTensor struct, to minimize the ABI change, the version information is appended in the end of the current DLTensor

Summary of key changes

C struct: append version struct in the end of current DLTensor, we can still discuss the specific impl of version struct. There is also a possibility of readonly flag/mask in light of the change
Python API: introduce __dlpack_info__ function that returns the current supported API/ABI version.

Compatibility implications

One thing that is worth clarifying is that whether this constitute a ABI breaking change. It depends on how people use it. Let us clarify the following scenarios.

S0: Frameworks that consume the old DLPack ABI should be able to continue consume the new DLTensor and DLManagedTensor by ignoring the fields
- Specifically, the memory layout of the existing fields remains the same, so old access code should work out of the box
- The deleter signature remains the same, which means call into the deleter of DLManagerTensor(where deleter is defined by producer that produces the new DLTensor) should safely delete the new structure.
S1: Framework that produces old DLPack ABI Tensor can be consumed by the new framework, if the new framework queries the existence of __dlpack_info__ then do the conversion according
- Note the catch is that if a new framework was handled an arbitrary (old) DLTensor, querying the ABI field will result in an undefined behavior. So the guard on the python side exporter is needed.
S2: Interestingly, the old framework can include thew new DLTensor/DLManagedTensor struct, not updating any of its exchange implementations. From consumer pov, they can receive and use DLTensor as if it is an old struct.
- The deleter will ensure proper deletion.
- It is fine as a result for framework to "soft update" to include the new header, but not updating __dlpack_info__ to indicate that it is already at a new version

Normally S0 means that the data structure ABI is still backward compatible (new data structure being used in old scenarios). S1 somewhat seats in the future-compatible regime(old data structure being used in new scenarios).

Related threads

previous discussion https://github.com/dmlc/dlpack/issues/34
https://github.com/dmlc/dlpack/pull/101 PR draft

What to Expect

This is a notice and discussion thread to let the community that this is happening. It would be great to get folks to also tag related framework communities, so the ecosystem is fully aware of(and ideally buy-in) the proposed change before we formally proceed.

Given this change does have compatibility implications (although not breaking depending on how we see it), we will have a longer period of NOTICE(expect 1-2 months) before we proceed. Please tag as many folks you think are related as possible.

In the future, we also suggest to open thread of this kind to discuss the compatibility implication of the changes, if any.

tqchen commented 2 years ago

cc @rgommers @wookey @tirthasheshpatel @junrushao1994 @jermainewang @leofang @kkraus14 @szha @hawkinsp

areusch commented 2 years ago

what should downstream frameworks expect in the way of migration functions? should folks proposing an ABI-breaking change here also be on the hook to provide an implementation that produces a new DLTensor struct given an old one? it could be helpful to provide such implementation to clarify how to interpret such older structs.

tqchen commented 2 years ago

Thanks @areusch this is indeed something that worth covering. The compatibility implication section covers some of that

The frameworks are "safe" to simply use the new struct without changing the API due to S0(backward compatibility)
There should be new API update to introduce __dlpack__info__ at which point the new fields needs to be filled properly.

It is a bit ambiguous to call it ABI-breaking(due to S0) but would be good to list out all the implications

areusch commented 2 years ago

one concern with placing the version information at the end of the struct is that, in a V1 DLTensor, that field is invalid. therefore, the memory in that location could be set to any value. that begs the question whether we should require some magic, and how that magic should perhaps be computed.

i think adding the migration functions would help folks to consider these corner cases, which are not always obvious when writing these specs.

tqchen commented 2 years ago

I believe what you said is covered by S1 part, which I agree that some of the migration function(likely filling in the value) would be useful

areusch commented 2 years ago

i'm not sure i follow--how can a C impl trust this memory, since it's outside the boundaries of a DLTensor in v1 of this spec? even if __dlpack__info__ exists on the Python wrapper class, how could the impl know whether or not the DLTensor instance was produced by a V2+ framework (and therefore, trust the memory which holds the version info)?

mattip commented 2 years ago

The protocol is covered in the PR to implement this, as documented here (summarized below). As I understand the proposal, if the producer has no __dlpack_info__, then by definition it is using V1: this is S0 above. If the consumer calls __dlpack__() with no version, then the producer should return a V1 C structure for backward compatibility, at least for the next few years: this is S1 above. Otherwise, as described in the protocol, the consumer requests a specific dlpack version, which then dictates the C structure returned by the producer.

PR protocol documentation

> The array API will offer the following syntax for data interchange: > > A `from_dlpack(x)` function, which accepts (array) objects with a `__dlpack__` method and uses that method to construct a new array containing the data from `x`. > `__dlpack__(self, stream: int = None, version: int = None)`, `__dlpack_info__(self)`, and `__dlpack_device__(self)` methods on the array object, which will be called from within `from_dlpack`, to access the data, to get the maximum supported DLPack version, and to query what device the array is on (may be needed to pass in the correct stream, e.g. in the case of multiple GPUs). > Semantics > DLPack describes the memory layout of strided, n-dimensional arrays. When a user calls `y = from_dlpack(x)`, the library implementing `x` (the "producer") will provide access to the data from `x` to the library containing `from_dlpack` (the "consumer"). ... > A DLPack version can be requested by passing the `version` keyword. The consumer should call the `__dlpack_info__` method to get the maximum DLPack version supported by the producer and request for a version both support e.g. `min(producer_version, consumer_version)`. If the consumer doesn't support any version below the producer's maximum version, a BufferError should be raised. Similarly, If the producer doesn't support the requested version, it should raise a BufferError.

areusch commented 2 years ago

@mattip thanks for the pointer. those protocol docs make sense when Python is in the loop.

I'm thinking of the following scenario--it could be something that doesn't apply to dlpack, so please correct me if I'm wrong:

A producer (C++) is compiled against dlpack created prior to this proposal landing
A consumer (C++) is compiled against dlpack created after this proposal lands
They are communicating by pointer which erases the type (e.g. PackedFunc in TVM-land passes DLTensor* as void*)

Now suppose the consumer is trying to work with the DLTensor created by the producer. Presumably this involves a flow in which a DLTensor* is cast into DLTensorVersioned* (e.g. via DLTensor* (from an old library not compiled with an up-to-date dlpack) -> void* (suppose this is done in order to pass via a PackedFunc in TVM-land) -> DLTensorVersioned*).

When the consumer (or the consumer's wrapping Python DLTensor class) tries to access the received DLTensorVersioned*, it would try to make a version check against version.dlpack, but the producer allocated a smaller struct (sizeof(DLTensor)), that memory is invalid.version.dlpack could be random.

A way around this could be to require that all DLTensor passed via C++ be allocated by the consumer, and the consumer could then zero-initialize the DLTensorVersioned and fill in the requested version. However, then that drags in some complexity around requiring that sizeof(DLTensorVersioned) only grows as version increases, so there are some drawbacks there too.

wjakob commented 2 years ago

Independent of the discussion here, my suggestion would be to let this RFC gather feedback for 1-2 months before making changes of the code. An ABI break for such an important data exchange protocol is serious. Having enough feedback will ensure that there are no regrets.

tqchen commented 2 years ago

@wjakob thanks for the suggestion! That is indeed the goal of this RFC(to ensure a long enough period of time before we commit as we value stability), I made it clear in the post

mattip commented 2 years ago

They are communicating by pointer which erases the type (e.g. PackedFunc in TVM-land passes DLTensor as void)

Can you point to this code? At some point the producer must decide to allocate and fill in the C struct. That is the point where there needs to be a protocol for specifying a dlpack version. If the architecture of the producer disallows any such protocol, how can dlpack ever change?

Edit: I think the term used to describe such situations is "dependency hell".

tqchen commented 2 years ago

Some update after taking a closer look. Actually right now the proposed ABI did break S0 as well. The main reason was due to the fact that DLManagedTensor's deleter field seats right after a DLTensor. So any append of new fields would effectively change the location of deleter and break S0.

To further clarify the situation, let us think about the following possible changes to DLTensor fields:

C0: Append new fields to the DLTensor(e.g. readonly)
C1: Change some of the current field definition (that affects the layout of the current structure).

Due to our favoring of stability ideally C1 should be extremely rare. And C0 ideally should maintain backward compatibility(the new struct can be used safely in old settings -- ignoring the fields).

There are a few choices if we consider the need that "C0 should not break" ABI compatibility

A0: Place version in DLManagedTensor

struct DLManagedTensor {
   Version version;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
   DLTensor dl_tensor;
};

Note that this is a ABI breaking change.

Importing and exporting won't be backward compatible
New importer needs to handle old exporting cases.
Note that deleter and manager context are placed before, so their location layout remains stable when field append to DLTensor happens

However, this allows future append to DLTensor fields without problem, so future C0 changes can be done in a backward compatible way.

A1: Place version and new flags in DLManagedTensor

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
   Version version;
   new_flags here
};

This is a ABI backward compatible change. It does bring restrictions

No changes to the current fields of DLTensor itself can be made
New flags are appended to DLManagedTensor, which also makes sense as properties such as read only can come with the managed exchange data structure, while the DLTensor core remains stable.
This shifts the burden of complexity to DLManagedTensor, with the assumption that core DLTensor should remain stable.

A2: A variant of A1 while allow upgrading DLTensor fields

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
   more flags
};

struct DLManagedTensorVersioned {
    Version version;
    DLManagedTensor managed_tensor;
}

In new version, allocate DLManagedTensorVersioned but still return the pointer to the managed_tensor.
During import, recover DLManagedTensorVersioned by subtracting the ptr offset to read the version information
Version info can be used to decide whether an ABI update is needed

Summary Discussions

Due to our favoring of stability. Ideally we would strongly favor S0 property (so frameworks do not need to bring extra support to keep up with the protocol), and the fact that DLPack rarely changes is an important property of the exchange protocol, of course, we do want to think about mechanisms to allow us to deal with breaking in case there is really need for it.

tirthasheshpatel commented 2 years ago

I am +1 for the A1 option. It is a minimalist option without severe backward-compatibility concerns. I have proposed this alternative in #105. @rgommers @mattip @seberg @leofang I'd appreciate your opinion on this!

seberg commented 2 years ago

I would tend towards creating the new name (i.e. A2 or variations of it). Otherwise, appending the fields requires out-of-bound exchange of the information whether the field exists. We can do that in Python but it seems stranger in C? EDIT: OK, I guess it is not strange in C. I think my main feeling is that once the old version is not used anymore A2 seems simpler.

If a new name is created, new C code can indicate that by using DLManagedTensorVersioned in headers to give some type/ABI safety. How exactly the DLManagedTensorVersioned that should look like, I am not sure. Just a stub with the version or any eternally stable ABI fields (which is than cast to the specific versioned one). In the above DLManagedTensor is included, which also works. The downside is that at some point the "included" version may change, so I am wondering if it isn't clearer to force the user to cast. (The layout is still the same in both cases of course, the question is how to expose it in the headers.)

tirthasheshpatel commented 2 years ago

If a new name is created, new C code can indicate that by using DLManagedTensorVersioned in headers to give some type/ABI safety.

How would one do that? I think the idea of A2 is to still export the managed_tensor field of DLManagedTensorVersioned struct and use offset of the version field to retrieve the version info.

seberg commented 2 years ago

I don't know how current API looks like. I imagine exposing something like from_dlpack(DLPackManagedTensor *tensor) and DLPackManagedTensor *managed_tensor = to_dlpack(object).

And the only way to transition that, is probably to create a new pair of functions that use DLManagedTensorVersioned *? But if that is how the transition happens anyway, then I don't see much point in placing the version last. So yes, a current from_dlpack could remain compatible with newer versions for now. But for how long? If there is any ABI change in the future it will remain being compatible. so I am not sure that would actually give a very clear transition for when an ABI change happens. (It would assume that eventually no old users exist and then ABI break is possible?)

So, I feel that advantage of "ABI compatibility" with current from_dlpack is unclear. So I would preference doing what seems cleanest in the long term. And for me, that is placing the version first and renaming the struct.

tirthasheshpatel commented 2 years ago

And the only way to transition that, is probably to create a new pair of functions that use DLManagedTensorVersioned *?

As far as C/C++ is concerned, I too think this is the only way to transition. Both A1 and A2 sound like good options then. I also see now how A2 can provide type safety. @tqchen The A2 option says "In new version, allocate DLManagedTensorVersioned but still return the pointer to the managed_tensor." I don't think we should do that since it takes the type safety advantage away (making A1 a simpler solution). Otherwise, I am happy to go with A2 too.

What do you think @mattip?

Let's wait for some others to chime in and then I can update the PR.

mattip commented 2 years ago

I like A2 as well.

wjakob commented 2 years ago

One of the mentioned disadvantages of A1 is "No changes to the current fields of DLTensor itself can be made". How would this look for A2? (In general, it seems to me that adding any fields to DLTensor will be an ABI break.)

tirthasheshpatel commented 2 years ago

How would this look for A2?

We can add new fields onto DLTensor since the version is the first field in DLManagedTensorVersioned (as opposed to A1 where the version field comes after DLTensor).

wjakob commented 2 years ago

This would work for software that knows how to interpret a DLManagedTensorVersioned with that specific version, but it would still break older software that depends on a compatible layout, no? How is that not an ABI incompatibility?

tirthasheshpatel commented 2 years ago

The idea is to introduce new functions (to_dlpack(DLManagedTensorVersioned *) and from_dlpack(DLManagedTensorVersioned *)) that convert and export the tensor to a DLManagedTensorVersioned *. It's good to have for type safety but otherwise, A1 is simpler.

wjakob commented 2 years ago

The killer feature of DLPack is its simplicity: it's just a header file with a few data structures that are trivially ported to various languages. Frameworks that can exchange DLPack tensors don't need to link to a shared or static library to do so. They don't need even deal with C or C++. Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story. My 2cts...

(Edit: I am against versioning per se. I am just wondering what this means for ABI interoperability between software that may and may not implement this versioned extension.)

tirthasheshpatel commented 2 years ago

Without versioning, DLPack can't evolve. There are already requests for read-only data support, non-native endianness, etc. Adding those without a version to separate the new structure will lead to spurious breaks everywhere (since not everyone will update at the same time and newer functions won't work with older ones). I am not sure if there is a simpler way; after a lot of discussions, these options seem to be the simplest of all. In long term, we can remove the old structs and also update the spec to not care about the old ABI.

Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story.

The proposal is to add version-info in the struct to avoid this exactly. All the complications happen in the spec and not the DLPack header file...

tirthasheshpatel commented 2 years ago

Hi everyone,

The issue has been up for almost a month. So far, I think @mattip, @seberg, and @leofang are in favor of versioning and @wjakob is against it. Let's wait until 16'th May for more feedback and, if there is none, it would be good to merge #105 and wait until 23'rd May before closing this to allow some time to revert the changes if needed. Please feel free to add your thoughts on how DLPack should proceed with versioning and ABI break. Thanks!

areusch commented 2 years ago

this is a tricky situation which I suspect comes down to how everyone actually uses DLPack in practice. it'd be helpful to know more about everyone's usage of DLPack so we could evaluate whether we're all debating the same use case or if there are others.

my initial impressions of the proposals:

A0.

good: version is the first field
bad: because there is no way to signal a version incompatibility between v1/v2. A1.
good: is an organic evolution into a versioned world, avoids immediate backwards-compat problems
bad: splits fields across DLTensor and DLManagedTensor, making DLTensor append-only and making it difficult to change the semantics of DLTensor fields. it's a half-solution in my book. A2.
good: avoids API break, Version is the first field in a wrapper class to DLTensor to make interpretation easier once a pointer to DLManagedTensorVersioned is found.
bad: finding the pointer to DLManagedTensorVersioned requires assuming that it exists; would the API tell you this? DLManagedTensor cannot be used when the version check fails (e.g. you cannot deallocate it).

might I suggest A3, a variant of A2:

static constexpr const uint32_t kVersionedDLTensorMagic = 0xD1D1D1D1;

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);    
};

// note: supersedes DLManagedTensor
struct DLManagedTensorVersioned {
   Version version;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensorVersioned* self);    
   uint32_t magic; // set to kVersionedDLTensorMagic ^ (((uintptr_t) &self->dl_tensor) - ((uintptr_t) self))
   DLTensor dl_tensor;
}

here is what i think example usage might look like:

allocate DLManagedTensorVersioned (or invoke other library which may return either DLManagedTensor or (DLManagedTensorVersioned)->dl_tensor. Note that this amounts in both cases to returning a pointer to a DLTensor instance.
The client library's unknowns at this point are:
- The field layout of DLTensor
- The location of manager_ctx and deleter
- The version of the tensor
Perform the following algorithm to recover the version:
1. Suppose the returned pointer is void* caller_tensor.
2. Compute ptrdiff_t version_ptrdiff = *((int32_t*) caller_tensor )- 1) ^ kVersionedDLTensorMagic, which may be the difference between the address of DLManagedTensorVersioned.dl_tensor and DLManagedTensorVersioned. We could also enforce here that version_ptrdiff is e.g. < 0x100.
3. Compute DLManagedTensorVersioned* candidate_versioned = ((uintptr_t) caller_tensor) - version_ptrdiff;
4. Check if candidate_versioned->version is supported and validate version_ptrdiff based on knowledge of that version (e.g. v2 expects version_ptrdiff == 16).
5. If version does not match, compute DLManagedTensor dl_managed_tensor = caller_tensor and treat as v1.
Use per semantics defined in the version. To deallocate, invoke deleter(self), where those fields are looked-up as is apropos given version.

the advantage to this is that the managed fields are moved next to the version, so that they could be used to terminate the lifecycle of a tensor whose version is incompatible with the client library. this scheme is of course vulnerable to various memory tricks as well--you could consider patching Version to include some xor'd magic too. it is more complex, though. thoughts?

mattip commented 2 years ago

it'd be helpful to know more about everyone's usage of DLPack

That is the hard part here, and why @tirthasheshpatel submitted #106. Other users of DLPack can add additional use cases. Without concrete use cases, it is difficult to relate to "what if" scenarios.

To me the A3 proposal seems too "magical". The use of __dlpack_info__ in A2 allows the provider to explicitly state what version of the API they will support, answering "would the API tell you this?" with "yes".

@wjakob when you say

Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story

do you agree that the structure is incomplete? If not, what do you respond to NumPy who, in accordance with the Array API consortium recommendations, added support for DLPack but requested support for readonly and byteorder flags?

This need for versioning was pointed out in #34 which was opened in March 2019, and has been extensively discussed in other issues over the passing years.

wjakob commented 2 years ago

Just to make that clear (it seems there was some misunderstanding). I am definitely not against versioning. However, it would be useful if versioning-related information is appended to data structures so that they are still interpretable by older versions at an ABI level.

Second -- there is a lot of value to having DLPack data structures described as POD/Plain Old Data in a way that is accessible from many different languages without, e.g., having to include this repository or linking to a library. There was some of including C-level functions, for example by @tirthasheshpatel

The idea is to introduce new functions (to_dlpack(DLManagedTensorVersioned *) and from_dlpack(DLManagedTensorVersioned *)) that convert and export the tensor to a DLManagedTensorVersioned *. It's good to have for type safety but otherwise, A1 is simpler.

Maybe this is fine. However, this is a deviation from the current purely POD-style interface where the only kind of function is a function pointer.

Finally, one suggestion that I would also have is that the PR#101 adds some kinds of flags field.

Adding uint8_t readonly; is kind of wasteful (1/8 bits used, and another 3 unused bytes will be added for padding).

Why not add a uint32_t flags value? This could represent readonly status, endianness, and whatever other people will want to add in the future without having to discuss an ABI break every time.

areusch commented 2 years ago

@mattip agreed A3 is kind of magical and that's less desirable than an explicit indication. It seemed like there was an attempt being made to avoid breaking backwards-compatibility by considering the DLTensor fields as sacred and locating new fields elsewhere.

I think there are a few possibilities:

Two C++ libraries integrating without any Python support: If one library is to return a DLManagedTensor, we either have to signal a version by a) modifying memory outside the bounds of the v1 structure; b) modifying an unused field in the v1 structure; or c) introducing an explicit API which carries the drawbacks @wjakob mentioned (i.e. this is no longer just an in-memory format).
Consumer has Python support but is integrating with a C++-only producer library: I don't think __dlpack_info__ helps here because it would be the consumer's __dlpack_info__ wrapping the produced DLManagedTensor, right?
Both consumer and producer have Python support: then I think __dlpack_info__ is a reasonable out-of-band signaling mechanism.

I should also state that my interest here is purely from POV of having TVM interoperate with other libraries. These possibilities capture the range of situations I could imagine a standard such as this needing to confront.

tirthasheshpatel commented 2 years ago

@wjakob

it would be useful if versioning-related information is appended to data structures so that they are still interpretable by older versions at an ABI level.

Yes, this is exactly what #105 will do (I will update the PR to follow A2).

Second -- there is a lot of value to having DLPack data structures described as POD/Plain Old Data in a way that is accessible from many different languages without, e.g., having to include this repository or linking to a library. There was some of including C-level functions, for example by @tirthasheshpatel

I meant the C/C++ libraries that want to add support for the new ABI will have to create those new functions. DLPack itself won't have any functions and will still remain a POD. So, no linking will be necessary.

Why not add a uint32_t flags value? This could represent readonly status, endianness, and whatever other people will want to add in the future without having to discuss an ABI break every time.

The plan is to batch all the requests like readonly, endianness, alignment, etc in one single ABI break. Althouh, it would be useful to also add some padding for future use.

tirthasheshpatel commented 2 years ago

@areusch

Two C++ libraries integrating without any Python support: If one library is to return a DLManagedTensor, we either have to signal a version by a) modifying memory outside the bounds of the v1 structure; b) modifying an unused field in the v1 structure; or c) introducing an explicit API which carries the drawbacks @wjakob mentioned (i.e. this is no longer just an in-memory format).

For C++ libraries: new functions like DLManagedTensorVersioned *to_dlpack_versioned(...) and ... from_dlpack_versioned(DLManagedTensorVersioned *) can be added. Since the new struct is different, we get type safety and there would be no straightforward way for the users to mix DLManagedTensor and DLManagedTensorVersioned.

Consumer has Python support but is integrating with a C++-only producer library

I think this one's a tricky case. Yes, if a C++ tensor is imported in Python, there would be no __dlpack_info__ for the consumer to call. But I also don't know how would one import C++ tensors in Python. There has to be a C/C++ API using which the two communicate, in which case, it would be simply an exchange at C level. They can use the suggestion above i.e. introduce new to_dlpack_versioned and from_dlpack_versioned.

seberg commented 2 years ago

The problem I have is, that I still don't know how the C++ usage really looks like. In Python, I think this all works out great, because the usage is only an exchange Protocol. That means we have basically 1-2 functions to worry about and we can break API on those (even using a try/except).

For all I know, the C++ usage may be looking more like:

import some_library

tensor = DLPackManagedTensor()  # my data
result = some_library.do_calculation(tensor)
# or even, more complicated:
result = some_library.random_data()  # no input.

Now, if your API looks like that, you cannot replace every single function that takes a DLPackManagedTensor. And while it may be helpful, user of the library probably will not wan to deal with the logistics like:

result = some_old_library(to_old_dlpack(tensor))
result = to_new_dlpack(result)

(Those functions could be static inline functions included in the header, just for "emergency" use to support an outdated library though!).

Now, if we are dealing with C++ – as far as I understand – C++ should be doing name-mangling, so you can write some_library to support two ABIs by simply renaming the structs and exposing overloads. In that case, the user has to wait for adopting the new ABI until some_library supports it (or use conversion helpers in rare cases). But if you wait, things should work great and you get a linking error if some_library is outdated.

For a C only API, name mangling doesn't exist, so the "magic" may well be a useful way for a library to support both. OTOH, I am not sure if it wouldn't be just as well to add an #ifdef USE_NEW_DLPACK_ABI to the header and than mangel the symbol names manually (without the user really noticing).

From a Python API perspective: I don't think we as "Python exchange protocol" folks should care whether or not you insist on doing "magic". It simply doesn't affect us because we have better ways to deal with it. We can just trivially break the ABI by just adding a single try/except or similar.

But, at least me as a Python exchange protocol person do feel that trying to make dlpack a generic exchange protocol in Python requires an ABI break (or preparing for ABI break). There are already clear limitations, and a IMO a modern exchange protocol can be expected to have ABI versioning.

marktsuchida commented 2 years ago

Hope you don't mind me jumping in with some ideas even though I just today learned of this wonderful project.

I'll talk about the C ABI only. (I think the Python parts can work on top of what I propose below, though I could be missing something.)

It seems to me that readers (of the structs, not necessarily of the tensor data; same henceforth) usually need to evolve ahead of writers in order not to disrupt end users, given that the flags being considered for addition (read-only, byte-swapped) are things that cannot be safely ignored by a reader (it is better for a reader to give up rather than (try to) mutate a read-only buffer or use the wrong byte order). So I think what is important is for new readers to be able to safely read structs written by old writers, not the other way around. For this to be the case, there needs to be a way to check the version without danger of undefined behavior (segfault or worse).

I'm assuming here that binary compatibility (affects end users) is more important than source compatibility (affects mostly library devs).

If it is super-easy for the same reader to read either ABI v1 (current) or v2 (first versioned version), there could be a migration period during which libraries start supporting reading either v1 and v2 but continue to write v1 (except in testing, or in cases where the new features are essential). Once enough of the ecosystem knows how to read v2, libraries can start writing v2 with no disruption.

One way to ensure that there is no danger of undefined behavior with v1 and v2 structs coexisting is to use the high bit (bit 31) of device_type, which is currently always 0. We can set the bit to indicate a versioned (managed) tensor. Hopefully existing readers check this field, and would reject the new format safely (albeit with the wrong error message, although that's easy to fix).

Also, looking at the suggestions (as in #105) to add readonly (and future fields) to DLManagedTensor rather than DLTensor (for understandable reasons), it seems like there would no longer be much point in having separate structs for DLTensor and DLManagedTensor, because the former would no longer be usable by itself. DLTensor could just contain manager_ctx and deleter at the same offset, and zero them if not managed.

Taking these points into account, here is what I came up with:

typedef struct {
  int32_t device_type;   // High bit set if versioned tensor
  int32_t device_id;
} DLDevice;

typedef struct DLTensor {
  void *data;
  DLDevice device;       // See above
  int32_t ndim;
  DLDataType dtype;
  int64_t *shape;
  int64_t *strides;
  uint64_t byte_offset;  // Unchanged up to here
  void *manager_ctx;     // Write zero if not managed
  void (*deleter)(struct DLTensor *self);  // Ditto
  uint16_t version;      // Readers must give up if higher than known
  uint16_t flags;        // Bits TBD; unused bits must be written zero
  uint32_t reserved_in_v2;  // Must be written zero
} DLTensor;

typedef DLTensor DLManagedTensor;  // For compat

#define DLPACK_TENSOR_GET_VERSION(t) \
    ((t).device.device_type & 0x80000000 ? (t).version : 1)

#define DLPACK_TENSOR_GET_DEVICE_TYPE(t) \
    ((t).device.device_type & 0x7FFFFFFF)  // Or ... | 0x80000000 with new enum constants

enum { // These are TBD, I'm just showing as examples
  kDLFLAGS_READONLY_MASK = 1 << 0,
  kDLFLAGS_BYTESWAPPED_MASK = 1 << 1,
};

#define DLPACK_TENSOR_IS_READONLY(t) \
    (DLPACK_TENSOR_GET_VERSION(t) >= 2 && (t).flags & kDLFLAGS_READONLY_MASK)
// And so on

Note that manager_ctx and deleter are at the same offset as they always were, and that these new definitions can be used to write the v1 format as well, by simply clearing the high bit of device_type. So libraries can switch to the new header but keep writing v1 structs until others are ready to read v2.

The enum constants for device type could be updated to include the set high bit, making writing v2+ the default. Function macros could be made static inline functions instead. And there are probably other ways to improve this to decrease the chances of incorrect usage.

When future additions are made, the version number should be incremented if the added feature is something that readers cannot safely ignore, or if it requires enlarging the struct. Readers can indefinitely support structs of previous versions, as far as I can see, unless they require specific guarantees by the writer indicated by new features. Writers can choose to keep writing an older version (but maybe recommend >= 2) if they do not need any newer features and wish to support the broadest range of readers.

The reserved_in_v2 field may not be necessary (what are the chances of adding a feature that is not a boolean flag and does not require mandatory behavior of readers?), but I put it in to pad the struct to its natural alignment. If the field exists, writers must always zero it, or else future readers will misbehave. It is named with the version number so that future versions are less likely to cause bugs in existing source.

There could be exceptions to my view that readers generally need to support features before writers can adopt them. For example, a hypothetical "immutable" flag (not to be confused with read-only; immutable is an additional promise by the writer that data won't change) could be added without incrementing the version number, because readers that don't care can safely ignore it. (I'm not sure whether an "immutable" flag would be practically beneficial. Just an example to illustrate version evolution.)

seberg commented 2 years ago

Pinging here, since it has been idle for a long time and came up in NumPy yet again. To me (and others I have talked to), this is still a major blocker for any serious adoption of DLPack in Python. It would be nice to get some movement here. Otherwise we are sacrificing Python development to C++ compatibility which is a tradeoff that is not at all appealing to me from the Python side.

tqchen commented 2 years ago

Thanks @seberg for the ping and I agree with the point being said. Sorry for dropping the thread.

After reading all the suggestions in the post, here is one suggestions resolution(that mostly evolves from A2)

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
};

struct DLManagedTensorVersioned {
    Version version;
    DLTensor dl_tensor;
    void * manager_ctx;
    void (*deleter)(struct DLManagedTensor * self);
    // more flags
}

Keep DLTensor unchanged
Add additional complexity to DLManagedTensorVersioned
Enable one time ABI break, so more flags(read_only) can be added in the DLManagedTensorVersioned. DLManagedTensor remains the legacy version.

The immediate need for more flags is the read_only one. Which I agree is useful. We can discuss whether make it a bitmask or a uint_8 flag. Likely a uint32 boolean mask might be helpful to allow future flags being added.

In order to manage the breaking change, we would need to update the python APIs in two steps:

S0: Having existing from_dlpack function check the ABI version in __dlpack_info__, and raises exception when the version mismatches. Future version of ABI updates expects the capsule to be DLManagedTensorVersioned.
S1: Update the code to allow consuming of ABI versions that are pre DLManagedTensorVersioned.

The C++ side's code can automatically detects type errors as the type of DLManagedTensorVersioned and DLManagedTensor differs from each other.

Would love to see how folks think. @tirthasheshpatel @seberg @wjakob @mattip @areusch @rgommers

mattip commented 2 years ago

Maybe we could try to set up a half-hour zoom call around this since it seems we are not communicating the change properly. As I understand things, it is incorrect to check ABI version in from_dlpack(obj: PyCapsule) -> Array, the capsule has already been created using either a DLManagedTensor or a DLManagedTensorVersioned. If the latter, and the consumer has not been updated to handled DLManagedTensorVersioned, the consumer will raise an error since it will not understand the PyCapsule contents. Once we have transitioned to most producers using DLManagedTensorVersioned in Array.__dlpack__(), then indeed the consumer can check the DLPack version and error if the version number is not understood.

As proposed in #101, the place to call __dlpack_info__() -> Version is before calling the producer's Array.__dlpack__(self, *, stream: int = None, version: int = 70) -> PyCapsule. The call would either return a DLManagedTensor in the PyCapsule (if version <=70) or a DLManagedTensorVersioned (if version > 70).

tqchen commented 2 years ago

Sorry I was not getting it clearly,

As per the new data-api, from_dlpack takes in a class (nnot the capsule) that contains obj.__dlpack__ method, so the python implementation can call obj.


def from_dlpack(other: Object):
     try:
        ver = other.__dlpack__info()
     except:
         # older version that do not have info.
     # checks here
     return internal(other.__dlpack__())

areusch commented 2 years ago

so i guess are people passing around non-wrapped void* that refer to either DLTensor or DLManagedTensor? I think this would be a way that the C++ side would fail to detect the error at compile time, since casts may be present.

@tqchen curious if you're taking a position on the exact ABI break to make here (since @marktsuchida suggested one above), or whether you're first wanting to tackle the semantics of how the program behaves in the presence of a version mismatch?

tqchen commented 2 years ago

In the case of C++:

Normally users are passing around typed pointer DLTensor DLManagedTensor and DLManagedTensorVersioned. There is no break of ABI since type clearly indicates the values
When user decodes to take void*, it is usually a result of FFI, given the primary use case is python, S0 step is designed so that our most common use-case, python array exchange have a clear path for updating without breaking things.

Indeed the proposal won't handle the breaks where a C++ api takes a void* and do cast. This is a very rare case, and the benefit of having a stable clean ABI for future versions outweights the one-step transition patches to support such cases, which I am not aware of any in production.

seberg commented 2 years ago

Cool, I guess in C++ there is no serious ABI issue (unless dirty casts things are done). In C, there is in principle and libraries may need to introduce a new function name when adopting (if they wish to be careful and distinguish), but it seems not a major concern?

If we can agree on such an ABI break by introducing a new name, then I think we can proceed with making a concrete proposal that is Python friendly (or a concrete proposal with the Python API in mind). I am unsure whether we can get away with not modifying DLTensor (creating a new name), but I suspect we can at least for the time being.

If that is the step where we are, I would suggest a meeting with those interested to discuss how a very concrete Python API/friendly proposal will look like. That might just be Matti, me, and others from the Python side in a first instance but it would also be great if more will join.

leofang commented 2 years ago

On the C++ side, as long as we make it clear we discourage C-style cast and reinterpret_cast it should be safe I think.

+1 for a meeting on this.

tqchen commented 2 years ago

Happy to host a half-hour zoom call on this. Please fill out your availability here https://www.when2meet.com/?16752946-wS8RV for the week of Sep 19.

Note that the time is in EDT, so if you are in different time-zone, maybe some converstion is needed.

leofang commented 2 years ago

Sorry, @tqchen, is it possible to open up time slots in the week of Sep 26? NVIDIA folks (=me 😅) would be busy in the Sep 19 week due to GTC.

tqchen commented 2 years ago

OK, updated the link to change ro week of Sep 26

marktsuchida commented 2 years ago

I don't have enough of a stake in this to warrant joining the call, but just a quick note that I wrote my above comment under the assumption that (1) existing consumers blindly assume that received capsules contain (as type-erased void *) the existing DLManagedTensor layout, and that (2) it is desirable to prevent such consumers from crashing or corrupting memory when they receive the new (versioned) data layout. (As far as I can tell, a magic checksum does not help with (2), because existing consumers don't know about the magic. Hence the use of the high bit of device_type.)

I think I may have been incorrect about (1) -- if the capsule names (dltensor and used_dltensor) are changed (once and for all, say, to dlvtensor and used_dlvtensor) when the new, versioned data layout is introduced, there will be no chance of (correct) old consumers trying to read new data incorrectly. Updated consumers can support the old and new names for a period, if desired. (Changing the capsule names was mentioned by @tirthasheshpatel in #34.)

The scheme I proposed is complicated and ugly, and would remain so semi-permanently. Its only advantage is that it can offer memory safety (in the (2) sense) without the type information having to travel separately. I wouldn't recommended it if there are other solutions like changing the capsule names.

tqchen commented 2 years ago

thanks @marktsuchida for careful thoughts, you are more than welcome to join the call btw

tqchen commented 2 years ago

Let us lock down to Friday, 230 EDT. Sep 30 given the responses. Please mark on your calendar and I will send a zoom link here

tqchen commented 2 years ago

The open discussion meeting of DLPack ABI update is scheduled to be on Friday 230EDT Sep 30. Please use the zoom link to join the meeting

tqchen commented 2 years ago

Summary Strawman from the meeting

Introduce DLManagedTensorVersioned
Adopt a name change in pyscapsule as @marktsuchida suggested to ensure proper error message when old dlpack trying to import from new one.

Thanks @leofang @seberg @kkraus14 for joining the meeting. We will keep this thread open for one week feedback then post a notice for one month before we do the update

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
};

/*!
 * \brief The DLPack and DLPack ABI versions of the tensor.
 */
typedef struct {
  /*! \brief DLPack version. */
  uint32_t dlpack;
  /*! \brief DLPack ABI version. */
  uint32_t abi;
} DLPackVersion;

struct DLManagedTensorVersioned {
    DLPackVersion version;
    void * manager_ctx;
    void (*deleter)(struct DLManagedTensorVersioned * self);
    uint64_t flags;
    DLTensor dl_tensor;
}

typedef DLPACK_BIT_MASK_READ_ONLY  1

DataAPI change: PyCapsule - rename to vdltensor, so there is proper error messages because of name mismatch

edgarriba commented 1 year ago

@tqchen do you have any timelime to release the managed version ? we are maintaining rust bindings here https://github.com/kornia/dlpack-rs that we use for image/video loading in kornia for which we want to release soon new versions.

dmlc / dlpack