dmlc / dlpack

common in-memory tensor structure
https://dmlc.github.io/dlpack/latest
Apache License 2.0

[ABI] ABI Update For adding Version to DLPack #113

Closed · tqchen closed 1 year ago

tqchen commented 1 year ago

This PR adds DLManagedTensorVersioned to DLPack.

The change itself remains ABI compatible, since the new ABI lives in the separate DLManagedTensorVersioned data structure. Adopting it in the data APIs, however, would involve an ABI update: all DLPack importers/exporters would need to be updated to make use of DLManagedTensorVersioned.
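For reference, the new data structure has roughly the following shape, reconstructed from the fields discussed in this thread (see the header diff in this PR for the authoritative definition):

```c
/* Sketch of the new struct, per the fields discussed in this thread. */
typedef struct DLManagedTensorVersioned {
  DLPackVersion version;      /* version of the data structure */
  void *manager_ctx;          /* opaque producer context */
  void (*deleter)(struct DLManagedTensorVersioned *self);  /* destructor callback */
  /* Future ABI changes should keep everything up to this point stable. */
  uint64_t flags;             /* bit-mask flags (e.g. read-only) */
  DLTensor dl_tensor;         /* the actual tensor payload */
} DLManagedTensorVersioned;
```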

tqchen commented 1 year ago

Notice thread: https://github.com/dmlc/dlpack/issues/110

tqchen commented 1 year ago

cc @leofang @rgommers @oleksandr-pavlyk @tirthasheshpatel @seberg @kkraus14 @wjakob

tqchen commented 1 year ago

Ready for review. This PR will remain open for at least one week.

wjakob commented 1 year ago

I feel like the interface could still be more explicit about what I would argue is the most important point of a versioned ABI: what happens when a program is given a DLPack tensor by some other tool, and the DLPackVersion::abi value of this tensor is higher than what the current program was compiled with?

With the breakneck speed of advances happening in ML and array programming, it is not inconceivable that the DLManagedTensorVersioned data structure will need to be extended in the future, requiring another bump of the DLPACK_ABI_VERSION value. Here are just a few examples off the top of my head; I am sure that we could brainstorm further and more convincing use cases if needed:

  • In future complex applications involving accelerators, the device ID may not be descriptive enough. For example, a single device might be associated with multiple execution contexts, which would then need to be specified. Actually, this is already possible with CUDA today, and such tensors cannot be safely exchanged because they live in separate address spaces.

  • As ML models become larger and more distributed, DLPack might need to be extended to refer to tensors on network-attached accelerators.

  • Experimentation with tensor data types (e.g. beyond BFloat16) might lead to a situation where more context is needed to explain the actual data type of a tensor (i.e., in a way that doesn't neatly fit into the existing enumeration).

If such extensions are added, they would likely be irrelevant for most users except for a small sub-community that requires them. And that's basically the point I want to raise: it would be a shame if simply adding an extra field required another ABI break.

The current PR contains the message:

> \note Future ABI changes should keep everything until this field stable, to ensure that deleter can be correctly called.

(and then the actual things of interest follow, specifically flags and dl_tensor)

I find this quite conservative. Reading between the lines, this statement suggests to me that any disagreement in the DLPackVersion::abi field will break interoperability between frameworks exchanging DLPack tensors. The only thing one is allowed to do at that point is give up and call the deleter.
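Concretely, the quoted note seems to permit only something like the following consumer-side handling (a sketch; the field and macro names follow this PR's draft):

```c
#include <dlpack/dlpack.h>  /* for DLManagedTensorVersioned, DLPACK_ABI_VERSION */

/* Sketch: the only handling the quoted note permits on a version mismatch. */
static void consume(DLManagedTensorVersioned *t) {
  if (t->version.abi != DLPACK_ABI_VERSION) {
    if (t->deleter) t->deleter(t);  /* give up: release the tensor and bail out */
    return;
  }
  /* ... only now is it safe to interpret flags and dl_tensor ... */
}
```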

In practice, it would be perfectly fine to add fields to the data structure without breaking interoperability with a lower-versioned consumer, especially when those extra fields encode nuanced attributes that are perhaps only relevant to very specific use cases.

ABI breaks are extremely disruptive to the community, especially in the Python ecosystem where everyone ships separately pre-compiled packages via PyPI. Even for this proposal, I suspect that it will take a long time for the community to adopt versioned DLTensors. This is therefore also a unique chance to lower the cost of future changes, and I feel like the current PR misses that opportunity.

wjakob commented 1 year ago

To turn the above complaint into something more constructive, let me propose a three-part (MAJOR.MINOR.PATCH) DLPackVersion versioning scheme with the following interpretation:

  1. A change in a DLPack tensor's PATCH level indicates minor changes to this header file on the producer's side, such as the addition of a new enumeration entry (a new device, a new tensor data type, etc.).

    Example: Suppose a new BFLOAT24 tensor type is added. It is legal for a DLPack consumer to read fields from a future PATCH level subject to the constraint that it can interpret the enumeration values it sees. If the consumer does not know how to interpret the BFLOAT24 type, it should give up and call the deleter.

  2. A bump in the MINOR field indicates that a field was added at the end of the DLManagedTensorVersioned data structure (in addition to potential enumeration additions).

    Example: Suppose that a new device ID is added (kDLNetworkedGPU) which refers to a network-capable GPU. Accessing data there requires an IP address and port field (lots of hand waving here, it's just an example.. :-)). As before, it is perfectly fine for a DLPack consumer to read fields from a tensor with a MINOR version from its relative future, subject to the constraint that it can interpret the enumeration values it sees. In this example, it would be safe for the consumer to access the tensor as long as it doesn't have the newly added kDLNetworkedGPU type. It would not access the extra field(s), of which it is in any case not aware.

  3. A bump in the MAJOR field indicates some major restructuring. It is not safe to use a tensor if a major version disagreement is detected.

    Example: the latest transformer model GPT-10 has now become so big that the DLTensor must be upgraded from int64_t *strides to int128_t *strides. That messes up the whole data layout, and a full ABI break is necessary. Even in this case, there is a guarantee, which is that the tensor data structure starts with the fields

    ```c
    DLPackVersion version;
    void *manager_ctx;
    void (*deleter)(struct DLManagedTensorVersioned *self);
    ```

    This means that a consumer can at least call the deleter, but that's it.
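A sketch of the resulting consumer-side acceptance rule (the macro and struct here are illustrative stand-ins, not part of this PR):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative: the consumer's compiled-in major version. */
#define DLPACK_MAJOR_VERSION 1

typedef struct { int32_t major, minor, patch; } DLPackVersion;

/* Acceptance rule for the MAJOR.MINOR.PATCH proposal (consumer side). */
static bool dlpack_safe_to_read(const DLPackVersion *v) {
  if (v->major != DLPACK_MAJOR_VERSION)
    return false;  /* layout break: only calling the deleter is safe */
  /* A higher MINOR or PATCH on the producer side is fine, provided every
     enumeration value actually encountered (device type, dtype code, ...)
     is one this consumer knows how to interpret. */
  return true;
}
```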

tqchen commented 1 year ago

Thanks @wjakob for the proposal. I agree that we want to be careful with the changes and it is great to have those ideas discussed before we commit to them.

Would love to see what others think as well

leofang commented 1 year ago

This is an interesting discussion, but I would argue it is not of concern to the Python Data Consortium. Basically, all we need is a safe way to continue evolving the existing DLPack protocol for describing a span of tensor data residing on a single device (important to your second point below). IIRC there's no discussion on evolving the protocol itself, as so far everyone is mostly happy with the convenience brought by DLPack. There are plenty of prior discussions scattered in the open issues of this repo that are finally converging to this PR.

> In future complex applications involving accelerators, the device ID may not be descriptive enough. For example, a single device might be associated with multiple execution contexts, which would then need to be specified. Actually, this is already possible with CUDA today, and such tensors cannot be safely exchanged because they live in separate address spaces.

The fact that such a possibility already exists but has never generated any change request to DLPack speaks for itself. In the case of multiple CUDA contexts co-residing on the same device, it is generally hard to exchange properly without some common understanding (e.g. both sides must use the same CUDA context, which in practice almost always means the CUDA runtime's context). It is even harder to account for the API/ABI changes needed to accommodate similar situations in other vendors' programming models. If NV and Intel both consider this appropriate (I do not speak for either on this), I'd argue let's not overcomplicate the story 🙂 Flexibility is good, but less so if we cannot make any progress.

> As ML models become larger and more distributed, DLPack might need to be extended to refer to tensors on network-attached accelerators.

As mentioned above, DLPack has been designed as a single-device exchange protocol. This use case is exactly why array libraries like Dask and cuNumeric cannot support DLPack, the CUDA Array Interface, or any similar protocol, should one ever exist. Distributed protocols have been discussed extensively (e.g. here), but the fact that those discussions went nowhere points to a bigger issue (no distributed library providers can find common ground) that should not block this PR.

> Experimentation with tensor data types (e.g. beyond BFloat16) might lead to a situation where more context is needed to explain the actual data type of a tensor (i.e., in a way that doesn't neatly fit into the existing enumeration).

An actual data type here is represented by DLDataType, which contains a type code enum, bits, and lanes. If a tensor data type cannot be represented by these fields, it is arguable that it is too complicated to exchange through DLPack. But maybe I am biased on this.
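For illustration, using the dlpack.h field names (code, bits, lanes), both bfloat16 and a vectorized float type are already expressible today:

```c
#include <dlpack/dlpack.h>

/* bfloat16: type code kDLBfloat, 16 bits, 1 lane */
DLDataType bf16  = { .code = kDLBfloat, .bits = 16, .lanes = 1 };
/* a 4-lane float32 vector type */
DLDataType f32x4 = { .code = kDLFloat,  .bits = 32, .lanes = 4 };
```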

seberg commented 1 year ago

My gut feeling is that squashing "minor" and "patch" is fine (but I don't care either way).

There are two main groups of changes: backward-compatible additions (new enum values or new trailing fields) and true ABI breaks.

From the Python side I am not scared of either kind of change at all. But I would ask for a version request (by the consumer). If the consumer says version="1.3" and the exporter supports version="1.5", which added new devices (e.g. unified memory rather than just CPU), the exporter can decide to not use those new devices. Note that it would probably be OK to export with "1.5" as the version for simplicity.
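A minimal sketch of that negotiation, with illustrative names (the actual Python-level mechanism is a separate discussion):

```c
#include <stdint.h>

typedef struct { int32_t major, minor; } Version;  /* illustrative */

/* Sketch of consumer-driven version negotiation: the exporter never uses
   features newer than what the consumer requested. */
static Version negotiate(Version requested, Version supported) {
  Version out = supported;
  if (requested.minor < supported.minor)
    out.minor = requested.minor;  /* e.g. requested 1.3 vs supported 1.5 -> export 1.3 */
  return out;
}
```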

We don't have to discuss the details of what might happen; a true ABI break should be assumed to happen eventually. It doesn't matter much which scenario causes it :).

seberg commented 1 year ago

Should this be pushed forward soon? It seems to me that the versioning may be the only remaining small discussion here. Since further changes to the format should only affect the included tensor, there seems to be not much else?

I am just hoping it might unblock further steps. I am trying to think about the exchange protocol more, but unless we change the API further there is probably not much to do. For Python, I would update the capsule name and pass a maximum supported version (so that the exporter can produce older versions if they wish).

It isn't that I can't think of API breaks to consider, but it seems OK to push this in any case.

tqchen commented 1 year ago

Yes, sorry, I was a bit slow near the end of the year.

Summarizing the suggestions on versioning here. How about we do the following:

```c
struct DLPackVersion {
  // A number that indicates an ABI break.
  int32_t major;
  // A number that indicates an API change that remains backward compatible.
  int32_t minor;
};
```

This would mark a change in the versioning scheme, which is probably fine given that DLPack is pre-1.0. We can start with a 1.0 release with this new ABI change.
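A consumer check under this scheme could then look like this sketch (the DLPACK_* macros are hypothetical stand-ins for the consumer's compiled-in version):

```c
/* Hypothetical consumer-side macros mirroring the scheme above. */
#define DLPACK_MAJOR_VERSION 1
#define DLPACK_MINOR_VERSION 0

/* Sketch: compatibility check under the major/minor scheme. */
static int dlpack_compatible(struct DLPackVersion v) {
  if (v.major != DLPACK_MAJOR_VERSION)
    return 0;  /* ABI break: calling the deleter is the only safe action */
  /* Minor differences are backward compatible by definition, so the tensor
     can be consumed whenever the major versions match. */
  return 1;
}
```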

leofang commented 1 year ago

So the proposed DLPackVersion would make DLPack semantically versioned. I like it too.

tqchen commented 1 year ago

Happy new year, it seems we are all good with the major/minor version change. @wjakob @leofang @seberg please take another look and approve explicitly if you can.

tqchen commented 1 year ago

Thanks @wjakob for the suggestions, I have updated the comments to include your suggested clarifications. Let us move the discussion about the Python API to https://github.com/dmlc/dlpack/issues/116

tqchen commented 1 year ago

@wjakob please let me know if you have more comments :)

tqchen commented 1 year ago

Going to merge if there is no more input in the next three days.

tqchen commented 1 year ago

Thanks everyone for participating in the discussion and reviews, this change is now merged.