dmlc / dlpack

common in-memory tensor structure
https://dmlc.github.io/dlpack/latest
Apache License 2.0
905 stars 133 forks source link

Next steps for the Python API? #116

Closed seberg closed 8 months ago

seberg commented 1 year ago

Once gh-113 is finalized, we can extend the Python exchange API to use the new API.

Let me suggest the following API changes (which have been brought up before):

All Python libraries should update to the new version as soon as possible, it is encouraged to deprecate support for the old API version at the same time or soon after.

Alternative: We had discussed adding __dlpack_info__ and then passing in version= rather than a max_version=. This scheme is described here: https://github.com/dmlc/dlpack/pull/101/files#diff-851e948c4827ccec1de9b174d011fc92676f2e195ac863c4a055d9cc508575f4R156 I don't mind this scheme, although I feel that renaming the capsule wouldn't hurt for safety even there.

I am sure there are more alternatives, and I wouldn't shy away from larger changes, but I have no clear plan for that.

tqchen commented 1 year ago

Agree with dltensor_versioned renaming change.

+1 to deprecation of the old API.

I also like the max_version argument here.

In terms of the versioning requirement, it would be useful to think about possible deprecation scheme and versions supported by the producer -- since producer ideally should not stay with too old files. Personally I think staying K release cycle(say K=2) makes sense given we are moving slowly, with a preference to deprecate the old DLManagedTensor ASAP.

Because DLPackVersion is always in the header, the producer can safely ignore max_version is necessary, and the consumer can still check the version.

tqchen commented 1 year ago

cc @wjakob @rgommers @leofang

seberg commented 1 year ago

To take that over from the other issue: I don't care about introducing a new dunder. I suggested doing this, because I had a feeling there was some want to stick with the __dlpack__ "brand" rather than switch to __vdlpack__ long-term.

Yes, in the transition period where some consumers try to fetch the new version but fall back, it would add 200-300ns overhead (on my machine).

tqchen commented 1 year ago

I like the idea of switch with __dlpack__ brand. now that #113 is merged, would be great if we can take actions to update the data-api part of the spec

leofang commented 1 year ago

We'll see if we can get a short window to discuss about this in the meeting tomorrow later this week.

rgommers commented 1 year ago

Great to see this moving forward! What is proposed here seems good to me; I also like keeping __dlpack__ with max_version over adding a new dunder method.

So to spell it out, this would be the new signature:

__dlpack__(*,
    max_version: Optional[tuple[int, int]]=None,
    stream: Optional[Union[int, Any]] = None
) -> PyCapsule

And we should update the consumer side of all libraries first with:

try:
    x.__dlpack__(max_version=(1, 0))
    # if it succeeds, store info about capsule name being "dltensor_versioned",
    # and needing to set the capsule name to "used_dltensor_versioned"
    # when we're done
except TypeError:
    x.__dlpack__()

And the producer side at the same time by updating to the new signature, and

def __dlpack__(...):
    if max_version is None:
        # Keep and use the DLPack 0.X implementation
        # Note: in >= 2 years from now (but ideally as late as possible), it's okay
        # to raise BufferError here
    else:
        # We get to produce `DLManagedTensorVersioned` now
        if max_version >= our_own_dlpack_version:
            # Consumer understands us, just return a Capsule with our max version
        elif max_version[0] == our_own_dlpack_version[0]:
            # major versions match, we should still be fine here - return our own max version
        else:
            # if we're at a higher major version internally, did we keep an implementation
            # of the older major version around? If so, use that. Else, just return our max
            # version and let the consumer deal with it.
rgommers commented 1 year ago

I opened https://github.com/data-apis/array-api/pull/602 for this.