data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License

xp.from_dlpack to move data to a preallocated array on the target device #750

Closed: ogrisel closed this issue 4 months ago

ogrisel commented 4 months ago

Since #741 was merged, it should now be possible to perform data transfers across devices and namespaces in a standard, namespace-agnostic way (as long as both namespaces have a shared understanding of a common device type, such as the CPU/host).

However, such transfers cannot reuse pre-allocated buffers in the target namespace/device.

Would it be possible to add an out argument to from_dlpack?

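def move(x, y):
    # Copy the data of y into the preallocated array x, possibly across namespaces.
    xp_x = x.__array_namespace__()
    xp_y = y.__array_namespace__()
    if xp_x != xp_y:
        # Proposed API: from_dlpack does not accept out= today.
        xp_x.from_dlpack(y, copy=True, out=x)  # device=x.device is implicit in this case.
    else:
        x[:] = y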

This API might be a bit too weird. I am open to better suggestions.
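
For comparison, here is a minimal sketch of what seems possible without out=: import y through DLPack, then copy the result into the preallocated array. The helper name move_with_temporary and its xp_x parameter are hypothetical, and NumPy stands in for both namespaces only so that the snippet runs on a single machine. When the two namespaces target different devices, the intermediate tmp would be a fresh allocation on the target device, which is the extra buffer this request tries to avoid.

```python
import numpy as np


def move_with_temporary(x, y, xp_x=np):
    """Copy the data of y into the preallocated array x (hypothetical helper)."""
    # Import y into x's namespace through DLPack. Depending on the libraries
    # involved, tmp may be a zero-copy view of y or a newly allocated array.
    tmp = xp_x.from_dlpack(y)
    # Overwrite the existing buffer of x in place instead of rebinding x.
    x[...] = tmp
    return x


x = np.zeros(4, dtype=np.float64)   # preallocated target buffer
y = np.arange(4, dtype=np.float64)  # source data
buffer_before = x.ctypes.data       # address of x's buffer before the copy

move_with_temporary(x, y)
```

After the call, x holds the values of y and still points at the same memory address, so the preallocated buffer is reused; the cost is the intermediate tmp.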

rgommers commented 4 months ago

This question isn't specific to DLPack, is it? It seems to me that this is the same as a generic out= keyword or a "reuse this memory" API (or at least the other copy= keywords in functions like asarray). The main discussion on that happened in gh-24.

ogrisel commented 4 months ago

Ah yes sorry, I forgot about that. Let's close.

leofang commented 4 months ago

@ogrisel this is a very good exercise for thinking through the design challenges of what you asked for, even if we were to allow out= in the standard.

In your snippet, xp_x is the consumer and xp_y is the producer, and we're passing a consumer array x to from_dlpack. As noted elsewhere, DLPack is a one-way protocol, not a two-way one: so far only the producer hands a capsule to the consumer. Your requirement means a capsule for x would also need to be passed from the consumer to the producer, which requires a two-way protocol.

Furthermore, the latest copy feature is possible because the burden is still on the producer to provide a valid capsule, whose deleter would be responsible for releasing the tensor back to the producer when it's done. With this two-way exchange, the deleter is no longer specific to the producer. We would at least need to set the deleter to a no-op when out= is set (and I might have missed other subtleties here).
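
The ownership argument above can be illustrated with a toy model in plain Python. This is only a sketch: ToyCapsule and ToyProducer are hypothetical names, and nothing here mirrors the real DLPack C structs beyond the idea that a capsule carries a deleter chosen by whoever owns the memory.

```python
class ToyCapsule:
    """Toy stand-in for a DLPack capsule: a buffer plus a deleter set by its owner."""

    def __init__(self, buffer, deleter):
        self.buffer = buffer
        self.deleter = deleter

    def release(self):
        # Called by the receiver of the capsule once it no longer needs the data.
        self.deleter(self.buffer)


class ToyProducer:
    """Toy producer that tracks how many of its buffers are currently exported."""

    def __init__(self):
        self.live_exports = 0

    def export(self, buffer):
        # One-way exchange: the producer owns the buffer and supplies the deleter.
        self.live_exports += 1
        return ToyCapsule(buffer, self._release)

    def _release(self, buffer):
        # The buffer is handed back to the producer.
        self.live_exports -= 1


# Existing direction: producer -> consumer.
producer = ToyProducer()
capsule = producer.export([1.0, 2.0, 3.0])
exports_while_held = producer.live_exports
capsule.release()
exports_after_release = producer.live_exports

# Hypothetical out= direction: the consumer hands its own buffer to the producer.
# The consumer keeps ownership, so the deleter would have to be a no-op.
out_buffer = [0.0, 0.0, 0.0]
out_capsule = ToyCapsule(out_buffer, lambda buffer: None)
out_capsule.buffer[:] = [1.0, 2.0, 3.0]  # the producer writes into the consumer's buffer
out_capsule.release()                    # nothing is returned to the producer
```

In the first half the deleter belongs to the producer and returns the buffer to it. In the second half there is no producer-side state to release, which is why the deleter can no longer be assumed to be producer-specific.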

In any case, it's technically not feasible without redesigning the exchange protocol. Not saying it can't be done, but it'd take quite some effort.