JBlaschke opened this issue 3 years ago
I think this would be a good basis for more complex AMReX types. Since torch and Python don't have a standardized framework for expressing AMR, this is (in my opinion) the lowest common denominator.
We should also keep in mind how we deal with boxes whose indices don't start at 0. @ax3l's box type already has what we need, I think. So we might need to implement a thin wrapper around numpy and torch that maps AMReX-style indexing to Python indices.
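As a minimal sketch of the offset arithmetic such a wrapper would need (assuming the usual amrex::Box / amrex::IntVect API; the function name is made up for illustration):

```cpp
#include <AMReX_Box.H>
#include <AMReX_IntVect.H>

// Translate an AMReX-style (i,j,k) index, which may start at a negative or
// otherwise nonzero lower bound, into the zero-based index that a numpy or
// torch view of the same data would use.
amrex::IntVect to_zero_based (amrex::Box const & bx, amrex::IntVect const & iv)
{
    // numpy/torch have no notion of AMReX's index-space offset, so the
    // wrapper subtracts the box's lower corner.
    return iv - bx.smallEnd();
}
```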
Also tagging @sayerhs
Thanks for starting a sticky thread so we can collect the approaches. Let me start with what I am using so far:
General arrays (incl. numpy):
- either we code against the Python buffer protocol (scipy/PEP3118, Python.org) - a pybind11 sketch follows after this list,
- or we understand the new __array_ufunc__ protocol and implement that (NEP-13) - maybe that is unrelated,
- a large stack of software (cupy, numba, PyTorch, etc., see below) standardizes on the __cuda_array_interface__ convention (CUDA Array Interface v3),
- or we code against xtensor[-python] for extra C++ niceness (cost: an extra C++ dependency that we don't directly use here).
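For the buffer-protocol route, a hedged sketch of what a host-side binding could look like with pybind11. Accessor names such as dataPtr(), box(), and nComp() follow the usual AMReX API; this is only an illustration, not the code that later landed in #19:

```cpp
#include <pybind11/pybind11.h>
#include <AMReX_FArrayBox.H>

namespace py = pybind11;

void make_farraybox_binding (py::module_ & m)
{
    py::class_<amrex::FArrayBox>(m, "FArrayBox", py::buffer_protocol())
        .def_buffer([](amrex::FArrayBox & fab) -> py::buffer_info {
            auto const & bx = fab.box();
            auto len = bx.length();  // cells per dimension
            // AMReX stores data Fortran-ordered (i fastest), with the
            // component index slowest.
            return py::buffer_info(
                fab.dataPtr(),                               // pointer to the data
                sizeof(amrex::Real),                         // size of one element
                py::format_descriptor<amrex::Real>::format(),
                4,                                           // ndim: (x, y, z, comp)
                { len[0], len[1], len[2], fab.nComp() },     // shape
                { sizeof(amrex::Real),                       // strides in bytes
                  sizeof(amrex::Real) * len[0],
                  sizeof(amrex::Real) * len[0] * len[1],
                  sizeof(amrex::Real) * len[0] * len[1] * len[2] });
        });
}
```

With something like this in place, `np.array(fab, copy=False)` on the Python side gives a zero-copy view of the host data.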
Device memory:
- cupy
  - issue to ask how to do it right: https://github.com/cupy/cupy/issues/4644 - they also recommend standardizing on __cuda_array_interface__
- going directly to the emerging DLPack APIs

Compatibility:
- cupy/numba: https://docs.cupy.dev/en/stable/reference/interoperability.html
- numba compatibility details for __cuda_array_interface__ v3
Thanks @ax3l, that list is a good starting point. I would vote for the Python buffer protocol strategy first. This seems to work well with PyCUDA also. We could then also implement some of the alternatives, depending on how much demand there is from applications, what benefits each one offers, and how much bandwidth we all have.
I'll do some reading to see if there is a benefit that would entice me to change my vote. (thanks for the references)
Agreed. I think, after going through all the material again:
- __cuda_array_interface__ v3 (C-example) for transporting device-side memory w/o host-device copies, to start with. This will give us exposure to exactly the libraries and communities we want to interface with (see the sketch after this list).
- Starting support for AMD GPUs (and Intel) in DLPack (__dlpack__).
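A hedged sketch of what a __cuda_array_interface__ v3 property could look like on the binding side, again with pybind11. The dict keys follow the public CUDA Array Interface spec; the accessors and the float64 assumption mirror the buffer-protocol sketch above and are not taken from this repo:

```cpp
#include <cstdint>
#include <pybind11/pybind11.h>
#include <AMReX_FArrayBox.H>

namespace py = pybind11;

void add_cuda_array_interface (py::class_<amrex::FArrayBox> & cls)
{
    cls.def_property_readonly("__cuda_array_interface__",
        [](amrex::FArrayBox const & fab) {
            auto len = fab.box().length();
            py::dict d;
            // Report the shape reversed as (comp, z, y, x): the Fortran-ordered
            // AMReX data is then C-contiguous, so 'strides' can be omitted.
            d["shape"]   = py::make_tuple(fab.nComp(), len[2], len[1], len[0]);
            d["typestr"] = "<f8";  // little-endian float64, assuming amrex::Real == double
            d["data"]    = py::make_tuple(
                reinterpret_cast<std::uintptr_t>(fab.dataPtr()),  // device pointer as int
                false);                                           // not read-only
            d["strides"] = py::none();  // none => C-contiguous in the shape above
            d["version"] = 3;
            return d;
        });
}
```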
FArrayBox for CPU via the array interface is now implemented via #19. Next is either the __cuda_array_interface__ or DLPack. Should not be too hard to add both.
CUDA bindings for MultiFabs, including cupy, numba, and PyTorch, are coming in via #30.
Did some more DLPack deep diving with @scothalverson.
What we want to implement here is primarily the producer, __dlpack__. This one creates a PyCapsule, essentially a transport of a void*. The data behind this pointer is laid out in the spec of DLPack (C/Python).
Relatively easy to read implementations are:
More involved or less documented are:
The DLManagedTensor is essentially:
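(quoted from the public dlpack.h header, with descriptive comments added)

```cpp
typedef struct DLManagedTensor {
  DLTensor dl_tensor;   // the tensor description: data pointer, device, dtype,
                        // ndim, shape, strides, byte_offset
  void * manager_ctx;   // opaque handle the producer uses to manage the memory
  void (*deleter)(struct DLManagedTensor * self);  // called by the consumer when done
} DLManagedTensor;
```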
This object is referred to in the capsule we produce.
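Putting that together, a minimal sketch of a host-side producer, assuming dlpack.h is available and the same (comp, z, y, x) float64 layout as in the sketches above. The ownership model (the capsule does not free the FArrayBox data) and all names are illustrative, not this repo's implementation:

```cpp
#include <array>
#include <cstdint>
#include <pybind11/pybind11.h>
#include <dlpack/dlpack.h>
#include <AMReX_FArrayBox.H>

namespace py = pybind11;

py::capsule make_dlpack_capsule (amrex::FArrayBox & fab)
{
    auto len = fab.box().length();

    // Context object that owns the shape array and the DLManagedTensor itself.
    struct Ctx {
        DLManagedTensor tensor;
        std::array<int64_t, 4> shape;
    };
    auto * ctx = new Ctx{};
    ctx->shape = { fab.nComp(), int64_t(len[2]), int64_t(len[1]), int64_t(len[0]) };

    DLTensor & t = ctx->tensor.dl_tensor;
    t.data        = fab.dataPtr();
    t.device      = {kDLCPU, 0};        // kDLCUDA for device-resident data
    t.ndim        = 4;
    t.dtype       = {kDLFloat, 64, 1};  // float64, 1 lane
    t.shape       = ctx->shape.data();
    t.strides     = nullptr;            // compact, row-major in the reported shape
    t.byte_offset = 0;

    ctx->tensor.manager_ctx = ctx;
    ctx->tensor.deleter = [](DLManagedTensor * self) {
        delete static_cast<Ctx*>(self->manager_ctx);  // we do not own the data itself
    };

    // The consumer renames the capsule to "used_dltensor" and calls the deleter.
    return py::capsule(&ctx->tensor, "dltensor");
}
```

A complete implementation would also handle the optional stream argument of __dlpack__, device-resident data (kDLCUDA), and give the capsule a destructor so an unconsumed tensor does not leak.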
Hey, this is not so much an issue as a place to solicit public feedback.
I think we should implement type conversion from the AMReX FArrayBox (or more precisely the Array4) data type to numpy.ndarray and torch.tensor, as well as suitable Python CUDA variants.
I also think that this type conversion should have a copying and a referencing variant.
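A hedged sketch of how both variants could be expressed with pybind11's py::array_t: passing an owner handle makes the result a reference into the FArrayBox, while an empty handle makes pybind11 allocate new memory and copy. The helper name and the float64/(comp, z, y, x) layout are illustrative only:

```cpp
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <AMReX_FArrayBox.H>

namespace py = pybind11;

// Describe the FArrayBox as a 4D (comp, z, y, x) array of amrex::Real.
py::array_t<amrex::Real> as_numpy (amrex::FArrayBox & fab, py::handle owner)
{
    auto len = fab.box().length();
    std::vector<py::ssize_t> shape   = { fab.nComp(), len[2], len[1], len[0] };
    std::vector<py::ssize_t> strides = {
        py::ssize_t(sizeof(amrex::Real)) * len[0] * len[1] * len[2],
        py::ssize_t(sizeof(amrex::Real)) * len[0] * len[1],
        py::ssize_t(sizeof(amrex::Real)) * len[0],
        py::ssize_t(sizeof(amrex::Real)) };
    // With a non-empty owner: zero-copy view that keeps the producer alive.
    // With py::handle{}: pybind11 copies the data into a fresh numpy array.
    return py::array_t<amrex::Real>(shape, strides, fab.dataPtr(), owner);
}
```

In the binding, the referencing variant would pass the Python-side FArrayBox object itself as owner (for example by taking py::object as the self argument of the bound method), and the copying variant an empty handle.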
This shouldn't be hard to implement (NO! This won't support Python 2... I have a life you know), and I volunteer my time. But first I want to run this past all y'all to see if anyone is already working on it and what you think.
Tagging @ax3l @maxpkatz @drummerdoc