AMReX-Codes / pyamrex

GPU-Enabled, Zero-Copy AMReX Python Bindings including AI/ML
http://pyamrex.readthedocs.io

Discussion on mapping between amrex, numpy.ndarray, and torch.tensor data types #9

Open JBlaschke opened 3 years ago

JBlaschke commented 3 years ago

Hey, this is not so much an issue as a place to solicit public feedback.

I think we should implement type conversion from the amrex FArrayBox (or, more precisely, the Array4) data type to numpy.ndarray and torch.tensor, as well as suitable Python CUDA variants.

I also think that this type conversion should have a copying and a referencing variant.
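
To illustrate the distinction between the two variants (plain numpy here, since the Array4 bindings don't exist yet, with an ordinary array standing in for the fab's data):

```python
import numpy as np

# Stand-in for the host memory behind an Array4 / FArrayBox.
data = np.zeros((8, 8, 8), dtype=np.float64)

view = data[2:6, 2:6, 2:6]         # referencing variant: shares memory, zero-copy
copy = data[2:6, 2:6, 2:6].copy()  # copying variant: independent buffer

view[0, 0, 0] = 1.0
assert data[2, 2, 2] == 1.0        # writes through the view are visible in the original
assert copy[0, 0, 0] == 0.0        # the copy is decoupled
```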

This shouldn't be hard to implement (NO! This won't support Python 2... I have a life, you know), and I volunteer my time. But first I want to run this past all y'all to see if anyone is already working on it and what you think.

Tagging @ax3l @maxpkatz @drummerdoc

JBlaschke commented 3 years ago

I think this would be a good basis for more complex amrex types. Since torch and Python don't have a standardized framework for expressing AMR, this is (in my opinion) the lowest common denominator.

We should also keep in mind how we deal with boxes whose indices don't start at 0. @ax3l's box type already has what we need, I think, so we might need to implement a thin wrapper around numpy and torch that maps amrex-style indexing to Python indices.
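
A rough sketch of what such a wrapper could look like (the OffsetArray name and its constructor are made up for illustration, numpy only):

```python
import numpy as np

class OffsetArray:  # hypothetical name, just to illustrate the idea
    """Thin wrapper that maps amrex-style box indices (which may not start
    at 0, e.g. with ghost cells) onto a zero-based numpy array."""

    def __init__(self, data: np.ndarray, small_end):
        self.data = data
        self.small_end = np.asarray(small_end)

    def _shift(self, idx):
        # Translate an amrex-style index tuple to zero-based numpy indices.
        return tuple(np.asarray(idx) - self.small_end)

    def __getitem__(self, idx):
        return self.data[self._shift(idx)]

    def __setitem__(self, idx, value):
        self.data[self._shift(idx)] = value

# A box whose lower corner is (-2, -2, -2), e.g. two ghost cells per side:
arr = OffsetArray(np.zeros((8, 8, 8)), small_end=(-2, -2, -2))
arr[-2, -2, -2] = 1.0
assert arr.data[0, 0, 0] == 1.0
```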

Also tagging @sayerhs

ax3l commented 3 years ago

Thanks for starting a sticky thread so we can collect the approaches. Let me start with what I am using so far:

General arrays (incl. numpy):

Device memory:

Compatibility:


JBlaschke commented 3 years ago

Thanks @ax3l, that list is a good starting point. I would vote for the Python buffer protocol strategy to begin with; this seems to work well with PyCUDA also. We could then also implement some of the alternatives, depending on how much demand there is from applications, what benefits each provides, and how much bandwidth we all have.
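
For reference, the consumer side of the buffer protocol is already zero-copy with numpy (sketched here with a plain bytearray standing in for the fab's host memory):

```python
import numpy as np

# Any object exposing the Python buffer protocol can be wrapped without a copy.
raw = bytearray(8 * 16)                     # stand-in for Array4 host memory
arr = np.frombuffer(raw, dtype=np.float64)  # zero-copy view over the same bytes
arr[0] = 3.14

mv = memoryview(raw)                        # the raw bytes reflect the write
assert mv[:8].tobytes() == np.float64(3.14).tobytes()
```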

I'll do some reading to see if there is a benefit that would entice me to change my vote. (thanks for the references)

ax3l commented 3 years ago

Agreed, I think after going through all the material again:

to start with. This will give us exposure to exactly the libraries and communities we want to interface with.

ax3l commented 3 years ago

Starting support for AMD GPUs (and Intel) in DLPack (__dlpack__):

ax3l commented 2 years ago

FArrayBox for CPU via the array interface is now implemented via #19.

Next is either the __cuda_array_interface__ or DLPack. Should not be too hard to add both.
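
As a rough sketch of the producer side of __cuda_array_interface__ (v3 of the spec; the DeviceArrayView class and its constructor are made up for illustration):

```python
class DeviceArrayView:  # hypothetical illustration, not pyamrex code
    """Minimal producer of the __cuda_array_interface__ protocol (v3)."""

    def __init__(self, device_ptr: int, shape: tuple, typestr: str = "<f8"):
        self._ptr = device_ptr
        self._shape = shape
        self._typestr = typestr

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self._shape,
            "typestr": self._typestr,    # e.g. "<f8" for little-endian float64
            "data": (self._ptr, False),  # (device pointer, read_only flag)
            "strides": None,             # None => C-contiguous
            "version": 3,
        }

# Consumers such as cupy or numba can then wrap such an object zero-copy, e.g.
#   cupy.asarray(DeviceArrayView(ptr, (nx, ny, nz)))
```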

ax3l commented 2 years ago

CUDA bindings for MultiFabs, including cupy, numba, and pytorch support, are coming in via #30.

ax3l commented 1 year ago

Did some more DLPack deep diving with @scothalverson.

What we want to implement here is primarily the producer, __dlpack__. It creates a PyCapsule, essentially a transport for a void*. The data behind this pointer is laid out according to the DLPack spec (C/Python).
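
Roughly, the Python-facing protocol surface looks like this (sketch only; the actual capsule construction happens on the C/C++ side):

```python
class DLPackProducer:  # hypothetical illustration of the protocol shape
    def __dlpack_device__(self):
        # (device_type, device_id); 1 == kDLCPU in the DLPack device enum
        return (1, 0)

    def __dlpack__(self, stream=None):
        # The real binding builds a DLManagedTensor on the C/C++ side and
        # returns it wrapped in a PyCapsule named "dltensor"; consumers
        # (numpy, torch, cupy, ...) then call their from_dlpack() on this object.
        raise NotImplementedError
```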

Relatively easy-to-read implementations are:

More involved or less documented are:

The DLManagedTensor is essentially:
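
Per the DLPack C header (rendered here with ctypes purely for illustration; the canonical definition is the C struct in dlpack.h):

```python
import ctypes

class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int32),   # kDLCPU, kDLCUDA, kDLROCM, ...
                ("device_id", ctypes.c_int32)]

class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8),          # int / uint / float / ...
                ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),         # the raw (host or device) pointer
                ("device", DLDevice),
                ("ndim", ctypes.c_int32),
                ("dtype", DLDataType),
                ("shape", ctypes.POINTER(ctypes.c_int64)),
                ("strides", ctypes.POINTER(ctypes.c_int64)),  # may be NULL (compact)
                ("byte_offset", ctypes.c_uint64)]

class DLManagedTensor(ctypes.Structure):
    pass

DLManagedTensor._fields_ = [
    ("dl_tensor", DLTensor),                        # the tensor description above
    ("manager_ctx", ctypes.c_void_p),               # opaque handle of the producer
    ("deleter", ctypes.CFUNCTYPE(None, ctypes.POINTER(DLManagedTensor))),
]
```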

This object is referred to in the capsule we produce.