ComputationalRadiationPhysics / student_project_python_bindings

The student project investigates the performance and memory handling of Python bindings for CUDA C++ code created with pybind11.
GNU General Public License v3.0

Make Cupy_Ref compatible to cupy (and numpy) #23

Closed SimeonEhrig closed 2 years ago

SimeonEhrig commented 3 years ago

CuPy makes element-wise arithmetic on arrays easy. For example:

import cupy as cp
a = cp.array([1,6,9])
b = cp.array([4,12,16])
print(a * b)
# [  4  72 144]

This should also be possible with Cupy_Ref:

import cupy as cp
c = get_cupy_ref([1,2,3]) # type Cupy_Ref
d = cp.array([4,12,16]) # type cupy.ndarray
print(c + d)
# [ 5 14 19]
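One way to get this behaviour is NumPy's `__array_ufunc__` protocol (NEP 13) combined with `numpy.lib.mixins.NDArrayOperatorsMixin`, which maps `+`, `*`, etc. onto ufunc calls. This is only a sketch under assumptions: `RefWithOperators` is a hypothetical stand-in for Cupy_Ref, and its backing array is a NumPy array so the example runs without a GPU. In Cupy_Ref the backing array would be a CuPy view of the device memory; `cupy.ndarray` also implements `__array_ufunc__`, so the same forwarding should reach CuPy's kernels.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class RefWithOperators(NDArrayOperatorsMixin):
    """Hypothetical stand-in for Cupy_Ref.

    The backing array is NumPy here so the sketch runs on a CPU; in
    Cupy_Ref it would be a CuPy view of the device memory.
    """

    def __init__(self, data):
        self._data = data

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NEP 13: operators from NDArrayOperatorsMixin (+, -, *, ...) call
        # the matching ufunc, and NumPy then hands the call to this method.
        if "out" in kwargs:
            return NotImplemented  # keep the sketch simple: no out= support
        # Replace every RefWithOperators input by its backing array and
        # re-run the ufunc on the plain arrays.
        unwrapped = [x._data if isinstance(x, RefWithOperators) else x
                     for x in inputs]
        return getattr(ufunc, method)(*unwrapped, **kwargs)


c = RefWithOperators(np.array([1, 2, 3]))
d = np.array([4, 12, 16])
print(c + d)  # [ 5 14 19]
```

Because NumPy consults `__array_ufunc__` for both operands, `d + c` is forwarded the same way as `c + d`.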

Maybe it is also possible to provide interoperability with NumPy, as the guide shows: https://docs.cupy.dev/en/stable/user_guide/interoperability.html

import cupy as cp
import numpy as np
e = get_cupy_ref([4,12,16]) # type Cupy_Ref
print(np.sum(e))
# 32
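For NumPy functions such as `np.sum`, the corresponding hook is `__array_function__` (NEP 18), which `cupy.ndarray` also implements. A minimal sketch, assuming a hypothetical `RefSketch` class in place of Cupy_Ref and a NumPy backing array instead of a CuPy view so it runs without a GPU: NumPy passes the call to `__array_function__`, which swaps the wrapper for its backing array and calls the function again.

```python
import numpy as np


class RefSketch:
    """Hypothetical stand-in for Cupy_Ref that forwards NumPy API calls."""

    def __init__(self, data):
        # NumPy array here; in Cupy_Ref this would be a CuPy view of the
        # device memory, so the call would dispatch on to CuPy.
        self._data = data

    def __array_function__(self, func, types, args, kwargs):
        # NEP 18: NumPy calls this for np.sum, np.max, ... whenever a
        # RefSketch is among the dispatched arguments. Replace every
        # top-level RefSketch argument by its backing array and retry.
        def unwrap(x):
            return x._data if isinstance(x, RefSketch) else x

        args = tuple(unwrap(a) for a in args)
        kwargs = {k: unwrap(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)


e = RefSketch(np.array([4, 12, 16]))
print(np.sum(e))  # 32
```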
SimeonEhrig commented 3 years ago

@uellue Could you please briefly describe what you expect from the Cupy_Ref object when it is used together with CuPy and NumPy arrays/functions?

SimeonEhrig commented 3 years ago

I think I found a solution for the issue. The class cupy.cuda.UnownedMemory can wrap existing CUDA memory so that it can be used as a CuPy array. I think we should use it to implement Cupy_Ref.as_cupy_array(). This also means we need no extra C++ code or bindings.
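A sketch of how `as_cupy_array()` could use `UnownedMemory`, assuming the C++ side can hand Python a raw device pointer (an int), the shape and the dtype. `wrap_device_pointer` and `nbytes_of` are hypothetical helper names. `UnownedMemory` never frees the pointer; it only keeps a reference to `owner`, so the object that owns the C++ allocation must be passed as `owner` to keep the memory alive. CuPy is imported inside the function so the module itself loads without a GPU.

```python
import numpy as np


def nbytes_of(shape, dtype):
    """Size in bytes of a C-contiguous buffer with this shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize


def wrap_device_pointer(ptr, shape, dtype, owner):
    """Hypothetical helper: view device memory owned elsewhere as a
    cupy.ndarray without copying.

    ptr   -- raw device address (int), e.g. exported by the C++ class
    owner -- Python object that keeps the allocation alive
    """
    import cupy  # lazy import: only needed when a GPU is actually used

    size = nbytes_of(shape, dtype)
    # UnownedMemory does not free ptr; it just holds a reference to owner.
    mem = cupy.cuda.UnownedMemory(ptr, size, owner)
    memptr = cupy.cuda.MemoryPointer(mem, 0)  # offset 0 into the buffer
    return cupy.ndarray(shape, dtype=dtype, memptr=memptr)
```

On a machine with CuPy, `a = cupy.arange(3, dtype=cupy.float32)` followed by `wrap_device_pointer(a.data.ptr, a.shape, a.dtype, owner=a)` should give a second array that shares `a`'s device memory.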

uellue commented 3 years ago

There were two points we discussed:

  1. Using the read() and write() calls for the object and illumination https://gitlab.hzdr.de/crp/ptychography/-/blob/master/src/binding/binding.py#L41 was a bit clumsy, and it would be nice to access these more directly, for example as NumPy and/or CuPy arrays.
  2. It would be nice if the detector data could be supplied as an array, instead of having it read from files in epie.init(...). That way we can use alternative ways to supply the data, for example alternative readers or file formats. It also helps with feeding subsets of a dataset to the algorithm for distributed operation.
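For point 2, a minimal sketch of array-based input. It assumes the detector data is a 3-D stack of shape `(n_scan_positions, height, width)`, which is an assumption and not taken from the ptychography code, and `split_detector_data` is a hypothetical helper. The idea is that once `epie.init(...)` accepts an array, any reader can produce that array, and subsets for distributed operation become cheap NumPy views.

```python
import numpy as np


def split_detector_data(frames, n_parts):
    """Hypothetical helper: split detector frames along the scan axis.

    frames  -- array-like, assumed shape (n_scan_positions, height, width),
               from any source (file reader, simulation, ...)
    n_parts -- number of contiguous chunks, e.g. one per worker
    """
    frames = np.asarray(frames)
    if frames.ndim != 3:
        raise ValueError("expected shape (n_scan_positions, height, width)")
    # array_split allows uneven splits; each chunk is a view, not a copy.
    return np.array_split(frames, n_parts, axis=0)


frames = np.arange(10 * 4 * 4).reshape(10, 4, 4)
parts = split_detector_data(frames, 3)
print([p.shape[0] for p in parts])  # [4, 3, 3]
```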
SimeonEhrig commented 3 years ago

As we discussed in the VC, try to implement the __cuda_array_interface__ and use cupy.asarray:

https://docs.cupy.dev/en/stable/reference/generated/cupy.asarray.html https://github.com/cupy/cupy/issues/3202#issuecomment-601934020
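A minimal sketch of what such an implementation could look like, following version 3 of the CUDA Array Interface (mandatory keys `shape`, `typestr`, `data`, `version`; `strides=None` means C-contiguous). `CudaArrayRef` is a hypothetical name, and it assumes the C++ side exposes the raw device pointer, shape and dtype to Python. Building the dictionary needs no GPU; only consumers such as `cupy.asarray(ref)` do.

```python
import numpy as np


class CudaArrayRef:
    """Hypothetical Cupy_Ref-like object exposing device memory owned
    elsewhere (e.g. by a C++ class) via the CUDA Array Interface."""

    def __init__(self, ptr, shape, dtype, owner=None):
        self.ptr = int(ptr)            # raw device address
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        self.owner = owner             # object that keeps the buffer alive

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self.shape,
            "typestr": self.dtype.str,  # e.g. "<f4" for float32
            "data": (self.ptr, False),  # (pointer, read_only flag)
            "strides": None,            # None means C-contiguous
            "version": 3,
        }
```

With a real device pointer, `cupy.asarray(ref)` should return a `cupy.ndarray` that views the same memory instead of copying it.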

SimeonEhrig commented 3 years ago

Maybe this code snippet can help you detect unwanted memory allocations:

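import io
import cupy
from cupy.cuda import memory_hooks

def main():
    dump = io.StringIO()

    # DebugPrintHook logs the memory events (allocations and frees) of
    # CuPy's memory pool while the `with` block is active.
    with memory_hooks.DebugPrintHook(file=dump):
        # memory to trace
        x = cupy.array([1,2,3])
        del x
        y = cupy.array([1,2,3,4,5])

    # print the recorded events, one per line
    for line in dump.getvalue().splitlines():
        print(line)

    dump.close()

if __name__ == "__main__":
    main()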

https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.memory_hooks.DebugPrintHook.html

SimeonEhrig commented 3 years ago

@uellue I'm pretty surprised. My student implemented the cupy array interface for Cupy_Ref: https://github.com/ComputationalRadiationPhysics/student_project_python_bindings/blob/main/cuda_phase-retrieval_class/include/cupy_ref.py It was not too complicated.

Avoiding unnecessary memory copies also works: this example copies the data to the device and back only once. GPU_memory_holder is a C++ class that holds the GPU data, and incOne is also a Python binding. No explicit copy is necessary. https://github.com/ComputationalRadiationPhysics/student_project_python_bindings/blob/47d36b6469ffcfd7b28c10bc6adf26d953d3b7b6/cuda_phase-retrieval_class/test/test_8_cupy_reference_cuda_interface/test_interface.py#L50-L69
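A small check that can complement such a test, sketched under the assumption that both objects expose `__cuda_array_interface__` (as `cupy.ndarray` does, and as a Cupy_Ref implementing the interface would). The helper names are hypothetical. Equal device pointers show that the returned array is a zero-copy view of the same buffer; they do not rule out copies made somewhere in between.

```python
def device_pointer(obj):
    """Device address of any object exposing the CUDA Array Interface."""
    return obj.__cuda_array_interface__["data"][0]


def is_same_buffer(a, b):
    """True if a and b start at the same device address, i.e. one is a
    zero-copy view of the other."""
    return device_pointer(a) == device_pointer(b)
```

In a GPU test this could read `assert is_same_buffer(cupy.asarray(ref), ref)`.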