Open jakubMitura14 opened 2 years ago

Hello, I have a CuPy CUDA array and I want to pass it into Julia as is. A CUDA array is essentially just a pointer to device memory, so it should be possible. From the CUDA.jl side I know it is possible, as I have a comment: "for passing data the other way around you can use unsafe_wrap(CuArray, ...) to create a CUDA.jl array from a device pointer you get from Python". Still, I cannot make it work. Does anybody have a working example?

What I was trying
I'm no GPU expert, but you should be able to use the CUDA array interface (https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html) to get a pointer to the data.
PythonCall does something similar (https://github.com/cjdoris/PythonCall.jl/blob/main/src/pywrap/PyArray.jl) to wrap Python objects that implement the NumPy array interface.
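A minimal sketch of that approach, assuming a contiguous Float32 tensor: read the integer device pointer out of `__cuda_array_interface__` on the Python side, then wrap it with `unsafe_wrap` on the Julia side. The `wrap_devptr` helper is made up for illustration and is not part of CUDA.jl or PythonCall; treat the whole thing as untested.

```python
import torch
from juliacall import Main as jl

jl.seval("using CUDA")

t = torch.ones((4, 4), dtype=torch.float32, device="cuda")
iface = t.__cuda_array_interface__  # dict with 'data', 'shape', 'typestr', ...
ptr, _read_only = iface["data"]     # integer device pointer

# wrap_devptr is a hypothetical helper, defined here only for illustration
jl.seval("""
function wrap_devptr(ptr::Integer, dims...)
    p = CuPtr{Float32}(UInt(ptr))
    # own = false: Julia must not free memory that PyTorch owns
    unsafe_wrap(CuArray, p, dims; own = false)
end
""")

# dims are reversed because PyTorch is row-major and CUDA.jl is column-major
ja = jl.wrap_devptr(ptr, *reversed(iface["shape"]))
```

Note that the tensor `t` has to stay alive on the Python side for as long as the wrapped array is in use, since Julia does not own the memory.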
Thanks, I made progress with your hints: I changed it into a Numba CUDA array, but it is still very slow even though the function does nothing. Just passing the argument takes 3633 milliseconds. (I am working on CUDA-accelerated segmentation metrics, and on data of the same size the whole calculation takes around 50 ms.)
```python
import numba
import numpy as np
import torch
from statistics import median
import timeit
from juliacall import Main as jl
from numba import cuda

jl.seval("using Pkg")
jl.seval("""Pkg.add("CUDA")""")
jl.seval("""Pkg.add("PythonCall")""")
jl.seval("""using CUDA""")
jl.seval("""using PythonCall""")
jl.seval("""CUDA.allowscalar(true)""")
jl.seval("""print(sum(CUDA.ones(3,3,3)))""")  # works

# no-op Julia function, used to measure pure call overhead
jl.seval("""function bb(arrGold)
end""")


def print_hi(name):
    t1 = torch.tensor(np.ones((512, 512, 800))).to(torch.device("cuda"))
    # zero-copy view of the torch tensor as a Numba device array
    numbaArray = cuda.as_cuda_array(t1)
    jl.bb(numbaArray)

    def forBenchPymia():
        numba.cuda.synchronize()
        jl.bb(numbaArray)
        numba.cuda.synchronize()

    num_runs = 1
    num_repetitions = 1  # 2
    ex_time = timeit.Timer(forBenchPymia).repeat(
        repeat=num_repetitions,
        number=num_runs)
    res = median(ex_time) * 1000
    print("bench")
    print(res)


# t = torch.cuda.ByteTensor([2, 22, 222])
# c = cupy.asarray(t)
# c_bits = cupy.unpackbits(c)
# t_bits = torch.as_tensor(c_bits, device="cuda")
# print(t_bits.view(-1, 8))

# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    print_hi('PyCharm')
```
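One way to narrow down where those 3633 ms go is to time the no-op call with arguments of different types: if a plain scalar is fast but the device array is slow, the cost is in juliacall's conversion of the argument rather than in anything GPU-related. An untested sketch along those lines:

```python
import timeit
import numpy as np
import torch
from numba import cuda
from juliacall import Main as jl

jl.seval("function bb(x) end")  # no-op, measures pure call overhead

t1 = torch.tensor(np.ones((512, 512, 800))).to(torch.device("cuda"))
candidates = {
    "plain int": 1,
    "torch tensor": t1,
    "numba device array": cuda.as_cuda_array(t1),
}
for name, obj in candidates.items():
    dt = timeit.timeit(lambda: jl.bb(obj), number=10) / 10
    print(f"{name}: {dt * 1000:.1f} ms per call")
```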
https://github.com/pabloferz/DLPack.jl might be of interest!
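For anyone landing here later, a minimal sketch of what DLPack.jl from juliacall might look like, based on the `DLPack.wrap(obj, to_dlpack)` entry point shown in the package README; the exact API may differ between versions, so verify against the README. The `sum_shared` helper is defined here purely for illustration:

```python
import torch
from torch.utils.dlpack import to_dlpack
from juliacall import Main as jl

jl.seval("""import Pkg; Pkg.add("DLPack")""")
jl.seval("using DLPack, PythonCall")

# sum_shared is a hypothetical helper, not part of DLPack.jl
jl.seval("""
function sum_shared(t, to_dlpack)
    x = DLPack.wrap(t, to_dlpack)  # zero-copy CuArray view of the tensor
    return sum(x)
end
""")

t = torch.ones((512, 512, 800), device="cuda")
print(jl.sum_shared(t, to_dlpack))
```

Since DLPack shares the underlying buffer, the wrap should be zero-copy, and mutations on the Julia side would be visible to the PyTorch tensor.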
Thanks!!
This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days.
This issue has been closed because it has been stale for 7 days. If it is still relevant, please re-open it.