cocoa-xu / evision

Evision: An OpenCV-Erlang/Elixir binding
https://evision.app
Apache License 2.0

adapt to newer CUDA shared pointer format #257

Open cocoa-xu opened 4 days ago

cocoa-xu commented 4 days ago

In the pv-feat/exla-host-ipc branch of elixir-nx/nx there is a newer CUDA shared pointer format.

cocoa-xu commented 3 days ago

It works when mode is either :local or :cuda_ipc. For :host_ipc, I can probably get it working today or tomorrow.

iex> cuda_img = Evision.CUDA.GpuMat.gpuMat(Evision.imread("test/testdata/dog.jpg"))
%Evision.CUDA.GpuMat{
  channels: 3,
  type: {:u, 8},
  raw_type: 16,
  shape: {576, 768, 3},
  elemSize: 3,
  ref: #Reference<0.3685598248.2262171712.210856>
}
iex> Evision.CUDA.GpuMat.to_pointer(cuda_img, mode: :local)
{:ok,
 %Evision.IPCHandle.Local{
   handle: 125057068171264,
   step: 2560,
   rows: 576,
   cols: 768,
   channels: 3,
   type: {:u, 8}
 }}
iex> Evision.CUDA.GpuMat.to_pointer(cuda_img, mode: :cuda_ipc)
{:ok,
 %Evision.IPCHandle.CUDA{
   handle: <<160, 47, 3, 20, 190, 113, 0, 0, 160, 211, 0, 0, 0, 0, 0, 0, 0, 128,
     22, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 45, 0,
     0, 0, 0, 0, 0, ...>>,
   step: 2560,
   rows: 576,
   cols: 768,
   channels: 3,
   type: {:u, 8},
   device_id: 0
 }}
cocoa-xu commented 3 days ago

So it basically works.

Test case 1

Convert to a pointer and load it back as is.

cuda_img = Evision.CUDA.GpuMat.gpuMat(Evision.imread("test/testdata/dog.jpg"))
{:ok, cuda_ptr} = Evision.CUDA.GpuMat.to_pointer(cuda_img, mode: :local)
ptr_img = Evision.CUDA.GpuMat.from_pointer(cuda_ptr)
Evision.Mat.quicklook(Evision.CUDA.GpuMat.download(ptr_img))

[Screenshot: quicklook output of ptr_img, 2024-06-29 at 21:07:12]

Test case 2

It's also possible to work on a submatrix of the original image.

cuda_img = Evision.CUDA.GpuMat.gpuMat(Evision.imread("test/testdata/dog.jpg"))
{:ok, cuda_ptr} = Evision.CUDA.GpuMat.to_pointer(cuda_img, mode: :local)
partial_ptr_img = Evision.CUDA.GpuMat.from_pointer(cuda_ptr, shape: {300, 300, 3})
Evision.Mat.quicklook(Evision.CUDA.GpuMat.download(partial_ptr_img))

[Screenshot: quicklook output of partial_ptr_img, 2024-06-29 at 21:07:52]

Test case 3

Convert to a pointer and load it in EXLA.

cuda_img = Evision.CUDA.GpuMat.gpuMat(Evision.imread("test/testdata/dog.jpg"))
{:ok, cuda_ptr} = Evision.CUDA.GpuMat.to_pointer(cuda_img, mode: :local)
{:ok, tensor} = Nx.from_pointer({EXLA.Backend, client: :cuda}, cuda_ptr.handle, cuda_img.type, cuda_img.shape)
Evision.Mat.quicklook(Evision.Mat.from_nx_2d(tensor))

[Screenshot: quicklook output of the EXLA-backed tensor, 2024-06-29 at 21:10:31]

Sadly this doesn't work if the original image has padding between consecutive rows. This padding is added automatically by OpenCV for memory alignment (which improves performance). I'll see if there's an option to load a Mat into a GpuMat without any padding, and whether there's a penalty for doing so.
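
For reference, the padding is visible in the handle shown earlier: a contiguous {:u, 8} row of 768 columns with 3 channels needs 768 * 3 * 1 = 2304 bytes, but step is 2560, so OpenCV added 256 bytes of alignment padding per row. A minimal sketch of a contiguity check over the %Evision.IPCHandle.Local{} fields above (the module and function names are made up for illustration):

# A GPU buffer is row-contiguous when the stride in bytes (step) equals
# cols * channels * bytes-per-element; for dog.jpg: 2304 != 2560, so it is padded.
defmodule ContiguityCheck do
  def contiguous?(%Evision.IPCHandle.Local{step: step, cols: cols, channels: channels, type: {_, bits}}) do
    step == cols * channels * div(bits, 8)
  end
end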

Also, the current implementation in EXLA doesn't accept a step parameter, which would indicate the number of bytes between two consecutive rows. I'm not sure whether XLA supports that; if it does, we can achieve zero-copy between evision and EXLA much more easily. If not, we'll need to figure out a way (if possible) to do it with minimal overhead.
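
Until then, a non-zero-copy fallback is to round-trip through host memory; a hedged sketch, assuming Evision.Mat.to_nx/1 and Nx.backend_transfer/2 behave as in current evision/Nx releases:

# Not zero-copy: download/1 copies the pitched GPU buffer into a contiguous host Mat
# (dropping the padding), then the tensor is re-uploaded to the GPU via EXLA.
mat = Evision.CUDA.GpuMat.download(cuda_img)
tensor = Nx.backend_transfer(Evision.Mat.to_nx(mat), {EXLA.Backend, client: :cuda})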

/cc @polvalente @davydog187

polvalente commented 2 days ago

I took a cursory look into PjRtBuffer and unfortunately didn't find anything related to memory layout offsets. Maybe there's a way to set a tiled layout that would work for this.

polvalente commented 2 days ago

One idea, which is not zero-copy but is at least GPU-to-GPU: you could forcibly construct a copy of the tensor without the padding via CUDA calls, and then use the newly allocated tensor as the output. Ideally you'd just disable the OpenCV padding, but this is another way to deal with it.
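
A rough Elixir-level sketch of that idea, using a hypothetical NIF (here called Evision.CUDA.GpuMat.copy_continuous/1, which does not exist yet) that would allocate an unpitched device buffer and copy the rows into it on the GPU:

# Hypothetical copy_continuous/1: GPU-to-GPU copy into an unpadded allocation.
continuous_img = Evision.CUDA.GpuMat.copy_continuous(cuda_img)
{:ok, ptr} = Evision.CUDA.GpuMat.to_pointer(continuous_img, mode: :local)
# With step == cols * channels * elem_size, EXLA could consume the pointer as-is.
{:ok, tensor} = Nx.from_pointer({EXLA.Backend, client: :cuda}, ptr.handle, continuous_img.type, continuous_img.shape)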