clEsperanto / clesperantoj_prototype

A Java Wrapper around CLIc / clesperanto

Referencing GPU byte arrays #36

Open carlosuc3m opened 3 months ago

carlosuc3m commented 3 months ago

Hello again @StRigaud, and sorry if this is already too many issues 😂

I am opening a separate issue for what I mentioned here: https://github.com/clEsperanto/clesperantoj_prototype/issues/8#issuecomment-2182523288

I think it would be great to have a ClesperantoJ method that references the underlying byte array of an ArrayJ. This would reduce the number of copies needed when creating an ArrayJ from an ImgLib2 image.

In addition, we could directly modify pixels on the GPU (is this desirable?).

I also think it could cause some problems. For example, if an ArrayJ is referenced by an ImgLib2 image, its data would never leave the GPU unless the user makes a copy of it on the CPU and dereferences the original GPU-backed array.
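The ownership concern can be illustrated with a hypothetical DeviceArray class (not part of clesperantoj) standing in for a GPU-backed array. No GPU is involved: device memory is only simulated with a plain float[]. The pattern is what matters here. The data is copied back to an independent CPU array, and the device-side array is then released explicitly (through try-with-resources) so nothing keeps it alive on the GPU.

```java
import java.util.Arrays;

public class DeviceArraySketch {

    /** Hypothetical stand-in for a GPU-backed array; device memory is simulated. */
    static final class DeviceArray implements AutoCloseable {
        private float[] device;          // simulated device buffer
        private boolean released = false;

        DeviceArray(float[] hostData) {
            this.device = hostData.clone(); // simulated host-to-device copy
        }

        /** Simulated device-to-host copy: the caller gets an independent CPU array. */
        float[] readToHost() {
            if (released) {
                throw new IllegalStateException("array already released");
            }
            return device.clone();
        }

        boolean isReleased() {
            return released;
        }

        @Override
        public void close() {
            device = null;   // drop the simulated device memory
            released = true;
        }
    }

    /** Copies the data back to the CPU, then releases the device-side array. */
    static float[] copyBackAndRelease(DeviceArray gpu) {
        try (DeviceArray owned = gpu) {
            return owned.readToHost(); // close() runs after the copy is taken
        }
    }

    public static void main(String[] args) {
        DeviceArray gpu = new DeviceArray(new float[] {1f, 2f, 3f});
        float[] cpu = copyBackAndRelease(gpu);
        System.out.println(Arrays.toString(cpu));
        System.out.println("released: " + gpu.isReleased());
    }
}
```

If an ImgLib2 image held a reference to such an array instead, it would never reach close(). That is the lifetime coupling described above.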

So even though I think we should consider more carefully whether to offer GPU-backed ImgLib2 arrays at all, in my opinion it would still be useful for the underlying GPU byte array to be accessible from Java.

StRigaud commented 3 months ago

So, the memory layout is as follows.

We do not have direct access to the GPU memory. To create a memory array on the GPU, we have to go through low-level OpenCL operations: the buffer is first allocated on the device (e.g. with clCreateBuffer), which gives us a cl_mem handle identifying it, and it is then filled with a call like clEnqueueWriteBufferRect. That function takes a CPU memory pointer along with the size of the region to copy (which depends on the element size) and its row and slice pitches, and copies the CPU memory into the GPU buffer.
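To make the pitch parameters concrete, here is a small Java sketch that only does the arithmetic. For a tightly packed 3D float image, it computes the origin, region, row pitch and slice pitch that a rectangular host-to-device copy such as clEnqueueWriteBufferRect would receive. The class RectCopyParams is hypothetical and not part of clesperantoj or CLIc. It assumes no padding between rows or slices and makes no OpenCL call.

```java
import java.util.Arrays;

public class RectCopyParams {
    static final int FLOAT_BYTES = Float.BYTES; // element size of a float32 pixel

    final long[] origin;   // start offset: x in bytes, y in rows, z in slices
    final long[] region;   // {width in bytes, height in rows, depth in slices}
    final long rowPitch;   // bytes between the starts of consecutive rows
    final long slicePitch; // bytes between the starts of consecutive slices

    RectCopyParams(long width, long height, long depth, int elementSize) {
        this.origin = new long[] {0, 0, 0};
        this.region = new long[] {width * elementSize, height, depth};
        this.rowPitch = width * elementSize;  // no padding at the end of a row
        this.slicePitch = rowPitch * height;  // no padding at the end of a slice
    }

    /** Total number of bytes covered by the copy. */
    long totalBytes() {
        return slicePitch * region[2];
    }

    public static void main(String[] args) {
        RectCopyParams p = new RectCopyParams(512, 256, 10, FLOAT_BYTES);
        System.out.println("region      = " + Arrays.toString(p.region)); // [2048, 256, 10]
        System.out.println("row pitch   = " + p.rowPitch);                // 2048
        System.out.println("slice pitch = " + p.slicePitch);              // 524288
        System.out.println("total bytes = " + p.totalBytes());            // 5242880
    }
}
```

Without padding, the row pitch is just the width in bytes, and the slice pitch is the row pitch times the height. OpenCL also accepts 0 for these pitches and derives them the same way from the region.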

This cl_mem is what is stored inside the C++ Array and, by extension, inside the ArrayJ.

When you call ArrayJ.write(), you actually end up calling clEnqueueWriteBufferRect through a chain of sub-functions in the underlying C++ code.

This CPU-to-GPU copy is mandatory (kind of, but that's another complex discussion), and it needs a CPU memory pointer (float*) as input.
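On the Java side, here is one common way to obtain such a pointer, sketched under the assumption of JNI-style bindings. A Java float[] lives on the garbage-collected heap. To hand native code a stable pointer, its contents are often copied into a direct buffer, whose memory typically sits outside the heap and whose address native code can obtain (for example with JNI's GetDirectBufferAddress). The helper toNative is hypothetical and is not the clesperantoj API. It skips the copy when the pixels are already in a direct, native-order buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class NativePixels {

    /** Returns a direct, native-order view of the pixels, copying only if needed. */
    static FloatBuffer toNative(FloatBuffer pixels) {
        if (pixels.isDirect() && pixels.order() == ByteOrder.nativeOrder()) {
            return pixels; // zero-copy: native code can use this memory as-is
        }
        FloatBuffer copy = ByteBuffer
                .allocateDirect(pixels.remaining() * Float.BYTES)
                .order(ByteOrder.nativeOrder()) // float layout native code expects
                .asFloatBuffer();               // a view of a direct buffer is direct
        copy.put(pixels.duplicate()); // duplicate() leaves the caller's position untouched
        copy.flip();                  // rewind so the copy is ready to be read
        return copy;
    }

    public static void main(String[] args) {
        FloatBuffer heapPixels = FloatBuffer.wrap(new float[] {1f, 2f, 3f, 4f});
        FloatBuffer nativePixels = toNative(heapPixels); // one copy into a direct buffer
        System.out.println("copy is direct: " + nativePixels.isDirect());
        FloatBuffer again = toNative(nativePixels);      // already direct: no copy
        System.out.println("second call reused buffer: " + (again == nativePixels));
    }
}
```

The second call returns the same buffer. That is the zero-copy case this thread is aiming for.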

The best implementation we can have is to extract the memory pointer from an ImgLib2 image and pass it to the write method without any copy or extra work (and the same in reverse for read). Any extra processing in between (copying, memory rearrangement, conversion, etc.) adds processing time.
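As an illustration of that extra processing, the following hypothetical Java sketch assumes an image stored as one short[] per plane (the layout used by ImgLib2's PlanarImg) with unsigned 16-bit pixels. Before a single float* could be passed to the write call, the planes must be gathered into one contiguous array and converted to float. That touches every pixel one extra time compared with a zero-copy path.

```java
import java.util.Arrays;

public class FlattenPlanes {

    /** Gathers per-plane unsigned 16-bit data into one contiguous float array. */
    static float[] flattenToFloat(short[][] planes, int pixelsPerPlane) {
        float[] out = new float[planes.length * pixelsPerPlane];
        for (int z = 0; z < planes.length; z++) {
            short[] plane = planes[z];
            int offset = z * pixelsPerPlane; // planes are laid out one after another
            for (int i = 0; i < pixelsPerPlane; i++) {
                // Java shorts are signed; mask to read them as unsigned 16-bit values
                out[offset + i] = plane[i] & 0xFFFF;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        short[][] planes = {
            {1, 2, 3, 4},
            {(short) 65535, 0, 7, 8}
        };
        float[] flat = flattenToFloat(planes, 4);
        System.out.println(Arrays.toString(flat));
        // [1.0, 2.0, 3.0, 4.0, 65535.0, 0.0, 7.0, 8.0]
    }
}
```

An image already stored as a single contiguous float[] (for example an ImgLib2 ArrayImg of FloatType) would need neither the rearrangement nor the conversion.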

Now, I am all in favor of the KISS approach. Let's first focus on something that works, then build on it and improve from there.