cogciprocate / ocl

OpenCL for Rust

Documentation - unclear what use_host_slice does/how to use it #205

Closed: bddap closed this issue 11 months ago

bddap commented 2 years ago

The docs include a call to action asking for documentation feedback, which is much appreciated. Here's my report:

Regarding the docs on `use_host_slice`: as someone who is new to OpenCL, I'm having trouble understanding what it does. I think it allows the caller to own a buffer that is accessible to the GPU, but when I try to use it that way I get unexpected results:

```rust
pub struct OnTheGpu<const SRC: usize, const DST: usize, const BATCH: usize> {
    /// OpenCL holds references to this object's buffers; the references must
    /// remain valid for the lifetime of this object, so this object can't be moved.
    io: Pin<Box<([[f32; SRC]; BATCH], [[f32; DST]; BATCH])>>,
    kernel: ocl::Kernel,
}

impl<const SRC: usize, const DST: usize, const BATCH: usize> OnTheGpu<SRC, DST, BATCH> {
    fn new<F>(f: F) -> Self
    where
        F: FnOnce([Tracked; SRC]) -> [Tracked; DST],
    {
        assert_ne!(BATCH, 0, "batch size must not be zero");
        assert_ne!(SRC, 0, "input size must not be zero");
        assert_ne!(DST, 0, "output size must not be zero");

        let prog = Program::new();
        let src: [Tracked; SRC] = array_init::array_init(|_| prog.an_input());
        let dest = f(src);
        let kernel_src = prog.tokernel(&dest);

        let pro_que = ProQue::builder()
            .src(kernel_src)
            .dims(BATCH)
            .build()
            .unwrap();

        let io = Pin::new(Box::new(([[0.0; SRC]; BATCH], [[0.0; DST]; BATCH])));

        let src_builder = pro_que.buffer_builder::<f32>().len((BATCH, SRC));
        let src_builder = unsafe { src_builder.use_host_slice(&io.0.flat()) };
        let src_buffer = src_builder.build().unwrap();

        let dst_builder = pro_que.buffer_builder::<f32>().len((BATCH, DST));
        let dst_builder = unsafe { dst_builder.use_host_slice(&io.1.flat()) };
        let dst_buffer = dst_builder.build().unwrap();

        let kernel = pro_que
            .kernel_builder("proc")
            .arg(&src_buffer)
            .arg(&dst_buffer)
            .build()
            .unwrap();

        Self { io, kernel }
    }

    /// Get write access to the input buffer.
    pub fn input_mut(&mut self) -> &mut [[f32; SRC]; BATCH] {
        &mut self.io.0
    }

    /// Get read access to the output buffer.
    pub fn output(&self) -> &[[f32; DST]; BATCH] {
        &self.io.1
    }

    // This function mutates the output buffer, so it requires a unique reference to self.
    pub fn run(&mut self) {
        // How do I make sure self.src_buffer is synced here?
        unsafe {
            self.kernel.enq().unwrap();
        }
        // How do I make sure self.dst_buffer is synced here?
    }
}
```

When calling `run()`, I expect the kernel to use the current value of my source slice and the current value of my destination slice, but that's not what happens.

Perhaps I am using `use_host_slice` totally wrong, but it's hard to tell. An example in the docs would be very helpful.

dmarcuse commented 2 years ago

`use_host_slice` is a Rust wrapper that sets the `CL_MEM_USE_HOST_PTR` flag under the hood; that flag is documented in the reference manual for the OpenCL C API (under `clCreateBuffer`). The TL;DR is that it asks the OpenCL runtime to use the given memory as the underlying storage of the buffer. What exactly this entails varies with the OpenCL runtime and system. If you're using, say, an iGPU that shares memory with the CPU, it may be able to truly use the same memory range under the hood. If you're using a system with a discrete GPU with its own VRAM, it may instead implicitly copy data back and forth when the kernel runs.
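To make the "copy back and forth" case concrete, here is a toy model in plain Rust, assuming a discrete GPU whose runtime snapshots the host slice into device memory when the buffer is created. Everything in it (`CachedDeviceBuffer`, `write_from`, `read_into`, `run_kernel`) is hypothetical and only mimics the behavior described above; it is not the ocl API. It sketches why edits to the host slice after creation may not reach the kernel, and why the kernel's output may not show up in the host slice, until there is an explicit sync on each side of the kernel.

```rust
/// Toy model (hypothetical, NOT the ocl API) of a buffer created with
/// CL_MEM_USE_HOST_PTR on a device that keeps its own cached copy.
struct CachedDeviceBuffer {
    /// Stand-in for the runtime's device-side copy of the host slice.
    device: Vec<f32>,
}

impl CachedDeviceBuffer {
    /// Buffer creation: in this model the runtime snapshots the host memory here.
    fn new(host: &[f32]) -> Self {
        CachedDeviceBuffer { device: host.to_vec() }
    }

    /// Explicit host -> device sync (plays the role of a buffer write).
    fn write_from(&mut self, host: &[f32]) {
        self.device.copy_from_slice(host);
    }

    /// Explicit device -> host sync (plays the role of a buffer read or map).
    fn read_into(&self, host: &mut [f32]) {
        host.copy_from_slice(&self.device);
    }
}

/// Hypothetical kernel: dst[i] = 2 * src[i], touching only the device copies.
fn run_kernel(src: &CachedDeviceBuffer, dst: &mut CachedDeviceBuffer) {
    for (d, s) in dst.device.iter_mut().zip(&src.device) {
        *d = 2.0 * s;
    }
}

fn main() {
    let mut host_src = vec![1.0_f32, 2.0, 3.0];
    let mut host_dst = vec![0.0_f32; 3];

    let mut src = CachedDeviceBuffer::new(&host_src);
    let mut dst = CachedDeviceBuffer::new(&host_dst);

    // A host edit after creation is not seen by the device copy...
    host_src[0] = 10.0;
    run_kernel(&src, &mut dst);
    // ...and the kernel's output never lands in host_dst.
    println!("without sync: {:?}", host_dst);

    // With explicit sync points around the kernel, the host sees the result
    // computed from the current input: [20.0, 4.0, 6.0].
    src.write_from(&host_src);
    run_kernel(&src, &mut dst);
    dst.read_into(&mut host_dst);
    println!("with sync:    {:?}", host_dst);
}
```

In ocl itself, the two "how do I sync here?" spots in `run()` would presumably become an explicit `buffer.write(&host_data).enq()` before enqueuing the kernel and a `buffer.read(&mut host_data).enq()` after it (mapping the buffer is the other standard way to get a consistent host view). With explicit reads and writes like these, a plain buffer without `use_host_slice` also works. Check the ocl and OpenCL docs for the exact guarantees your runtime provides.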

bddap commented 2 years ago

That explanation definitely clears things up.