cogciprocate / ocl

OpenCL for Rust
Other
721 stars 75 forks source link

Struct with #[repr(C)] are not necessarily compatible with OpenCL layouts #210

Closed MoritzKn closed 11 months ago

MoritzKn commented 2 years ago

The docs say that

[OclPrm] Can also be implemented for custom types as long as layout and alignment are conserved between Rust and OpenCL (repr “C”).

I spend quite some time chasing a bug caused by assuming this is always true. I'm unsure if this is due to some edge case in my structs or my driver (M1 MacBook) or if this is always the case.

I have two (assumed to be) identical structs declared in OpenCL and in Rust. The one in OpenCL C has a sizeof 48 and the one in Rust has size_of 36:

struct Light {
  float3 pos;
  float3 color;
  float intensity;
};

// sizeof(struct Light) is 48
use ocl::{prm::Float3, OclPrm};

#[repr(C)]
#[derive(Debug, PartialEq, Default, Clone, Copy)]
pub struct Light {
    pub pos: Float3,
    pub color: Float3,
    pub intensity: f32,
}

unsafe impl OclPrm for Light {}

// std::mem::size_of::<Light>() is 36

This causes all kinds of issues.

Including this error when using Buffer::builder().len(1).fill_val(Light::default()):

Error executing function: clEnqueueFillBuffer

Status error code: CL_INVALID_VALUE (-30)

Proposed Solution

I assume there isn't much this crate can do about this but maybe it's possible to add a note about this to the docs.

My Platform

Platform { Profile: Ok(FULL_PROFILE), Version: Ok(OpenCL 1.2 (Apr 19 2022 18:44:44)), Name: Ok(Apple), Vendor: Ok(Apple), Extensions: Ok(cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event) } { Total Device Count: 1 }
c0gent commented 1 year ago

I'm unable to verify this. Probably something platform specific. Please feel free to submit a pull request with some warning documentation or notes :)

MoritzKn commented 1 year ago

Do you have a hint about how you would describe this issue? I'd like to know more about it but have issues finding anything in the spec or the internet. My gut feeling is that it's caused by Apple compiling the programs to metal and metal having using a different layouting algorithm.

I'd describe it as "struct layout/ABI being inconsistent between OpenCL and C when using the Apple driver". Is that the right way to put it? Do you know if the OpenCL spec specifies how structs should be layouted?

c0gent commented 1 year ago

After taking some time to reread your issue I must apologize, I misspoke before. This is not a platform specific issue. What you've experienced is entirely expected behavior.

OclPrm should not be implemented for arbitrary types. It is intended to be used for types for which you can be absolutely certain will maintain the same structure between host and device, even among different platforms. This is virtually never the case for structs as there is always some variety in size and layout between different languages and platforms.

I'm trying and failing to recall some specific solutions for you at the moment so you may need to do a bit more research yourself but the place to start is by instead of using arrays of structs you'll want to use multiple arrays or parameters. I'm making assumptions about how you're using Light but for example instead of passing a Vec<Light>, you'd have separate vectors for each of the components, pos, color, and intensity and pass them as separate kernel arguments.

Please let me know if that helps and again I'm sorry for leading you in the wrong direction.

MoritzKn commented 1 year ago

Ah okay, thanks for the clarification! The workaround I found before was to force the layout to be the same on both sides by introducing "padding" fields. But I assume that solution isn't necessarily portable. I guess, then I'll just have to pass all the fields in separate arrays. Ugh.

MoritzKn commented 1 year ago

I just had to think about this again... aren't Float3, Float4, etc also just structs? How can you make sure that they are ABI compatible? Is it just because structs with equal size fields are simple enough?

c0gent commented 1 year ago

It's basically just an issue of consistency between host and device. Float3 and other vectors are stored the same way (contiguously, no gaps) no matter what device. Structs have arbitrary layouts with different amounts of padding, etc. depending on the platform. For example, Rust uses a different layout than C even on the same device.

Another aspect of this is due to how many OpenCL devices organize their memory layouts it's usually better to have each kernel parameter in a separate contiguous vector because devices can more easily apply optimization tricks. Graphics cards and other parallel devices do pretty crazy things with how they store and read from memory.