cogciprocate / ocl

OpenCL for Rust

Double free bug causes SIGABRT or SIGSEGV in multi-threaded situations #167

Closed: aabizri closed this issue 4 years ago.

aabizri commented 4 years ago

EDIT

When testing for the error, I didn't properly verify that it wasn't coming from the OpenCL implementation (I hadn't actually switched to intel-ocl-sdk when I thought I had). After retrying, the error does not occur on intel-ocl-sdk, so it appears to be a Beignet bug; I will report it there. Closing the issue.

Summary

On ocl 0.19, building a Context (or a ProQue) concurrently from two threads triggers either a SIGABRT (double free) or a SIGSEGV. With a single thread, the bug does not occur.

Since OpenCL 1.1, all API functions are thread-safe except clSetKernelArg(), so calling them concurrently is not undefined behavior according to the spec. ~When tested against both beignet and intel-ocl-sdk, I got the same errors, indicating it doesn't come from the particular implementation. It is thus highly probable the error comes from ocl.~

Tested both on

Error & debugging

On SIGABRT, the following messages are printed, in decreasing order of frequency:

On SIGSEGV, no debug messages are printed. Occasionally (roughly one run in 20, by my estimate), the sample program completes without error.

Since the error is memory-related, running with MALLOC_CHECK_=1 (or 2) narrows the failures down to either a SIGSEGV or a SIGABRT with the message free(): invalid pointer.

When debugging with GDB, the error always occurred inside ocl-core, in either retain_context or retain_mem_object.
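Crashes inside retain_context / retain_mem_object point at the reference counting of OpenCL objects. As far as I can tell, ocl-core's handle types call the matching clRetain* function when cloned and clRelease* when dropped, which leaves the counter itself to the implementation; the spec's thread-safety guarantee means that counter must tolerate concurrent updates. The std-only sketch below models that contract. DriverObject, Handle, retain, release and stress are hypothetical names for illustration, not ocl's or Beignet's actual code. If the decrement were a non-atomic read-modify-write instead, two threads could both observe the count reaching zero and free the same object, which is the kind of double free reported here.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for an implementation-side object (e.g. a cl_context).
/// `refs` models the implementation's reference count; `frees` counts how many
/// times the object was "freed" and must end up at exactly 1.
struct DriverObject {
    refs: AtomicUsize,
    frees: AtomicUsize,
}

impl DriverObject {
    /// A newly created object starts with one reference.
    fn create() -> Self {
        DriverObject { refs: AtomicUsize::new(1), frees: AtomicUsize::new(0) }
    }
}

/// Models a clRetain* call: the caller already holds a reference, so Relaxed suffices.
fn retain(obj: &DriverObject) {
    obj.refs.fetch_add(1, Ordering::Relaxed);
}

/// Models a clRelease* call: fetch_sub returns the previous value, so exactly
/// one caller observes 1 and performs the free, even under contention.
fn release(obj: &DriverObject) {
    if obj.refs.fetch_sub(1, Ordering::Release) == 1 {
        fence(Ordering::Acquire);
        obj.frees.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical handle: retain on clone, release on drop.
struct Handle<'a> {
    obj: &'a DriverObject,
}

impl<'a> Handle<'a> {
    /// Takes over the initial reference created by `DriverObject::create`.
    fn adopt(obj: &'a DriverObject) -> Self {
        Handle { obj }
    }
}

impl Clone for Handle<'_> {
    fn clone(&self) -> Self {
        retain(self.obj);
        Handle { obj: self.obj }
    }
}

impl Drop for Handle<'_> {
    fn drop(&mut self) {
        release(self.obj);
    }
}

/// Clones and drops handles from several threads at once, then returns
/// (remaining references, number of frees).
fn stress(threads: usize, clones_per_thread: usize) -> (usize, usize) {
    let obj = DriverObject::create();
    {
        let root = Handle::adopt(&obj);
        thread::scope(|s| {
            for _ in 0..threads {
                let h = root.clone();
                s.spawn(move || {
                    for _ in 0..clones_per_thread {
                        drop(h.clone());
                    }
                    // `h` is dropped when this thread's closure returns.
                });
            }
        });
        // `root` is dropped at the end of this block, releasing the last reference.
    }
    (obj.refs.load(Ordering::Relaxed), obj.frees.load(Ordering::Relaxed))
}
```

With a correct counter, `stress(8, 10_000)` should always report no remaining references and exactly one free.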

Reproduction

I have been able to reduce the reproduction to the following code:

extern crate ocl;

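pub fn new() {
    // Unwrap so that a failed build is reported instead of the Result being silently ignored.
    ocl::Context::builder().build().unwrap();
    // The same thing occurs with the following line as well (tested with working kernels):
    // ocl::ProQue::builder().src(KERNEL_SRC).build().unwrap();
}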

#[cfg(test)]
mod tests {
    use super::new;

    #[test]
    fn test1() {
        new();
    }

    #[test]
    fn test2() {
        new();
    }
}

Run with cargo test -- --test-threads=2 to trigger the error, and cargo test -- --test-threads=1 to see that it isn't triggered in single-threaded situations.
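Until the underlying implementation bug is fixed, one possible workaround (an assumption on my part, not something ocl provides) is to serialize OpenCL setup behind a process-wide lock. OCL_LOCK and with_ocl_lock are hypothetical helpers; the commented usage assumes the ocl crate as in the reproduction above.

```rust
use std::sync::Mutex;

/// Hypothetical process-wide lock guarding calls into an OpenCL
/// implementation suspected of not being thread-safe.
static OCL_LOCK: Mutex<()> = Mutex::new(());

/// Runs `f` while holding OCL_LOCK, so at most one thread at a time
/// executes the guarded code. The closure's return value is passed through.
pub fn with_ocl_lock<F, R>(f: F) -> R
where
    F: FnOnce() -> R,
{
    // A panic in a previous holder poisons the lock; the guarded data is `()`,
    // so it is safe to recover the guard and continue.
    let _guard = OCL_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    f()
}

// Intended use in the reproduction (requires the ocl crate, not compiled here):
// pub fn new() {
//     with_ocl_lock(|| {
//         // Build and drop the context while holding the lock.
//         ocl::Context::builder().build().unwrap();
//     });
// }
```

This only serializes what runs inside the closure. Clones and drops of handles that escape it still call into the implementation's retain/release functions, so this is not guaranteed to avoid the crash in every situation.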
