codeplaysoftware / oneapi-construction-kit

Other
54 stars 20 forks source link

OpenCL object lifetimes #471

Open manuelmohr opened 1 month ago

manuelmohr commented 1 month ago

Version

85dfbf7e8c70e01b37daea0dee20e89ccd7af5f7

What is your question or problem?

While porting an existing OpenCL application that previously ran on the standard Intel OpenCL platform targeting integrated GPUs to now run on an OpenCL platform built using the Construction Kit, I observed an exception.

After reducing the test case, the problem boiled down to this (full test case below):

    cl::Kernel kernel;
    {
        cl::Program program(context, sources);
        program.build({device});
        kernel = cl::Kernel(program, "foo");
    }

    cl::Program program = kernel.getInfo<CL_KERNEL_PROGRAM>();

So this program constructs a kernel object, lets the Program object go out of scope (which will call clReleaseProgram()), and then tries to recover the Program object using a getInfo() call on the kernel object.

This works with the Intel OpenCL platform but fails with the Construction Kit because the clGetKernelInfo call fails. With the Construction Kit, after the Program object goes out of scope, the backing cl_program object will have an internal reference count of 1 (because it's still referenced by the cl_kernel) and an external reference count of 0 (because at that point, no application-visible references to the cl_program exist). Trying to build a Program object again will try to call clRetainProgram(), which will fail because of a special handling for exactly this case.

Obviously, what the program is trying to do here seemed a little bit fishy so I tried to understand what the expected behavior is, which proved to be surprisingly difficult. I found KhronosGroup/OpenCL-Docs#620, which seems to discuss exactly this topic, and if I understood the situation correctly, both behaviors (Intel and Construction Kit) are covered by the current OpenCL spec. The Construction Kit seems to take an approach that isn't explicitly considered in that discussion: it's almost like the "variant (2)" mentioned in the discussion (so it tracks internal references using a separate internal reference counter, but waits for both the application-visible reference counter to hit zero and the internal reference counter to hit zero before going through the process of deleting the object), but then it explicitly disallows "resurrecting" objects from attached objects (i.e., going from internal=1 external=0 to internal=1 external=1 is forbidden).

(Out of interest, I also ran the referenced OpenCL object lifetime validation layer against this program, but it didn't point out a problem, but to be fair, I don't fully understand if it covers this situation. Probably it doesn't.)

I think the test program is incorrect according to the OpenCL specification because for the cl_program object, the number of release calls is equal to the number of retain calls, thus it must be regarded as inaccessible. The fact that this object is still (internally) referenced by the cl_kernel object and thus (technically) "recoverable" isn't important. Do you agree with this?

Was there a specific reason for picking this approach to object lifetimes in the Construction Kit? What are the downsides of allowing "resurrecting" objects?

Full test application:

#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 300
#include <CL/opencl.hpp>
#include <iostream>
#include <vector>

int main() try {
    std::vector<cl::Platform> all_platforms;
    cl::Platform::get(&all_platforms);

    if (all_platforms.empty()) {
        std::cerr << "No platforms found.\n";
        return 1;
    }
    cl::Platform platform = all_platforms[0];

    std::vector<cl::Device> all_devices;
    platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
    if (all_devices.empty()) {
        std::cerr << "No devices found.\n";
        return 1;
    }

    cl::Device device = all_devices[0];
    std::cout<< "Using device: " << device.getInfo<CL_DEVICE_NAME>() << "\n";

    cl::Context context({device});

    cl::Program::Sources sources;
    std::string kernel_code = "void kernel foo() {}";
    sources.push_back({kernel_code.c_str(), kernel_code.length()});

    cl::Kernel kernel;
    {
        cl::Program program(context, sources);
        if (program.build({device}) != CL_SUCCESS) {
            std::cerr << "Error building: " << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device) << std::endl;
            return 1;
        }

        kernel = cl::Kernel(program, "foo");
    }

    cl::Program program = kernel.getInfo<CL_KERNEL_PROGRAM>();

    return 0;
}
catch (const cl::Error& err) {
    std::cerr << "CL error: " << err.what() << std::endl;
    return 1;
}
hvdijk commented 1 month ago

Hi Manuel,

I think the test program is incorrect according to the OpenCL specification because for the cl_program object, the number of release calls is equal to the number of retain calls, thus it must be regarded as inaccessible. The fact that this object is still (internally) referenced by the cl_kernel object and thus (technically) "recoverable" isn't important. Do you agree with this?

Yes, I agree. The OpenCL spec referenced in the docs issue you referenced matches what you wrote:

The object becomes inaccessible to host code when the number of release operations performed matches the number of retain operations plus the allocation of the object.

It seems reasonable to me to interpret "inaccessible to host code" as "the OpenCL implementation may actively prevent further access by host code".

Was there a specific reason for picking this approach to object lifetimes in the Construction Kit? What are the downsides of allowing "resurrecting" objects?

I looked into the history of this. We used to have a single reference count, but changed it to avoid internal errors if the user calls clRelease*() more often than they are permitted. When we changed it to have a separate internal and external reference count, we did not permit a zero external reference count to become nonzero to avoid threading problems, where thread 1 sees the external reference count is zero, thread 2 increases the external reference count, thread 1 proceeds to check the internal reference count and deletes the object, thread 2 uses the object. There are other ways to prevent this from being an issue, but this is one of the simplest ones.