boostorg / compute

A C++ GPU Computing Library for OpenCL
http://boostorg.github.io/compute/
Boost Software License 1.0
1.56k stars 332 forks source link

Correct compute::context::get_devices method #847

Open JablonskiMateusz opened 4 years ago

JablonskiMateusz commented 4 years ago

get a vector of cl_device_id and then use the device ids to populate a vector of compute::device

Signed-off-by: Mateusz Jablonski mateusz.jablonski@intel.com

coveralls commented 4 years ago

Coverage Status

Coverage increased (+0.008%) to 84.027% when pulling 881ca4231de0b85b986b3e6a19949edcdf6340ec on JablonskiMateusz:master into 36c89134d4013b2e5e45bc55656a18bd6141995a on boostorg:master.

JablonskiMateusz commented 4 years ago

@jszuppe could you look at this PR?

kylelutz commented 4 years ago

Hey @JablonskiMateusz! Could you provide some detail on the motivation for this change? Was this causing issues with your OpenCL library?

For context, device is a very thin wrapper on cl_device_id, and they have identical memory layouts. There should be no need to first populate vector<cl_device_id> and then copy into vector<device>.

JablonskiMateusz commented 4 years ago

Hi @kylelutz, the problem is scenario with sub devices. The current implementation is that boost queries number of context devices, then creates a vector of compute::device (default constructor of compute::device is called, without calling clRetainDevice), then it passes the pointer to data of the vector to clGetContextInfo and it fills the devices (also without calling clRetainDevice) and the vector is passed to the user's app. When the app destroys the vector, then the destructor of compute::device checks if it is a sub device and it calls clReleaseDevice. This causes that boost calls clReleaseDevice without calling clRetainDevice before. According to OpenCL spec, it caues UB:

After the device reference count becomes zero and all the objects attached to device (such as
command-queues) are released, the device object is deleted. Using this function to release a
reference that was not obtained by creating the object or by calling clRetainDevice causes
undefined behavior.

IMHO the cl_device_id should be passed to compute::device to its constructor or to assignment operator, currently it is performed in a quite hacky way like memcpy to the memory of compute::device .

My change was inspired by compute::program::get_devices where the vector of compute::device is populated from vector of cl_device_id https://github.com/boostorg/compute/blob/master/include/boost/compute/program.hpp#L191

JablonskiMateusz commented 4 years ago

@kylelutz @jszuppe any comments?

keryell commented 4 years ago

I think that in a real application the compute::context::get_devices should not be the critical path, so having a more correct version does not seem like a problem.

JablonskiMateusz commented 4 years ago

@kylelutz ping

JablonskiMateusz commented 4 years ago

@kylelutz @jszuppe ping

JablonskiMateusz commented 4 years ago

kindly reminder @kylelutz

JablonskiMateusz commented 4 years ago

@kylelutz @jszuppe could you merge this PR?