StreamHPC / gromacs

OpenCL porting of the GROMACS molecular simulation toolkit
http://www.gromacs.org

OpenCL device buffers - improve the way their sizes are computed #17

Closed ancahamuraru closed 9 years ago

ancahamuraru commented 9 years ago

Here's an example:

    float3* f; // CUDA device buffer
    cl_mem f;  // OpenCL device buffer

Allocating memory:

    cudaMalloc(&f, count * sizeof(*f)); // CUDA
    f = clCreateBuffer(..., count * sizeof(float3), ...); // OpenCL

In the OpenCL implementation, the data type of one element is currently hard-coded in the size computation. This needs to be improved. For example, a new variable could be added to store the size of one element.
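A minimal sketch of that idea, assuming a small host-side wrapper that keeps the element size next to the buffer handle. The names `typed_buffer_t` and `typed_buffer_bytes` are hypothetical and are not part of GROMACS or OpenCL; the `cl_mem` typedef only mirrors the opaque handle declared in `CL/cl.h` so that the snippet compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the opaque handle typedef from CL/cl.h so this sketch compiles
 * on its own; real host code would include <CL/cl.h> instead. */
typedef struct _cl_mem *cl_mem;

/* Hypothetical wrapper (not a GROMACS or OpenCL type): the element size
 * travels with the handle instead of being hard-coded at each call site. */
typedef struct {
    cl_mem handle;    /* device buffer, as returned by clCreateBuffer */
    size_t elem_size; /* size in bytes of one element */
    size_t count;     /* number of elements in the buffer */
} typed_buffer_t;

/* Byte size to pass to clCreateBuffer; returns 0 if count * elem_size
 * would overflow size_t. */
static size_t typed_buffer_bytes(const typed_buffer_t *buf)
{
    assert(buf != NULL && buf->elem_size > 0);
    if (buf->count > SIZE_MAX / buf->elem_size) {
        return 0;
    }
    return buf->count * buf->elem_size;
}
```

With something like this, the allocation call would read the size from the wrapper rather than repeating `sizeof(float3)` wherever the buffer is allocated or copied.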

pszi1ard commented 9 years ago

Can't the device buffer be declared as a pointer to a certain data type? In general there is no need to parametrize things like the size of force elements - these always have three components.

ancahamuraru commented 9 years ago

On the host side, an OpenCL device buffer is always of type cl_mem, which says nothing about the type of the elements stored in that buffer.

Therefore, the data type of one element, the number of elements, and the buffer size all have to be handled explicitly. How detailed this bookkeeping needs to be depends on how the OpenCL device buffer is used.
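As an illustration of that explicit bookkeeping, here is a hedged sketch of turning an element range into the byte offset and byte size that a transfer call such as clEnqueueWriteBuffer expects. The helper `element_range_to_bytes` and the type `byte_range_t` are hypothetical names used only for this example, and the sketch assumes the whole buffer size already fits in `size_t` because the buffer was allocated successfully.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset and byte size of a sub-range of a device buffer. */
typedef struct {
    size_t offset_bytes;
    size_t size_bytes;
} byte_range_t;

/* Hypothetical helper: converts "n elements starting at element first"
 * into bytes. Returns 0 on success, -1 if the range falls outside a
 * buffer of buf_count elements or if elem_size is zero.
 * Assumes buf_count * elem_size does not overflow size_t. */
static int element_range_to_bytes(size_t elem_size, size_t buf_count,
                                  size_t first, size_t n,
                                  byte_range_t *out)
{
    assert(out != NULL);
    if (elem_size == 0 || first > buf_count || n > buf_count - first) {
        return -1;
    }
    out->offset_bytes = first * elem_size;
    out->size_bytes = n * elem_size;
    return 0;
}
```

The two resulting values correspond to the offset and size arguments of the transfer call, both of which the OpenCL host API takes in bytes.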

pszi1ard commented 9 years ago

I see, I didn't know that the host-side OpenCL API works only with cl_mem handles. There must be some common OpenCL way to handle this inconvenience, I guess. What we should avoid is significantly increasing the size of the structs passed by value to the GPU, because that could result in kernel startup overhead.

ancahamuraru commented 9 years ago

That's mostly an issue on the host side and refers to memory allocations and data transfers. It does not affect the data structures passed to the kernel.

Unlike the CUDA implementation, the data structures used by the OpenCL host code and device code are different. The data structures passed to the device hold only the information needed by the kernel and the pointers are always sent separately (https://github.com/StreamComputing/gromacs/wiki/A1.2-log).
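A rough sketch of that separation, with hypothetical struct and field names that are not taken from the GROMACS sources. The host-side struct keeps the buffer handles and bookkeeping, while the struct passed by value to the kernel holds only scalars; in real host code each handle would be set as its own kernel argument via clSetKernelArg.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the opaque handle typedef from CL/cl.h so this sketch compiles
 * on its own; real host code would include <CL/cl.h> instead. */
typedef struct _cl_mem *cl_mem;

/* Host-side view (hypothetical fields): buffer handles plus scalars. */
typedef struct {
    cl_mem forces;  /* device buffer handle for forces */
    cl_mem coords;  /* device buffer handle for coordinates */
    int    natoms;  /* number of atoms */
    float  rcut_sq; /* squared cut-off radius */
} host_atomdata_t;

/* Device-side view (hypothetical): only what the kernel needs by value.
 * Buffer handles are deliberately absent; they go in as separate
 * kernel arguments. */
typedef struct {
    int   natoms;
    float rcut_sq;
} kernel_params_t;

/* Copies the scalar fields from the host struct into the kernel struct. */
static kernel_params_t make_kernel_params(const host_atomdata_t *host)
{
    assert(host != NULL);
    kernel_params_t params;
    params.natoms = host->natoms;
    params.rcut_sq = host->rcut_sq;
    return params;
}
```

Under this layout, adding size bookkeeping to the host-side struct leaves the size of `kernel_params_t` unchanged, which is why the kernel launch is unaffected.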

ancahamuraru commented 9 years ago

Done - see b13e526794ce5ae3a8e3d6755c2f92cdbe9a2d59