boostorg / compute

A C++ GPU Computing Library for OpenCL
http://boostorg.github.io/compute/
Boost Software License 1.0
1.55k stars 333 forks

Multiple GPUs #802

Open lovingxiuxiu opened 5 years ago

lovingxiuxiu commented 5 years ago

Is there an example that multiple GPUS are used? The current examples are all using "default_device".

jszuppe commented 5 years ago

https://github.com/boostorg/compute/blob/master/example/list_devices.cpp This example shows how to list devices from all OpenCL platforms.

https://github.com/boostorg/compute/blob/master/example/random_walk.cpp - Once you have your devices, you can create a context and queue for each of them. If the devices are on the same platform, you should be able to create one context for both of them (example: https://github.com/boostorg/compute/blob/master/test/test_context.cpp#L53) and then a queue for each. If they are not on the same platform, you need one context per device.

Then you can use multiple devices (GPUs). Remember to pass the correct command queue to Boost.Compute's algorithms.

lovingxiuxiu commented 5 years ago

Hi @jszuppe, I have 2 GPUs on the same platform. I created one context for both of them and one queue for each, created separate kernels and allocated separate device buffers for each device, split the input data into two parts (one per kernel), and then executed the two kernels simultaneously. But the result is that two GPUs take twice as long as one GPU. Do you have any idea why? Thank you.

jszuppe commented 5 years ago

How big is your input data? Maybe queuing a kernel costs more than its execution on a GPU. I'd recommend testing two transform algorithms on large data (for example, 128MB of ints). Also make sure that there is no synchronization between the kernel calls (no blocking API call); otherwise they do not run at the same time, but sequentially.

lovingxiuxiu commented 5 years ago

As you said, I found that getting the devices and creating the context and queue cost more than transferring the data and executing on a GPU. I also found that those API calls cost much more (more than 10 times) on a server with 2 NVIDIA V100 GPUs than on a laptop with an NVIDIA NVS 5400M GPU.

lovingxiuxiu commented 5 years ago
|                         | laptop    | server    |
|-------------------------|-----------|-----------|
| get device (first time) | 0.005s    | 1.9s      |
| create context          | 0.005259s | 0.36s     |
| create queue            | 0.000171s | 0.000058s |

jszuppe commented 5 years ago

I doubt it's a Boost.Compute issue. It's more likely an NVIDIA OpenCL platform thing.