clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Default selected GPU is not the fastest one present in the system. #152

Open inferrna opened 9 years ago

inferrna commented 9 years ago

While investigating an error in caffe ( https://github.com/BVLC/caffe/pull/2195#issuecomment-148927253 ), I added some debug output to xgemm.cc and also got information about the device in use:

    Name: Devastator
    Vendor: Advanced Micro Devices, Inc.
    Available: Yes
    Compute Units: 4
    Clock Frequency: 760 mHz
    Global Memory: 570 mb
    Max Allocateable Memory: 142 mb
    Local Memory: 32768 kb

But according to my clinfo output ( https://gist.github.com/inferrna/f183896a683ba773e3b4 ), I have a more powerful device, Pitcairn, which was not selected. I found the following in src/library/blas/AutoGemm/AutoGemmTools/ProfileAutoGemm.cpp:

    device = NULL;
    for (i = 0; i < nrDevices; i++) {
        err = clGetDeviceInfo(list[i], CL_DEVICE_NAME,
            sizeof(deviceName), deviceName, NULL);
        CL_CHECK(err);
        assert(err == CL_SUCCESS);
        if ((err == CL_SUCCESS) ) {
            device = list[i];
            break;
        }
    }

It just selects the first available device. Maybe it would be better to select the most powerful device, for example by comparing the value of compute_units * clock_frequency.
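A minimal sketch of that heuristic, kept independent of the OpenCL headers so it can be compiled anywhere. The `device_summary` struct and the `device_score` and `pick_best_device` functions are hypothetical names, not part of clBLAS. In real host code the two numeric fields would presumably be filled by `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY`. The score is only a rough proxy for peak throughput and says nothing about how well the kernels fit a given architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one OpenCL device. In real host code the two
 * numeric fields would come from clGetDeviceInfo() queried with
 * CL_DEVICE_MAX_COMPUTE_UNITS and CL_DEVICE_MAX_CLOCK_FREQUENCY. */
typedef struct {
    const char  *name;
    unsigned int compute_units;
    unsigned int clock_mhz;
} device_summary;

/* Rough peak-throughput estimate: compute units times clock frequency. */
static unsigned long long device_score(const device_summary *d)
{
    return (unsigned long long)d->compute_units * d->clock_mhz;
}

/* Returns the index of the highest-scoring device, or -1 if the list is
 * empty. On a tie the earlier device wins, so the existing ordering is
 * preserved when all devices score the same. */
static int pick_best_device(const device_summary *devices, size_t count)
{
    int best = -1;
    unsigned long long best_score = 0;
    size_t i;

    assert(devices != NULL || count == 0);
    for (i = 0; i < count; i++) {
        unsigned long long score = device_score(&devices[i]);
        if (best < 0 || score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

With the Devastator values reported above (4 compute units at 760 MHz, a score of 3040), any discrete card with more compute units and a similar or higher clock would be chosen instead of the first entry in the list.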

TimmyLiu commented 9 years ago

Actually it is more complicated than simply looking at the peak performance of a card. For example, the current implementation runs most efficiently on the Hawaii architecture with the OpenCL 2.0 runtime. In any case, you can use the environment variable GPU_DEVICE_ORDINAL to mask your OpenCL devices. If you set GPU_DEVICE_ORDINAL=1, device 0 will be masked out.

inferrna commented 9 years ago

> Actually it is more complicated than simply looking at the peak performance of a card.

Simply looking at the peak performance of a card is only a little more complicated than just selecting the first GPU in the list. Using an environment variable may be a solution, but I can't find GPU_DEVICE_ORDINAL in the source code. Is it a clBLAS variable, or is it usable only with the AMD driver?