traktofon opened this issue 8 years ago
@frank-otto I have been able to reproduce the OpenCL issue but not the unified backend one. Can you double-check whether the unified backend still fails? Could someone else have been using the GPU at the same time?
Verified that the following standalone code fails while `nvidia-cuda-mps-control -d` is running:
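```cpp
#define __CL_ENABLE_EXCEPTIONS  // make cl.hpp throw cl::Error on failure
#include "cl.hpp"
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace cl;
using namespace std;

int main(int argc, char* argv[]) {
    // Note: this creates a temporary GPU context (via clCreateContextFromType)
    // that is destroyed at the end of the statement; the buffers, program and
    // queue below use cl.hpp's default context instead.
    Context(CL_DEVICE_TYPE_GPU);

    static const unsigned elements = 1000;
    vector<float> data(elements, 5);

    // Two read-only input buffers initialised from `data`, one output buffer.
    Buffer a(begin(data), end(data), true, false);
    Buffer b(begin(data), end(data), true, false);
    Buffer c(CL_MEM_READ_WRITE, elements * sizeof(float));

    // Build the element-wise addition kernel from source.
    Program addProg(R"d(
        kernel
        void add( global const float * restrict const a,
                  global const float * restrict const b,
                  global float * restrict const c) {
            unsigned idx = get_global_id(0);
            c[idx] = a[idx] + b[idx];
        }
    )d", true);

    auto add = make_kernel<Buffer, Buffer, Buffer>(addProg, "add");
    add(EnqueueArgs(elements), a, b, c);  // one work-item per element

    // Copy the result back to the host and print it (expected: 10, 10, ...).
    vector<float> result(elements);
    cl::copy(c, begin(result), end(result));
    std::copy(begin(result), end(result), ostream_iterator<float>(cout, ", "));
    cout << endl;
}
```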
Able to reproduce the same problem using this standalone code:
```cpp
#define __CL_ENABLE_EXCEPTIONS  // make cl.hpp throw cl::Error on failure
#include "cl.hpp"
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace cl;
using namespace std;

int main(int argc, char* argv[])
{
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    for (auto platform : platforms) {
        cl_context_properties cps[3] = {CL_CONTEXT_PLATFORM,
                                        (cl_context_properties)(platform()),
                                        0};

        // Skip platforms without a GPU (getDevices throws if none is found).
        std::vector<cl::Device> devices;
        try {
            platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);
        } catch (...) {
            continue;
        }
        std::cout << platform.getInfo<CL_PLATFORM_NAME>() << std::endl;

        // Single-device context created through clCreateContext. Note that the
        // buffers and program below are not given this context, so they still
        // use cl.hpp's default context.
        cl::Context context(devices[0], cps);

        static const unsigned elements = 1000;
        vector<float> data(elements, 5);
        Buffer a(begin(data), end(data), true, false);
        Buffer b(begin(data), end(data), true, false);
        Buffer c(CL_MEM_READ_WRITE, elements * sizeof(float));

        Program addProg(R"d(
            kernel
            void add( global const float * restrict const a,
                      global const float * restrict const b,
                      global float * restrict const c) {
                unsigned idx = get_global_id(0);
                c[idx] = a[idx] + b[idx];
            }
        )d", true);

        auto add = make_kernel<Buffer, Buffer, Buffer>(addProg, "add");
        add(EnqueueArgs(elements), a, b, c);

        vector<float> result(elements);
        cl::copy(c, begin(result), end(result));
        std::copy(begin(result), end(result), ostream_iterator<float>(cout, ", "));
        std::cout << std::endl;
    }
}
```
After looking into `cl.hpp` and `cl2.hpp`, the problem seems to be the use of `clCreateContext` (in the failing case) versus `clCreateContextFromType` (in the working case). This is most likely an NVIDIA bug, and I am skeptical that they will fix it.
We could change our device manager to use `clCreateContextFromType`, but that would create OpenCL contexts containing more than one device, unlike the one-to-one mapping between devices and contexts we have right now.
Changing the context creation definitely causes problems on OSX, so we could make it optional and enable it on Linux only, but I am a bit wary of doing this.
@arrayfire/core-devel thoughts?
Not relevant anymore
Never mind, I was testing this incorrectly. Even the first C++ code snippet I posted is failing now.
@frank-otto can you test the first standalone code snippet on your machine? This could be NVIDIA blocking all non-CUDA applications while this daemon is enabled.
@pavanky, thanks for looking into this issue.
Sorry for the delay; the GPU machine was busy and I couldn't run tests over the past few days. Now I have had a chance to test the snippet you posted. I compiled it with:
```
g++ -std=c++11 -Wall -o test.x -I/usr/include/CL test.cc -lOpenCL
```
The results with MPS running are:
```
$ ./test.x
terminate called after throwing an instance of 'cl::Error'
  what():  clCreateContextFromType
Aborted
```
And without MPS:
```
$ ./test.x
10, 10, 10, 10, 10, 10, 10, [...]
```
As for the unified backend, it still fails for me with the "out of memory" error when it tries to use the CUDA backend dynamically. I am certain that the GPU was otherwise idle.
Some additional notes on the system:
@frank-otto OK, it looks like there's nothing we can do about the OpenCL backend, because it fails even in a standalone application that is independent of ArrayFire.
As for the unified backend, this is weird. Can I get the output of `af::info()` with MPS disabled? The output of `nvidia-smi` and `nvidia-smi -a` while MPS is failing would also help.
@pavanky, this is the code I use to get the `af::info()` output:
```cpp
#include <arrayfire.h>
#include <cstdio>
#include <cstdlib>

using namespace af;

int main(int argc, char *argv[])
{
    try {
        // Select a device and display ArrayFire info
        int device = argc > 1 ? atoi(argv[1]) : 0;
        af::setDevice(device);
        af::info();
    } catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;  // rethrow so the failure terminates the program
    }
    return 0;
}
```
And I compile it with:
```
g++ -Wall -std=c++11 -o afinfo.x -I$AF_PATH/include afinfo.cc -L$AF_PATH/lib -laf
```
With MPS running, the output is:
```
$ ./afinfo.x
ArrayFire Exception (Device out of memory:101):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/platform.cpp:359
CUDA Error (2): out of memory

In function void af::setDevice(int)
In file src/api/cpp/device.cpp:91
terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Device out of memory:101):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/platform.cpp:359
CUDA Error (2): out of memory

In function void af::setDevice(int)
In file src/api/cpp/device.cpp:91
Aborted
```
Without MPS running, the output is:
```
$ ./afinfo.x
ArrayFire v3.3.2 (CUDA, 64-bit Linux, build default)
Platform: CUDA Toolkit 7.5, Driver: 352.63
[0] Tesla K20c, 4800 MB, CUDA Compute 3.5
-1- Tesla K20c, 4800 MB, CUDA Compute 3.5
-2- Tesla K20c, 4800 MB, CUDA Compute 3.5
```
Without MPS, the output of `nvidia-smi` is:
```
Mon Jun 20 18:53:30 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:02:00.0     Off |                    0 |
| 30%   41C    P0    48W / 225W |     12MiB /  4799MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          Off  | 0000:42:00.0     Off |                    0 |
| 32%   44C    P0    50W / 225W |     12MiB /  4799MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20c          Off  | 0000:43:00.0     Off |                    0 |
| 30%   41C    P0    53W / 225W |     12MiB /  4799MiB |     87%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
And the output of `nvidia-smi -a`:
==============NVSMI LOG==============
Timestamp : Mon Jun 20 18:54:08 2016
Driver Version : 352.63
Attached GPUs : 3
GPU 0000:02:00.0
Product Name : Tesla K20c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0322714059579
GPU UUID : GPU-3efe11b6-d282-11c7-daad-5773e5b99064
Minor Number : 0
VBIOS Version : 80.10.39.00.06
MultiGPU Board : No
Board ID : 0x200
Inforom Version
Image Version : 2081.0204.00.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x102210DE
Bus Id : 0000:02:00.0
Sub System Id : 0x098210DE
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 30 %
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 4799 MiB
Used : 12 MiB
Free : 4787 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Exclusive_Process
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 41 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
Power Readings
Power Management : Supported
Power Draw : 48.82 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 705 MHz
SM : 705 MHz
Memory : 2600 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 0000:42:00.0
Product Name : Tesla K20c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0322714059051
GPU UUID : GPU-54fd78b1-6231-81d7-7524-0ff0138c8f14
Minor Number : 1
VBIOS Version : 80.10.39.00.06
MultiGPU Board : No
Board ID : 0x4200
Inforom Version
Image Version : 2081.0204.00.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x42
Device : 0x00
Domain : 0x0000
Device Id : 0x102210DE
Bus Id : 0000:42:00.0
Sub System Id : 0x098210DE
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 33 %
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 4799 MiB
Used : 12 MiB
Free : 4787 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Exclusive_Process
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 44 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
Power Readings
Power Management : Supported
Power Draw : 50.28 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 705 MHz
SM : 705 MHz
Memory : 2600 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 0000:43:00.0
Product Name : Tesla K20c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0322714059533
GPU UUID : GPU-913c0a7d-59c2-bee7-cc36-b1e11aeab7be
Minor Number : 2
VBIOS Version : 80.10.39.00.06
MultiGPU Board : No
Board ID : 0x4300
Inforom Version
Image Version : 2081.0204.00.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x43
Device : 0x00
Domain : 0x0000
Device Id : 0x102210DE
Bus Id : 0000:43:00.0
Sub System Id : 0x098210DE
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 31 %
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 4799 MiB
Used : 12 MiB
Free : 4787 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Exclusive_Process
Utilization
Gpu : 97 %
Memory : 6 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 42 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
Power Readings
Power Management : Supported
Power Draw : 51.43 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 705 MHz
SM : 705 MHz
Memory : 2600 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
With MPS running, `nvidia-smi` outputs:
```
Mon Jun 20 18:55:07 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:02:00.0     Off |                    0 |
| 31%   42C    P8    16W / 225W |    105MiB /  4799MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          Off  | 0000:42:00.0     Off |                    0 |
| 34%   45C    P8    16W / 225W |    105MiB /  4799MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20c          Off  | 0000:43:00.0     Off |                    0 |
| 31%   43C    P0    48W / 225W |    105MiB /  4799MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     15639    C   nvidia-cuda-mps-server                          89MiB |
|    1     15639    C   nvidia-cuda-mps-server                          89MiB |
|    2     15639    C   nvidia-cuda-mps-server                          89MiB |
+-----------------------------------------------------------------------------+
```
Note: `nvidia-cuda-mps-server` only shows up after a CUDA application has been run (running the failing `afinfo.x` is sufficient). `nvidia-smi -a` outputs:
==============NVSMI LOG==============
Timestamp : Mon Jun 20 18:57:02 2016
Driver Version : 352.63
Attached GPUs : 3
GPU 0000:02:00.0
Product Name : Tesla K20c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0322714059579
GPU UUID : GPU-3efe11b6-d282-11c7-daad-5773e5b99064
Minor Number : 0
VBIOS Version : 80.10.39.00.06
MultiGPU Board : No
Board ID : 0x200
Inforom Version
Image Version : 2081.0204.00.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x102210DE
Bus Id : 0000:02:00.0
Sub System Id : 0x098210DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 30 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 4799 MiB
Used : 105 MiB
Free : 4694 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 14 MiB
Free : 242 MiB
Compute Mode : Exclusive_Process
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 39 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
Power Readings
Power Management : Supported
Power Draw : 16.41 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 15639
Type : C
Name : nvidia-cuda-mps-server
Used GPU Memory : 89 MiB
GPU 0000:42:00.0
Product Name : Tesla K20c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0322714059051
GPU UUID : GPU-54fd78b1-6231-81d7-7524-0ff0138c8f14
Minor Number : 1
VBIOS Version : 80.10.39.00.06
MultiGPU Board : No
Board ID : 0x4200
Inforom Version
Image Version : 2081.0204.00.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x42
Device : 0x00
Domain : 0x0000
Device Id : 0x102210DE
Bus Id : 0000:42:00.0
Sub System Id : 0x098210DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 30 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 4799 MiB
Used : 105 MiB
Free : 4694 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 14 MiB
Free : 242 MiB
Compute Mode : Exclusive_Process
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 41 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
Power Readings
Power Management : Supported
Power Draw : 16.61 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 15639
Type : C
Name : nvidia-cuda-mps-server
Used GPU Memory : 89 MiB
GPU 0000:43:00.0
Product Name : Tesla K20c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0322714059533
GPU UUID : GPU-913c0a7d-59c2-bee7-cc36-b1e11aeab7be
Minor Number : 2
VBIOS Version : 80.10.39.00.06
MultiGPU Board : No
Board ID : 0x4300
Inforom Version
Image Version : 2081.0204.00.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x43
Device : 0x00
Domain : 0x0000
Device Id : 0x102210DE
Bus Id : 0000:43:00.0
Sub System Id : 0x098210DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 30 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 4799 MiB
Used : 105 MiB
Free : 4694 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 14 MiB
Free : 242 MiB
Compute Mode : Exclusive_Process
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 39 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
Power Readings
Power Management : Supported
Power Draw : 16.80 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 15639
Type : C
Name : nvidia-cuda-mps-server
Used GPU Memory : 89 MiB
NOTE: The GPUs are set to compute mode EXCLUSIVE_PROCESS, as NVIDIA recommends when MPS is used. I also tested with the DEFAULT compute mode, but the results are the same (i.e. the unified and OpenCL backends fail as reported above).
Thanks!
Hi, when the NVIDIA CUDA Multi-Process Service (MPS) is active, I encounter two problems:
a) the OpenCL backend doesn't work;
b) the unified backend cannot invoke the CUDA backend.
To reproduce, run `nvidia-cuda-mps-control -d` as root, then test with `examples/helloworld`:
If the cuda-mps service is not running, then all four backends work properly.
I also encountered problems with ArrayFire.jl, where even the CPU and CUDA backends don't work while cuda-mps is running. Without cuda-mps, the backends work fine.
In theory, whether cuda-mps is running or not should be completely transparent to CUDA applications. On multi-user systems, and for MPI-parallelized programs, cuda-mps is beneficial, so it would be nice if ArrayFire could work properly with MPS.
Tested with ArrayFire-3.3.2, both binary distribution and compiled from source. CUDA version is 7.5. Nvidia driver is 352.63.
Regards, Frank