Open ax3l opened 6 years ago
I have the same problem and am not using Docker:
~> ocl_memtest
hostname is guilmon
CL_PLATFORM_NAME: NVIDIA CUDA
CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 10.2.120
Device 0 is CL_DEVICE_TYPE_GPU, "GeForce GTX 950"
allocated 340 Mbytes from device 0
[05/17/2019 15:33:40][guilmon][0]:Test0 [Walking 1 bit]
[05/17/2019 15:33:40][guilmon][0]:Test0: global walk test
ERROR: opencl call failed with rc(-5), line 39, file ocl_tests.cpp
Error: Out of resources
(Does that just mean the test failed?)
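For reference, rc(-5) is the value the OpenCL headers define as CL_OUT_OF_RESOURCES, which matches the "Out of resources" text in the log. That suggests an OpenCL API call failed, not that the memory test itself found a bad cell. Below is a minimal sketch of decoding such a return code. The helper name `cl_error_name` is hypothetical (it is not part of ocl_memtest), and only a handful of codes from cl.h are listed:

```python
# A few OpenCL status codes as defined in the Khronos cl.h header.
# This table is deliberately incomplete; it is only for illustration.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def cl_error_name(rc):
    """Hypothetical helper: map an OpenCL return code to its symbolic name."""
    return CL_ERROR_NAMES.get(rc, "UNKNOWN_CL_ERROR({})".format(rc))


if __name__ == "__main__":
    # The code reported in the log above.
    print(cl_error_name(-5))
```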
@RenaKunisaki We never tested the OpenCL version of cuda_memtest. Depending on the driver version, OpenCL is not able to allocate 100% of the GPU's main memory. Could you rerun your test with cuda_memtest?
Also take care if your X server is running on the same device.
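To illustrate the allocation point: when a driver will not hand out all of the device memory, one common workaround is to request less and back off until an allocation succeeds. The sketch below only shows that idea. It is not how cuda_memtest or ocl_memtest is known to behave, and `largest_allocation`, `fake_alloc`, and the 1900 MiB limit are made-up stand-ins for a real device allocation call:

```python
def largest_allocation(try_alloc, start_mib, step_mib=10):
    """Return the largest size (in MiB) accepted by try_alloc, or 0 if none.

    try_alloc is any callable taking a size in MiB and returning True on
    success; here it stands in for a real device allocation.
    """
    size = start_mib
    while size > 0:
        if try_alloc(size):
            return size
        size -= step_mib  # back off and retry with a smaller request
    return 0


def fake_alloc(size_mib, limit_mib=1900):
    """Assumed stand-in: pretend the driver refuses anything above limit_mib."""
    return size_mib <= limit_mib


if __name__ == "__main__":
    # Start from the nominal size of a 2048 MiB card and step down.
    print(largest_allocation(fake_alloc, start_mib=2048))
```

A display server running on the same device would lower the usable limit further, which is one reason to test without X running.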
I installed it from the Arch package (AUR) and I don't seem to have a cuda_memtest binary. I will try without X running though.
Oh, if you are using the AUR package (here?), it builds the legacy SourceForge version. We haven't seen much activity on that one for years, so we update and fix our own forked CUDA version here.
If you find updates to the OpenCL version we will gladly review and merge pull requests.
cuda_memtest seems to abort with "out of memory" (line 148 in cuda_memtests.cu) when run in a container (nvidia-docker1 and 2) on V100 GPUs.
The problem might be a general one or just triggered in PIConGPU. Needs investigation. Maybe just multiple-times assigned from mpiInfo.
...Occurred with a 4 & 8 GPU PIConGPU lwfa example on a DGX-1.