joaquimg opened this issue
Thanks for reporting this issue; I'll take a look at possible causes tomorrow. Note, though, that my GPU-enabled computer is currently in repair, so I'll be able to test solutions only in 4-5 days. I'll try to generate some useful ideas earlier.
I have a Windows machine and a Mac, both with GPUs enabled, so you can count on me to test whatever you want.
Can you please try the following two snippets:
# 1st
d_a = curand(Float64, 1000)
a = to_host(d_a)
sleep(5)
result = devices(dev->capability(dev)[1]>=2) do devlist
end
and
# 2nd
d_a = curand(Float64, 1000)
a = to_host(d_a)
device_reset()
result = devices(dev->capability(dev)[1]>=2) do devlist
end
My current guess is that curand uses the default device initialized by the call to cudaSetDevice (device() in Julia code), while the device() do block either resets it before curand finishes or simply overrides the default device while CURAND.jl still uses it.
In fact, the problem may be in the default random number generator, which may reference a device that has already been reset. Can you replace:
d_a = curand(Float64, 1000)
with
rng = create_generator()
d_a = curand(rng, Float64, 1000)
destroy_generator(rng)
?
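To spell out this hypothesis against the failing sequence, here is a minimal sketch; the comments describe the suspected mechanism, not confirmed behaviour:
using CUDArt
using CURAND

d_a = curand(Float64, 1000)        # implicitly creates a default generator on the current device
a = to_host(d_a)

result = devices(dev->capability(dev)[1]>=2) do devlist
    # CUDArt initializes the selected devices here and resets them when the block exits,
    # possibly invalidating the context the default CURAND generator still references
end

d_a = curand(Float64, 1000)        # suspected failure point: the generator's device has been reset
a = to_host(d_a)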
Just tested #1 and #2 as first described; neither works on the second run.
As for:
#1
rng = create_generator()
d_a = curand(rng,Float64, 1000)
destroy_generator(rng)
a = to_host(d_a)
sleep(5)
result = devices(dev->capability(dev)[1]>=2) do devlist
end
and
#2
rng = create_generator()
d_a = curand(rng,Float64, 1000)
destroy_generator(rng)
a = to_host(d_a)
device_reset()
result = devices(dev->capability(dev)[1]>=2) do devlist
end
Both return this different error. In the terminal:
WARNING: cuRAND operation failed with status: CURAND_STATUS_INITIALIZATION_FAILED (203)
in statuscheck at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:28
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:36
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:35
ERROR: "cuRAND operation failed with status: CURAND_STATUS_INITIALIZATION_FAILED"
in statuscheck at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:29
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:36
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:35
and in IJulia:
WARNING: cuRAND operation failed with status: CURAND_STATUS_INITIALIZATION_FAILED (203)
LoadError: "cuRAND operation failed with status: CURAND_STATUS_INITIALIZATION_FAILED"
while loading In[3], in expression starting on line 1
in statuscheck at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:29
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:36
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:35
statuscheck at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:28
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:36
in create_generator at /Users/joaquimdiasgarcia/.julia/v0.4/CURAND/src/wrappers.jl:35
in include_string at loading.jl:266
in execute_request_0x535c5df2 at /Users/joaquimdiasgarcia/.julia/v0.4/IJulia/src/execute_request.jl:177
in eventloop at /Users/joaquimdiasgarcia/.julia/v0.4/IJulia/src/IJulia.jl:141
in anonymous at task.jl:447
Forgot to ask: how many devices do you have? I.e., what is the length of devlist?
Only 1 device! The list only contains the number zero.
OK, one more experiment. Please enter the following and check the result:
using CUDArt
device()
device()
device(1)
device(1)
device()
reset_device()
reset_device()
device()
device(1)
device()
If I understand it correctly, at least some of these pairs should give an error, but better to make sure.
Well, device(1) gives an error anyway, since the only device I have is device(0). Also, reset_device() gives an error because the command is actually device_reset().
Finally,
using CUDArt
device()
device()
device(0)
device(0)
device()
device_reset()
device_reset()
device()
device(0)
device()
gives no error
Ah, sorry for providing incorrect commands. I should get my laptop back today, so I will be able to run further experiments myself.
It becomes curiouser and curiouser. I modified your example just a little bit and the error seems to be gone:
using CUDArt
using CURAND
d_a = curand(Float64, 1000)
a = to_host(d_a)
result = devices() do devlist
end
Note that this time the call to devices() doesn't take a criteria lambda function. I tried calls to curand(), to_host and devices() in all possible combinations, but couldn't make it fail.
The exact criteria function doesn't matter; even the stub function dev->true results in the error you described. I'm investigating it.
Well, that makes sense, because devices() do ... end actually calls a method that simply filters the device list, i.e. the function in do ... end refers to a criteria function, not the function we want to apply.
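To make the two call forms concrete, here is a minimal sketch based only on how devices is used elsewhere in this thread; the block bodies are just placeholders:
using CUDArt

# Intended usage: the positional argument is the per-device criteria function,
# and the do block is the worker that receives the initialized device list.
result = devices(dev -> capability(dev)[1] >= 2) do devlist
    devlist            # placeholder work on the initialized devices
end

# Without an explicit criteria argument, the do block itself is passed as the
# criteria function, so no worker runs and no device initialization/teardown happens.
result = devices() do dev
    true               # effectively a stub filter, like dev -> true
end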
However, I was able to get a more interesting error message:
using CUDArt
using CURAND
d_a = curand(Float64, 1000)
a = to_host(d_a)
# stripped out version of `devices(dev->true) do devlist end`
devlist = [0]
CUDArt.init(devlist)
CUDArt.close(devlist)
d_a = curand(Float64, 1000)
a = to_host(d_a)
The last call results in the following stack trace:
julia> a = to_host(d_a)
WARNING: CUDA error triggered from:
in checkerror at /home/slipslop/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:15
in copy! at /home/slipslop/.julia/v0.4/CUDArt/src/arrays.jl:152
in to_host at /home/slipslop/.julia/v0.4/CUDArt/src/arrays.jl:87
ERROR: "an illegal memory access was encountered"
in checkerror at /home/slipslop/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:16
in copy! at /home/slipslop/.julia/v0.4/CUDArt/src/arrays.jl:152
in to_host at /home/slipslop/.julia/v0.4/CUDArt/src/arrays.jl:87
Some more updates. I can reproduce this error in pure C:
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <curand.h>
#define CUDA_CALL(expr) { \
int status = (expr); \
if (status != cudaSuccess) { \
printf("Error at %s:%d -- %d\n", __FILE__, __LINE__, status); \
exit(EXIT_FAILURE); \
} \
}
#define CURAND_CALL(expr) { \
int status = (expr); \
if (status != CURAND_STATUS_SUCCESS) { \
printf("Error at %s:%d -- %d\n", __FILE__, __LINE__, status); \
exit(EXIT_FAILURE); \
} \
}
int device()
{
int dev;
CUDA_CALL(cudaGetDevice(&dev))
return dev;
}
void device_reset()
{
CUDA_CALL(cudaDeviceReset());
}
void print_array(float* a, int n)
{
int i;
for(i = 0; i < n; i++) {
printf("%1.4f ", a[i]);
}
printf("\n");
}
int main(int argc, char *argv[])
{
size_t n = 10;
int dev;
float *A;
float *d_A;
curandGenerator_t gen;
// buffer allocation
A = (float *)calloc(n, sizeof(float));
CUDA_CALL(cudaMalloc((void **)&d_A, n * sizeof(float)));
// CURAND
CURAND_CALL(curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT));
CURAND_CALL(curandGenerateUniform(gen, d_A, n));
CUDA_CALL(cudaMemcpy(A, d_A, n * sizeof(float), cudaMemcpyDeviceToHost));
print_array(A, n);
// CUDArt.init()
dev = device();
printf("Using device #%d\n", dev);
// CUDArt.close()
device_reset();
// CURAND again
// CUDA_CALL(cudaMalloc((void **)&d_A, n * sizeof(float))); -- reallocating d_A doesn't change the result
CURAND_CALL(curandGenerateUniform(gen, d_A, n));
CUDA_CALL(cudaMemcpy(A, d_A, n * sizeof(float), cudaMemcpyDeviceToHost));
print_array(A, n);
return EXIT_SUCCESS;
}
Compiling and running it gives:
$ nvcc curand_test.cu -lcurand -o test.out && ./test.out
0.7402 0.9210 0.0390 0.9690 0.9251 0.4464 0.6673 0.1099 0.4702 0.5132
Using device #0
Error at curand_test.cu:73 -- 77
Here 77 stands for cudaErrorIllegalAddress, so it is exactly the same error.
If, however, we move device_reset() to the end, everything works smoothly. So I have serious doubts that cuRAND can actually "survive" cudaDeviceReset(). At the moment I would suggest that any program using CUDArt.jl and CURAND.jl simply not use devices() do ... end, but instead initialize CUDA manually with CUDArt.init() and finalize it at the end with CUDArt.close().
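A minimal sketch of that workaround, assuming a single device with id 0 (adjust devlist for your setup):
using CUDArt
using CURAND

devlist = [0]                      # devices to use; id 0 is assumed here
CUDArt.init(devlist)               # manual initialization instead of devices() do ... end
try
    d_a = curand(Float64, 1000)    # CURAND work happens while the context is alive
    a = to_host(d_a)
finally
    CUDArt.close(devlist)          # finalize once, at the very end of the program
end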
My bad, in the C code I provided I didn't recreate the RNG. With the RNG recreated, the code works.
What's interesting is that the Julia code from one of the previous examples works fine for me too!
using CUDArt
using CURAND
rng = create_generator()
d_a = curand(rng,Float64, 1000)
destroy_generator(rng)
a = to_host(d_a)
device_reset()
result = devices(dev->capability(dev)[1]>=2) do devlist
end
I can run this code any number of times in the REPL without any error. @joaquimg Could you please open a fresh Julia session (without any previous commands) and try it out?
It looks like once the CUDA runtime hits a memory error, it gets into an inconsistent state, and any further operations, including reallocation of memory and recreation of the RNG, start behaving incorrectly, producing strange and illogical errors. So any testing should always be done in a fresh session; once an error has occurred, all further results are compromised.
I started the REPL and right on the second run I got the error:
WARNING: cuRAND operation failed with status: CURAND_STATUS_INITIALIZATION_FAILED (203)
in statuscheck at C:\Users\joaquimgarcia\.julia\v0.4\CURAND\src\wrappers.jl:28
in create_generator at C:\Users\joaquimgarcia\.julia\v0.4\CURAND\src\wrappers.jl:36
in create_generator at C:\Users\joaquimgarcia\.julia\v0.4\CURAND\src\wrappers.jl:35
ERROR: "cuRAND operation failed with status: CURAND_STATUS_INITIALIZATION_FAILED"
in statuscheck at C:\Users\joaquimgarcia\.julia\v0.4\CURAND\src\wrappers.jl:29
Can you try the C version?
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <curand.h>
#define CUDA_CALL(expr) { \
int status = (expr); \
if (status != cudaSuccess) { \
printf("Error at %s:%d -- %d\n", __FILE__, __LINE__, status); \
exit(EXIT_FAILURE); \
} \
}
#define CURAND_CALL(expr) { \
int status = (expr); \
if (status != CURAND_STATUS_SUCCESS) { \
printf("Error at %s:%d -- %d\n", __FILE__, __LINE__, status); \
exit(EXIT_FAILURE); \
} \
}
int device()
{
int dev;
CUDA_CALL(cudaGetDevice(&dev))
return dev;
}
void device_reset()
{
CUDA_CALL(cudaDeviceReset());
}
void print_array(float* a, int n)
{
int i;
for(i = 0; i < n; i++) {
printf("%1.4f ", a[i]);
}
printf("\n");
}
int main(int argc, char *argv[])
{
size_t n = 10;
int dev;
float *A;
float *d_A;
curandGenerator_t gen;
// buffer allocation
A = (float *)calloc(n, sizeof(float));
CUDA_CALL(cudaMalloc((void **)&d_A, n * sizeof(float)));
// CURAND
CURAND_CALL(curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT));
CURAND_CALL(curandGenerateUniform(gen, d_A, n));
CUDA_CALL(cudaMemcpy(A, d_A, n * sizeof(float), cudaMemcpyDeviceToHost));
print_array(A, n);
// CUDArt.init()
dev = device();
printf("Using device #%d\n", dev);
// CUDArt.close()
device_reset();
// CURAND again
CURAND_CALL(curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT));
CUDA_CALL(cudaMalloc((void **)&d_A, n * sizeof(float)));  // reallocating d_A doesn't change the result
CURAND_CALL(curandGenerateUniform(gen, d_A, n));
CUDA_CALL(cudaMemcpy(A, d_A, n * sizeof(float), cudaMemcpyDeviceToHost));
print_array(A, n);
return EXIT_SUCCESS;
}
Copy and paste this code into a file called "curand_test.cu", then compile and run it with:
nvcc curand_test.cu -lcurand -o test.out && ./test.out
OK, I could build the program, but how do I run it on Windows? Sorry for the probably stupid question.
OK, I generated an .exe and got this as the result:
0.7402 0.9210 0.0390 0.9690 0.9251 0.4464 0.6673 0.1099 0.4702 0.5132
Using device #0
Error at curand_test.cu:73 -- 203
So the same error as previously. Now we know that:
the issue is not in the Julia wrappers, but in the CUDA libraries themselves
the issue is specific either to Windows, or to your specific machine
Is there anybody with another Windows machine who can test it and see if the issue is reproduced?
That's a good result! I will try it on my Mac later in the week. Have you been testing it on Linux?
Maybe it's time to ask for help on julia-users or julia-dev?
Good idea, I posted a request for an experiment in the old topic: https://groups.google.com/forum/#!searchin/julia-users/curand%2420cudart/julia-users/mJjjTyU7cQ0/GZ6SxTfeAwAJ
As for my config, yes, I'm on Linux with CUDA 6.5.
The C code returned the same on my Mac with CUDA 7.0:
0.7402 0.9210 0.0390 0.9690 0.9251 0.4464 0.6673 0.1099 0.4702 0.5132
Using device #0
Error at curand_test.cu:72 -- 203
I just got the C code running on my Mac with CUDA 7.5, and someone else did it on Windows with CUDA 7.5:
I found the following problem while using CURAND combined with CUDArt's do block.
if I load:
using CUDArt
using CURAND
and then I run either:
d_a = curand(Float64, 1000)
a = to_host(d_a)
OR:
result = devices(dev->capability(dev)[1]>=2) do devlist
end
I can repeat either block as many times as I want (as long as it's just one of them).
However, if I run both combined (or alternately, in any order):
d_a = curand(Float64, 1000)
a = to_host(d_a)
result = devices(dev->capability(dev)[1]>=2) do devlist
end
I get the error in IJulia:
WARNING: CUDA error triggered from:
LoadError: "unspecified launch failure"
while loading In[3], in expression starting on line 2
in checkerror at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\libcudart-6.5.jl:16
in checkerror at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\libcudart-6.5.jl:15
in copy! at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\arrays.jl:152
in to_host at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\arrays.jl:87
in include_string at loading.jl:266
in execute_request_0x535c5df2 at C:\Users\joaquimgarcia\.julia\v0.4\IJulia\src\execute_request.jl:177
in eventloop at C:\Users\joaquimgarcia\.julia\v0.4\IJulia\src\IJulia.jl:141
in anonymous at task.jl:447
and in cmd:
WARNING: CUDA error triggered from:
in checkerror at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\libcudart-6.5.jl:15
in copy! at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\arrays.jl:152
in to_host at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\arrays.jl:87
ERROR: "unspecified launch failure"
in checkerror at C:\Users\joaquimgarcia\.julia\v0.4\CUDArt\src\libcudart-6.5.jl:16
This issue is also discussed in the following julia-users post: https://groups.google.com/forum/#!topic/julia-users/mJjjTyU7cQ0