Closed wpybtw closed 2 years ago
Thanks for your reply.
When I run
./sampling.bin wg ./WG/beg.bin ./WG/csr.bin 100 32 10 1 1 15 1
which should mean a random walk with 10 instances and a depth of 15, right?
I ran this with 3-12 as seeds. Then I copied out and printed the data inside the first two 'samples', as follows:
S.samples[0].vertex S.samples[0].edge
6 226374
12 268368
10 75364
7 43343
8 134964
3 670449
9 75364
5 86833
11 806291
0 0
0 0
0 0
0 0
0 0
0 0
S.samples[1].vertex S.samples[1].edge
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
1) It seems that the second depth is never executed.
2) The 10 sampled paths should be in 10 instances according to the definition. Right?
There are other issues. In non-stream/main.cu,
curandState local_state=global_state[threadIdx.x];
should be
curandState local_state=global_state[0];
as only one curandState is allocated in line 225.
In addition, when I run
cuda-memcheck ./sampling.bin wg ./WG/beg.bin ./WG/csr.bin 100 32 1000 1 1 15 1
the result is as follows:
========= Invalid __global__ write of size 4
========= at 0x00001570 in /home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int)
========= by thread (0,0,0) in block (1,0,0)
========= Address 0x7f2c2740a2e0 is out of bounds
========= Device Frame:/home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) (frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) : 0x1570)
========= Device Frame:/home/C-SAW/non-stream/main.cu:83:check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) (check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) : 0x107d0)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2c74be]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 [0xf62b]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 (cudaLaunchKernel + 0x1c1) [0x4f5b1]
========= Host Frame:./sampling.bin [0x2d2d]
========= Host Frame:./sampling.bin [0x27a5]
========= Host Frame:./sampling.bin [0x280f]
========= Host Frame:./sampling.bin [0x22e9]
========= Host Frame:./sampling.bin [0x4913]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20840]
========= Host Frame:./sampling.bin [0x1699]
=========
......
=========
========= Invalid __global__ write of size 4
========= at 0x00001570 in /home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int)
========= by thread (0,0,0) in block (68,0,0)
========= Address 0x7f2c2740a280 is out of bounds
========= Device Frame:/home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) (frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) : 0x1570)
========= Device Frame:/home/C-SAW/non-stream/main.cu:83:check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) (check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) : 0x107d0)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2c74be]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 [0xf62b]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 (cudaLaunchKernel + 0x1c1) [0x4f5b1]
========= Host Frame:./sampling.bin [0x2d2d]
========= Host Frame:./sampling.bin [0x27a5]
========= Host Frame:./sampling.bin [0x280f]
========= Host Frame:./sampling.bin [0x22e9]
========= Host Frame:./sampling.bin [0x4913]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20840]
========= Host Frame:./sampling.bin [0x1699]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x3de8e3]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 (cudaDeviceSynchronize + 0x122) [0x35782]
========= ERROR SUMMARY: 91 errors
This test was run on a platform with an RTX 2080 Ti GPU, driver version 450.36.06, and CUDA 11.0.
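For the curandState indexing issue mentioned above, an alternative to reading global_state[0] everywhere would be to allocate one state per launched thread and index it by the global thread id. The following is only a minimal sketch of that pattern under my own assumptions, not C-SAW's code: the kernel names init_states and draw_uniform, the host helper draw_per_thread, the seed, and the launch sizes are all hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical kernel: initialise one RNG state per thread.
__global__ void init_states(curandState *states, unsigned long long seed, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        // Same seed, distinct subsequence per thread.
        curand_init(seed, tid, 0, &states[tid]);
    }
}

// Hypothetical kernel: each thread draws one number from its own state.
__global__ void draw_uniform(curandState *states, float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    curandState local_state = states[tid];  // in bounds: n states are allocated
    out[tid] = curand_uniform(&local_state);
    states[tid] = local_state;              // store the advanced state back
}

// Returns true if both kernels ran without error and every draw is in (0, 1].
bool draw_per_thread(int blocks, int threads) {
    const int n = blocks * threads;
    curandState *d_states = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_states, n * sizeof(curandState)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_states);
        return false;
    }

    init_states<<<blocks, threads>>>(d_states, 1234ULL, n);
    draw_uniform<<<blocks, threads>>>(d_states, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    std::vector<float> h_out(n);
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_states);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (float v : h_out) {
        if (!(v > 0.0f && v <= 1.0f)) return false;
    }
    return true;
}

int main() {
    // Launch sizes are arbitrary values chosen for this sketch.
    return draw_per_thread(4, 64) ? 0 : 1;
}
```

Running a program like this under cuda-memcheck should show whether the state array is indexed within its allocation. The cost of this layout is memory: one curandState per thread instead of a single shared one.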
Thank you for your thorough check. I will go through the code.
I am trying to evaluate the performance, but I am confused by the results.