concept-inversion / C-SAW

A Framework for Graph Sampling and Random Walk on GPUs.

Confused on the result. NEED HELP #4

Closed wpybtw closed 2 years ago

wpybtw commented 3 years ago

I am trying to evaluate the performance, but I am confused by the results.

  1. What is the meaning of "20100" used in the code?
  2. I managed to copy the sampled results to the host. However, only the first one or first few samples (SI) have a non-zero sampled size (S.samples[i].start[0]). Similarly, I also copied the "sampled edgelist" to the host, but most of its entries are zero. Does this mean C-SAW only processes the first instances? Am I understanding this correctly, or is something wrong?
concept-inversion commented 3 years ago
  1. It denotes the maximum number of samples and hash tables that can be stored in memory (they are statically allocated). You can change this value depending on the memory available.
  2. Could you please share the sampling configuration/parameters (sample size, depth, frontier size) you are running? It should sample all the instances. There may be a problem that is also related to non-streaming sampling. I will try to fix that soon.
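
To illustrate what a static capacity such as 20100 implies, here is a minimal sketch, not C-SAW's actual code. The names kMaxSamples, record_guarded, and count_overflow are hypothetical. It assumes a buffer sized once from a compile-time constant and shows a kernel that refuses to write past that capacity, counting the rejected writes instead.

```cuda
#include <cuda_runtime.h>

// Hypothetical name; 20100 is the value discussed in this thread.
constexpr int kMaxSamples = 20100;

// Each thread records one entry, but only if its slot fits in the
// statically sized buffer; otherwise it bumps an overflow counter.
__global__ void record_guarded(int *buf, int capacity, int n_requested,
                               int *overflow) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_requested) return;
    if (i < capacity) {
        buf[i] = i;
    } else {
        atomicAdd(overflow, 1);
    }
}

// Returns how many requested entries did not fit, or -1 on a CUDA error.
int count_overflow(int n_requested) {
    int *d_buf = nullptr;
    int *d_overflow = nullptr;
    int h_overflow = -1;
    if (cudaMalloc(&d_buf, kMaxSamples * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_overflow, sizeof(int)) != cudaSuccess) {
        cudaFree(d_buf);
        return -1;
    }
    cudaMemset(d_overflow, 0, sizeof(int));
    const int threads = 256;
    const int blocks = (n_requested + threads - 1) / threads;
    if (blocks > 0) {
        record_guarded<<<blocks, threads>>>(d_buf, kMaxSamples, n_requested,
                                            d_overflow);
    }
    if (cudaDeviceSynchronize() == cudaSuccess) {
        cudaMemcpy(&h_overflow, d_overflow, sizeof(int),
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(d_buf);
    cudaFree(d_overflow);
    return h_overflow;
}
```

A non-zero return from count_overflow would suggest the constant needs to be raised (memory permitting) for the requested configuration.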
wpybtw commented 3 years ago

Thanks for your reply. I ran ./sampling.bin wg ./WG/beg.bin ./WG/csr.bin 100 32 10 1 1 15 1, which should mean a random walk with 10 instances and a depth of 15, right? I ran this with 3-12 as seeds. Then I copied out and printed the data inside the first two 'samples', as follows:

S.samples[0].vertex S.samples[0].edge
6       226374
12      268368
10      75364
7       43343
8       134964
3       670449
9       75364
5       86833
11      806291
0       0
0       0
0       0
0       0
0       0
0       0
S.samples[1].vertex S.samples[1].edge
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0
0       0

1) It seems that the second depth is never executed. 2) The 10 sampled paths should be in 10 instances according to the definition. Right?

wpybtw commented 3 years ago

There are other issues.

  1. The line curandState local_state=global_state[threadIdx.x]; in non-stream/main.cu should be curandState local_state=global_state[0]; because only one curandState is allocated at line 225.
  2. After I fixed that, it still does not pass cuda-memcheck. When I run cuda-memcheck ./sampling.bin wg ./WG/beg.bin ./WG/csr.bin 100 32 1000 1 1 15 1, the result is as follows:
    
    ========= Invalid __global__ write of size 4
    =========     at 0x00001570 in /home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int)
    =========     by thread (0,0,0) in block (1,0,0)
    =========     Address 0x7f2c2740a2e0 is out of bounds
    =========     Device Frame:/home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) (frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) : 0x1570)
    =========     Device Frame:/home/C-SAW/non-stream/main.cu:83:check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) (check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) : 0x107d0)
    =========     Saved host backtrace up to driver entry point at kernel launch time
    =========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2c74be]
    =========     Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 [0xf62b]
    =========     Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 (cudaLaunchKernel + 0x1c1) [0x4f5b1]
    =========     Host Frame:./sampling.bin [0x2d2d]
    =========     Host Frame:./sampling.bin [0x27a5]
    =========     Host Frame:./sampling.bin [0x280f]
    =========     Host Frame:./sampling.bin [0x22e9]
    =========     Host Frame:./sampling.bin [0x4913]
    =========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20840]
    =========     Host Frame:./sampling.bin [0x1699]
    =========
    ......
    =========
    ========= Invalid __global__ write of size 4
    =========     at 0x00001570 in /home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int)
    =========     by thread (0,0,0) in block (68,0,0)
    =========     Address 0x7f2c2740a280 is out of bounds
    =========     Device Frame:/home/C-SAW/non-stream/functions.cuh:992:frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) (frontier(gpu_graph*, Sampling*, int, int, int, int, int, int, int) : 0x1570)
    =========     Device Frame:/home/C-SAW/non-stream/main.cu:83:check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) (check(Sampling*, gpu_graph, curandStateXORWOW*, int, int, int, int) : 0x107d0)
    =========     Saved host backtrace up to driver entry point at kernel launch time
    =========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2c74be]
    =========     Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 [0xf62b]
    =========     Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 (cudaLaunchKernel + 0x1c1) [0x4f5b1]
    =========     Host Frame:./sampling.bin [0x2d2d]
    =========     Host Frame:./sampling.bin [0x27a5]
    =========     Host Frame:./sampling.bin [0x280f]
    =========     Host Frame:./sampling.bin [0x22e9]
    =========     Host Frame:./sampling.bin [0x4913]
    =========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20840]
    =========     Host Frame:./sampling.bin [0x1699]
    =========
    ========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
    =========     Saved host backtrace up to driver entry point at error
    =========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x3de8e3]
    =========     Host Frame:/usr/local/cuda/lib64/libcudart.so.11.0 (cudaDeviceSynchronize + 0x122) [0x35782]

    unspecified launch failure in main.cu at line 285
    =========     Host Frame:./sampling.bin [0x2442]
    =========     Host Frame:./sampling.bin [0x4913]
    =========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20840]
    =========     Host Frame:./sampling.bin [0x1699]
    ========= ERROR SUMMARY: 91 errors


This test was run on a platform with an RTX 2080 Ti GPU, driver version 450.36.06, and CUDA 11.0.
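
Regarding the curandState indexing in item 1, an alternative to reading global_state[0] everywhere is to allocate one state per launched thread. The following is a minimal sketch of that pattern and not the repository's code. The names init_rng, draw_uniform, and run_rng_demo are hypothetical, and the seed is arbitrary. It uses only the standard cuRAND device API (curand_init, curand_uniform).

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Initialize one RNG state per thread, indexed by the global thread id.
__global__ void init_rng(curandState *states, unsigned long long seed, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) curand_init(seed, tid, 0, &states[tid]);
}

// Each thread copies its own state to a local, draws, and writes it back.
__global__ void draw_uniform(curandState *states, float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    curandState local_state = states[tid];
    out[tid] = curand_uniform(&local_state);  // value in (0, 1]
    states[tid] = local_state;
}

// Returns how many draws fall in (0, 1], or -1 on a CUDA error.
int run_rng_demo(int blocks, int threads) {
    const int n = blocks * threads;
    curandState *d_states = nullptr;
    float *d_out = nullptr;
    // The allocation is sized for every thread in the grid, not just one.
    if (cudaMalloc(&d_states, n * sizeof(curandState)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_states);
        return -1;
    }
    init_rng<<<blocks, threads>>>(d_states, 1234ULL, n);
    draw_uniform<<<blocks, threads>>>(d_states, d_out, n);
    std::vector<float> h_out(n, -1.0f);
    int valid = -1;
    // cudaMemcpy on the default stream waits for the kernels above.
    if (cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                   cudaMemcpyDeviceToHost) == cudaSuccess) {
        valid = 0;
        for (float v : h_out) {
            if (v > 0.0f && v <= 1.0f) ++valid;
        }
    }
    cudaFree(d_states);
    cudaFree(d_out);
    return valid;
}
```

With per-thread states, threads do not share a single random sequence, and an index such as threadIdx.x (or the global thread id used here) stays within the allocation as long as the allocation matches the launch size.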
concept-inversion commented 3 years ago

Thank you for your thorough check. I will go through the code.