Hi,
It seems to be a compound issue, the following line would indicate that the kernel module was unable to pin and map GPU memory as well:
[263209.858820] Mapping for address 7ff72da00000 not found
Do you still have the output from cmake? It should indicate whether or not it located the Nvidia driver source in order to find the necessary nv-p2p.h header.
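(For context, the GPU pinning the kernel module performs relies on the GPUDirect RDMA interface declared in nv-p2p.h from the NVIDIA driver source. Below is a rough sketch of what that pinning call looks like, purely for illustration and not the project's actual code; the helper name pin_gpu_range is made up.)

#include <linux/types.h>
#include <nv-p2p.h>  /* provided by the NVIDIA driver source, e.g. /usr/src/nvidia-418.87.01 */

/* Pin a range of GPU virtual memory so its pages can later be mapped for DMA.
 * vaddr and len are assumed to be aligned to the 64 KiB GPU page size. */
static struct nvidia_p2p_page_table *pin_gpu_range(u64 vaddr, u64 len,
                                                   void (*free_cb)(void *),
                                                   void *cb_data)
{
    struct nvidia_p2p_page_table *pages = NULL;

    /* Tokens are 0 for the non-tokenized GPUDirect RDMA path. */
    if (nvidia_p2p_get_pages(0, 0, vaddr, len, &pages, free_cb, cb_data) != 0)
        return NULL;   /* pinning failed, so there is no mapping to look up later */

    return pages;      /* pages->pages[i]->physical_address holds the pinned GPU page addresses */
}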
Yes, please see the output below:
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found suitable version "10.1", minimum required is "8.0")
-- Using NVIDIA driver found in /usr/src/nvidia-418.87.01
-- Configuring kernel module with CUDA
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dgx/shing/ssd-gpu-dma/build
Okay, it appears that the kernel module is built correctly. I believe I've seen this issue before; I suspect it might be caused by incorrect alignment and offset calculations in the program (see issue #25 for reference). Can you please try running with the --threads=1 --pages=1 --chunks=1 arguments?
If this does not work, could you also please try running the nvm-latency-bench program? This will confirm whether the NVMe is able to access the GPU at all.
With --threads=1 --pages=1 --chunks=1, it hangs and never returns.
CUDA device           : 0 Tesla V100-SXM2-16GB (0000:1b:00.0)
Controller page size  : 4096 B
Namespace block size  : 4096 B
Number of threads     : 1
Chunks per thread     : 1
Pages per chunk       : 1
Total number of pages : 1
Total number of blocks: 1
Double buffering      : no
But it works with "--threads=8 --pages=8 --chunks=1":
CUDA device           : 0 Tesla V100-SXM2-16GB (0000:1b:00.0)
Controller page size  : 4096 B
Namespace block size  : 4096 B
Number of threads     : 8
Chunks per thread     : 1
Pages per chunk       : 8
Total number of pages : 64
Total number of blocks: 64
Double buffering      : no
Event time elapsed    : 2003.328 µs
Estimated bandwidth   : 130.854 MiB/s
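(As a rough sanity check, assuming the estimate is simply total bytes transferred divided by elapsed time: 64 pages × 4096 B = 262,144 B, and 262,144 B / 2003.328 µs ≈ 130.854 B/µs, which is consistent with the reported bandwidth figure.)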
The nvm-latency-bench seems to work, but prints an error message, shown below:
Resetting controller... DONE
Preparing queues... DONE
Preparing buffers and transfer lists... DONE
Running bandwidth benchmark (reading, sequential, 1000 iterations)... DONE
Calculating percentiles...
Queue #128 read percentiles (1000 samples)
        bandwidth,     adj iops,  cmd latency,  prp latency
 max:     425.227,   103815.209,     5121.001,     2560.501
0.99:     423.545,   103404.597,       35.934,       17.967
0.97:     422.922,   103252.453,       28.467,       14.233
0.95:     422.475,   103143.292,       28.358,       14.179
0.90:     421.920,   103007.829,       28.288,       14.144
0.75:     420.297,   102611.462,       20.288,       10.144
0.50:     417.853,   102014.792,       19.605,        9.803
0.25:     404.174,    98675.287,       19.492,        9.746
0.10:     289.618,    70707.606,       19.416,        9.708
0.05:     288.908,    70534.297,       19.392,        9.696
0.01:     250.370,    61125.483,       19.345,        9.672
 min:       1.600,      390.549,       19.265,        9.633
End percentiles
[unmap_memory] Page unmapping kernel request failed: Invalid argument
OK!
Yes, it appears to be an issue with alignment and offset calculation. I would suggest using the sisci-5.11 branch; I believe I made some fixes there, although it probably still has some issues. It's still highly experimental, so the best approach is probably to look at the code for reference.
Beware that I have changed the arguments to both nvm-latency-bench and nvm-cuda-bench somewhat in that branch.
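(For reference, the kind of page-alignment arithmetic being discussed is roughly the following; this is only an illustrative sketch with made-up names, not the project's actual code.)

#include <stdint.h>

/* Round a byte offset up to the next multiple of the controller page size
 * (4096 B in the output above). */
static inline uint64_t align_up(uint64_t value, uint64_t page_size)
{
    return (value + page_size - 1) & ~(page_size - 1);
}

/* Example: if a transfer starts at an arbitrary byte offset into a buffer, the
 * starting offset needs to be rounded to the controller page size before being
 * turned into PRP entries (NVMe PRP entries, apart from the first, must be
 * page-aligned). Offsets that skip this step are one way accesses can end up
 * outside the pinned GPU mapping. */
static inline uint64_t aligned_chunk_start(uint64_t base_offset, uint64_t chunk,
                                           uint64_t pages, uint64_t page_size)
{
    uint64_t stride = pages * page_size;                 /* bytes per chunk */
    return align_up(base_offset, page_size) + chunk * stride;
}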
The nvm-latency-bench seems to work, but prints an error message
The unmap_memory failure is a "known bug"; I've been meaning to overhaul the entire thing. Work started in the sisci-5.11 branch, and I strongly recommend using that branch if you are able to. There are still some bugs to work out in the module, as I have been focusing mostly on the SmartIO part of the code. It might be holding on to memory, which can be temporarily worked around by periodically rmmod'ing and insmod'ing it.
The sisci-5.11 branch makes nvm-cuda-bench work. Thanks.
Hi,
I use Ubuntu 18.04 with a V100 GPU.
I ran the benchmark 'nvm-cuda-bench -c /dev/libnvm0' but got the error message "Unexpected error: Unexpected CUDA error: an illegal memory access was encountered". dmesg also shows the following messages:
[263169.171738] Adding controller device: 88:00.0
[263169.172098] Character device /dev/libnvm0 created (504.0)
[263169.172185] libnvm helper loaded
[263209.858820] Mapping for address 7ff72da00000 not found
[263255.876777] NVRM: Xid (PCI:0000:1b:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 0, SM 0): Out Of Range Address
[263255.876795] NVRM: Xid (PCI:0000:1b:00): 13, Graphics Exception: ESR 0x514730=0x201000e 0x514734=0x20 0x514728=0x4c1eb72 0x51472c=0x174
[263255.877633] NVRM: Xid (PCI:0000:1b:00): 43, Ch 00000030
It seems the GPU cannot access the device registers of the NVMe; is that true? And do you know how to solve it?