Jokeren / GPA

GPU Performance Advisor
BSD 3-Clause "New" or "Revised" License

Unable to use GPA tool to get advice #5

Open HarsonLau opened 2 years ago

HarsonLau commented 2 years ago

I installed this project using your installation script. My GPU is a T4 and I am using CUDA Toolkit 11.6. When I followed the tutorial in install.md and ran GPA to get advice, nothing was written to the gpa-database/ directory, even though GPA reported that it had finished:

root@n37-139-082:~# cd GPA/
root@n37-139-082:~/GPA# cd ./GPA-Benchmark/ExaTENSOR/exatensor-opt1
root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# make 
make: 'all' is up to date.
root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# make clean 
rm -rf main *.o *.dot *.hpcstruct *.cubin *.qdrep *.sqlite
root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# make 
nvcc -o main main.cu -DCUDA3 -Xcompiler "-g -fopenmp" -O3 -lineinfo  -lcudart -lcuda -lstdc++ -lm
root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# gpa -v ./main
Make sure gpa-measurements and gpa-database is clean
Profiling: collect pc sampling performance metrics
Parsing: parse CPU and GPU binaries
Analyzing: match metrics with advice
Output advice in gpa-database/gpa.advice
Done...
root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# ls gpa-database/
root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# 

This is the content of gpa.log

root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# cat gpa.log 
NOTE: Using builtin path for NVIDIA's CUPTI tools library /usr/local/cuda/lib64/libcupti.so.
Elapsed time 0.013346
msg: begin serial analysis of 8a99426adfeaf92557e0b6842027decb.cubin
WARNING: incomplete analysis of 8a99426adfeaf92557e0b6842027decb.cubin; see /root/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1/gpa-measurements/structs/8a99426adfeaf92557e0b6842027decb.cubin.warnings for details
msg: end serial analysis of 8a99426adfeaf92557e0b6842027decb.cubin
HPCStructure fatal error: processing Document:STRUCTURE file '/root/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1/gpa-measurements/structs/8a99426adfeaf92557e0b6842027decb.cubin.hpcstruct' at line 69, character 1:
        XML parser: invalid document structure.
root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# 

The .warnings file referenced in the log shows that a segfault occurred during the analysis of the cubin, but I don't know exactly what caused it:

root@n37-139-082:~/GPA/GPA-Benchmark/ExaTENSOR/exatensor-opt1# cat gpa-measurements/structs/8a99426adfeaf92557e0b6842027decb.cubin.warnings 
Segmentation fault (core dumped)
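
The "invalid document structure" error at line 69 of the .hpcstruct file is consistent with the structure analysis crashing before it finished writing the file, though the log alone does not prove that. A minimal sketch for checking this, assuming the .hpcstruct file is plain XML that Python's standard parser can read (the helper name `check_xml` is hypothetical and not part of GPA or HPCToolkit):

```python
import xml.etree.ElementTree as ET


def check_xml(path):
    """Return None if the file at `path` parses as XML.

    Otherwise return (line, column, message) for the first parse error.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position  # position reported by the XML parser
        return line, column, str(err)
    return None


# Example use (substitute the real path from gpa.log):
# print(check_xml("gpa-measurements/structs/<hash>.cubin.hpcstruct"))
```

If the reported position is at the very end of the file, the file was most likely cut off mid-write rather than containing malformed content.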

Jokeren commented 2 years ago

Can you please try CUDA 11.1?

HarsonLau commented 2 years ago

Can you please try CUDA 11.1?

Unfortunately, it still doesn't work even with CUDA 11.1.0. The error message recorded in gpa.log after adding the -v option is permission-related, which is very confusing because I am running GPA as root.

Jokeren commented 2 years ago

It might just be a common CUPTI permission problem:

https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
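
As far as I recall, the linked page covers the ERR_NVGPUCTRPERM error, where the NVIDIA driver restricts access to GPU performance counters, and suggests inspecting the `RmProfilingAdminOnly` entry in /proc/driver/nvidia/params. Both the path and the key name are taken from memory of that page and should be treated as assumptions. The function names below are hypothetical. A rough sketch of the check:

```python
from pathlib import Path

# Assumed location of the driver's parameter dump on Linux.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text):
    """Return True/False for RmProfilingAdminOnly, or None if the key is absent."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def report(path=PARAMS_PATH):
    """Summarize the profiling restriction found in the params file."""
    try:
        text = Path(path).read_text()
    except OSError as err:
        return f"could not read {path}: {err}"
    restricted = profiling_admin_only(text)
    if restricted is None:
        return "RmProfilingAdminOnly not found in driver parameters"
    if restricted:
        return "GPU performance counters are restricted to admin users"
    return "GPU performance counters are available to all users"


# Example use: print(report())
```

Being root is not always sufficient. Inside a container, for example, the root user may still lack the capability the driver checks, which could explain a permission error when running as root.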