NVlabs / NVBit

nvdisasm not found on PATH #26

Open yinengy opened 3 years ago

yinengy commented 3 years ago

I get the following error message when I try to instrument a PyTorch program with mem_printf.so.

~$ LD_PRELOAD=/home/username/nvbit_release/tools/mem_printf/mem_printf.so python3 two_layer_net_tensor.py 
------------- NVBit (NVidia Binary Instrumentation Tool v1.4) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------
ERROR: nvdisasm not found on PATH!!!

But it is actually on PATH.

~$ which nvdisasm 
/usr/local/cuda-10.2/bin/nvdisasm

Using CUDA 9.2 with NVBit 1.1 also leads to a similar error message.

Thanks!

ovilla commented 3 years ago

You could try this to override nvdisasm:

NVDISASM=/usr/local/cuda-10.2/bin/nvdisasm LD_PRELOAD=/home/username/nvbit_release/tools/mem_printf/mem_printf.so python3 two_layer_net_tensor.py

It should work without the override, though. What do you see if you type env?

yinengy commented 3 years ago

NVDISASM=/usr/local/cuda-10.2/bin/nvdisasm

~$ NVDISASM=/usr/local/cuda-10.2/bin/nvdisasm LD_PRELOAD=/home/username/nvbit_release/tools/mem_printf/mem_printf.so python3 two_layer_net_tensor.py
------------- NVBit (NVidia Binary Instrumentation Tool v1.4) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = /usr/local/cuda-10.2/bin/nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------
ERROR: /usr/local/cuda-10.2/bin/nvdisasm not found on PATH!!!

Typing env gives me:

XDG_SESSION_ID=1
TERM_PROGRAM=vscode
TERM=xterm-256color
SHELL=/bin/bash
AMD_ENTRYPOINT=vs/server/remoteExtensionHostProcess
SSH_CLIENT=73.144.154.30 58777 22
TERM_PROGRAM_VERSION=1.48.2
USER=yinengy
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
LD_LIBRARY_PATH=/home/yinengy/distro/install/lib:/usr/local/cuda-10.2/lib64:/home/yinengy/distro/install/lib:/home/yinengy/distro/install/lib:/home/yinengy/distro/install/lib:/usr/local/cuda-10.2/lib64
PATH=/home/yinengy/distro/install/bin:/usr/local/cuda-10.2/bin:/home/yinengy/.vscode-server/bin/a0479759d6e9ea56afa657e454193f72aef85bd0/bin:/home/yinengy/distro/install/bin:/home/yinengy/distro/install/bin:/home/yinengy/bin:/home/yinengy/.local/bin:/home/yinengy/distro/install/bin:/usr/local/cuda-10.2/bin:/home/yinengy/.vscode-server/bin/a0479759d6e9ea56afa657e454193f72aef85bd0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
MAIL=/var/mail/yinengy
PWD=/home/yinengy
LUA_PATH=/home/yinengy/.luarocks/share/lua/5.1/?.lua;/home/yinengy/.luarocks/share/lua/5.1/?/init.lua;/home/yinengy/distro/install/share/lua/5.1/?.lua;/home/yinengy/distro/install/share/lua/5.1/?/init.lua;./?.lua;/home/yinengy/distro/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua
LANG=en_US.UTF-8
LUA_CPATH=/home/yinengy/distro/install/lib/?.so;/home/yinengy/.luarocks/lib/lua/5.1/?.so;/home/yinengy/distro/install/lib/lua/5.1/?.so;/home/yinengy/distro/install/lib/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so
HOME=/home/yinengy
SHLVL=4
VSCODE_GIT_ASKPASS_MAIN=/home/yinengy/.vscode-server/bin/a0479759d6e9ea56afa657e454193f72aef85bd0/extensions/git/dist/askpass-main.js
PIPE_LOGGING=true
DYLD_LIBRARY_PATH=/home/yinengy/distro/install/lib:/home/yinengy/distro/install/lib:/home/yinengy/distro/install/lib:/home/yinengy/distro/install/lib:
LOGNAME=yinengy
VSCODE_GIT_IPC_HANDLE=/run/user/1001/vscode-git-0ff09c2746.sock
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
SSH_CONNECTION=73.144.154.30 58777 10.128.0.3 22
VSCODE_IPC_HOOK_CLI=/tmp/vscode-ipc-f3bae490-b524-46f8-85ad-dd9184ba7d9d.sock
LESSOPEN=| /usr/bin/lesspipe %s
VSCODE_GIT_ASKPASS_NODE=/home/yinengy/.vscode-server/bin/a0479759d6e9ea56afa657e454193f72aef85bd0/node
GIT_ASKPASS=/home/yinengy/.vscode-server/bin/a0479759d6e9ea56afa657e454193f72aef85bd0/extensions/git/dist/askpass.sh
XDG_RUNTIME_DIR=/run/user/1001
VERBOSE_LOGGING=true
LESSCLOSE=/usr/bin/lesspipe %s %s
COLORTERM=truecolor
_=/usr/bin/env

echo $PATH gives:

/home/yinengy/distro/install/bin:/usr/local/cuda-10.2/bin:/home/yinengy/.vscode-server/bin/a0479759d6e9ea56afa657e454193f72aef85bd0/bin:/home/yinengy/distro/install/bin:/home/yinengy/distro/install/bin:/home/yinengy/bin:/home/yinengy/.local/bin:/home/yinengy/distro/install/bin:/usr/local/cuda-10.2/bin:/home/yinengy/.vscode-server/bin/a0479759d6e9ea56afa657e454193f72aef85bd0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

x-y-z commented 3 years ago

BTW, do you have the which program on your system? The error might occur because it is missing.
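
A quick way to check both tools from the same shell is sketched below. It is only a sketch: check_tool is a hypothetical helper (not part of NVBit), it assumes a POSIX shell, and it does not reproduce how NVBit itself looks up nvdisasm, which this thread does not show.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of NVBit.
# It reports how the current shell resolves a command name.
check_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        echo "found: $tool_path"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the lookup utility first, then the disassembler.
# "|| true" keeps the script going if one of them is missing.
check_tool which || true
check_tool nvdisasm || true
```

If both lines report "found", the tools are at least visible to the shell that launches the instrumented program; it says nothing about what the instrumented process itself sees.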

yinengy commented 3 years ago

which

I have the which program on my system. I used it in my first comment to show the path of nvdisasm.

ovilla commented 3 years ago

Have you tried running a non-Python application, for instance the simple test-app that comes with NVBit? This is the first time we have seen this error; I suspect something very peculiar about your machine configuration.

yinengy commented 3 years ago

I have used NVBit on normal .cu programs for several months without any problems. It works fine on native CUDA programs such as Rodinia 3.1, but it fails for Caffe, Torch 7, and PyTorch. For Caffe, a core dump happens when I try to instrument it. For Torch 7, instrumentation never works: no error is reported, the instrumentation does not take effect, and only the banner is printed.

The environment is a server on Google Cloud Platform with a Tesla P4 GPU.

ovilla commented 3 years ago

Can you try to use CUDA_INJECTION64_PATH instead of LD_PRELOAD?
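
For reference, here is a sketch of that invocation, reusing the paths from the commands above. The run_with_injection wrapper is a hypothetical helper, not part of NVBit. It only checks that the tool library exists, then sets CUDA_INJECTION64_PATH (and clears LD_PRELOAD) for the child process.

```shell
#!/bin/sh
# run_with_injection is a hypothetical wrapper, not an NVBit command.
# Usage: run_with_injection <tool.so> <command> [args...]
run_with_injection() {
    tool_so="$1"
    shift
    if [ ! -f "$tool_so" ]; then
        echo "tool library not found: $tool_so" >&2
        return 1
    fi
    # Run in a subshell so the variables change only for the child process.
    (
        unset LD_PRELOAD
        export CUDA_INJECTION64_PATH="$tool_so"
        exec "$@"
    )
}

# Example, with the paths used earlier in this thread:
# run_with_injection /home/username/nvbit_release/tools/mem_printf/mem_printf.so \
#     python3 two_layer_net_tensor.py
```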

Also, can you please send exact instructions on how to reproduce the core dump situation? As usual, no promises we will be able to fix it, but we will try to take a look when we find some time.

Thanks!

yinengy commented 3 years ago

Thank you for your help. Replacing LD_PRELOAD with CUDA_INJECTION64_PATH gives the same error message; nothing changed.

For Caffe, compile and install it, then cd to the Caffe root directory and run:

~/caffe$ LD_PRELOAD=/home/yinengy/nvbit_release/tools/mem_printf/mem_printf.so .build_release/tools/caffe
------------- NVBit (NVidia Binary Instrumentation Tool v1.4) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0901 18:22:27.429435  3651 parallel.cpp:48] P2PManager::Init @ <my server name>
I0901 18:22:27.559656  3651 gpu_memory.cpp:82] GPUMemory::Manager initialized
I0901 18:22:27.560271  3651 gpu_memory.cpp:84] Total memory: 7981694976, Free: 7852195840, dev_info[0]: total=7981694976 free=7852195840
I0901 18:22:28.731178  3651 caffe.cpp:702] This is NVCaffe 0.17.3 started at Tue Sep  1 18:22:27 2020
I0901 18:22:28.731402  3651 caffe.cpp:704] CuDNN version: USE_CUDNN is not defined
I0901 18:22:28.731448  3651 caffe.cpp:705] CuBLAS version: 10202
I0901 18:22:28.731489  3651 caffe.cpp:706] CUDA version: 10020
I0901 18:22:28.731526  3651 caffe.cpp:707] CUDA driver version: 10020
I0901 18:22:28.731570  3651 caffe.cpp:708] Arguments: 
[0]: .build_release/tools/caffe
caffe: command line brew
usage: caffe <command> <args>

commands:
  train           train or finetune a model
  test            score a model
  device_query    show GPU diagnostic information
  time            benchmark model execution time

  Flags from tools/caffe.cpp:
    -ap_version (Average Precision type for object detection) type: string
      default: "11point"
    -gpu (Optional; run in GPU mode on given device IDs separated by ', '.Use
      '-gpu all' to run on all available GPUs. The effective training batch
      size is multiplied by the number of devices.) type: string default: ""
    -iterations (The number of iterations to run.) type: int32 default: 50
    -level (Optional; network level.) type: int32 default: 0
    -model (The model definition protocol buffer text file.) type: string
      default: ""
    -phase (Optional; network phase (TRAIN or TEST). Only used for 'time'.)
      type: string default: ""
    -show_per_class_result (Show per class result for object detection)
      type: bool default: true
    -sighup_effect (Optional; action to take when a SIGHUP signal is received:
      snapshot, stop or none.) type: string default: "snapshot"
    -sigint_effect (Optional; action to take when a SIGINT signal is received:
      snapshot, stop or none.) type: string default: "stop"
    -snapshot (Optional; the snapshot solver state to resume training.)
      type: string default: ""
    -solver (The solver definition protocol buffer text file.) type: string
      default: ""
    -stage (Optional; network stages (not to be confused with phase), separated
      by ','.) type: string default: ""
    -weights (Optional; the pretrained weights to initialize finetuning,
      separated by ', '. Cannot be set simultaneously with snapshot.)
      type: string default: ""
*** Aborted at 1598984548 (unix time) try "date -d @1598984548" if you are using GNU date ***
PC: @     0x7f63d037e259 Nvbit::module_unloading()
*** SIGSEGV (@0x0) received by PID 3651 (TID 0x7f63d080e880) from PID 0; stack trace: ***
    @     0x7f63ccff94c0 (unknown)
    @     0x7f63d037e259 Nvbit::module_unloading()
    @     0x7f63d03863a8 nvbitToolsCallbackFunc()
    @     0x7f63cc16ab23 (unknown)
    @     0x7f63cbfa8745 (unknown)
    @     0x7f63cbeb39fa (unknown)
    @     0x7f63cc03d77a cuModuleUnload
    @     0x7f63ce327e4f (unknown)
    @     0x7f63ce329190 (unknown)
    @     0x7f63ce32e077 (unknown)
    @     0x7f63ce32e492 (unknown)
    @     0x7f63ce31fb0c (unknown)
    @     0x7f63ce320355 (unknown)
    @     0x7f63ccffe37a __cxa_finalize
    @     0x7f63ce3077e6 (unknown)
    @     0x7f63ce35d611 (unknown)
    @     0x7f63ccffe008 (unknown)
    @     0x7f63d088f558 (unknown)
Segmentation fault (core dumped)

This error is not triggered when running without NVBit.

Note: an empty NVBit tool (one that does nothing) also triggers this fault.

yinengy commented 3 years ago

I am also wondering how to instrument Torch 7. Is it just LD_PRELOAD=xxxxx th <file to execute>? That doesn't seem to work for me. Thanks.

As additional information, NVBit works perfectly with cuDNN.