NVlabs / NVBit

undefined symbol #3

Closed ZejiaZheng closed 4 years ago

ZejiaZheng commented 4 years ago

I got this error when running NVBit according to the instructions in the README.

./test-apps/vectoradd/vectoradd: symbol lookup error: ./tools/instr_count/instr_count.so: undefined symbol: _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE

Any suggestions?

ovilla commented 4 years ago

This error is typically encountered when the tool is compiled with an incompatible version of g++/gcc. Can you please paste the output of gcc -v and ldd tools/instr_count/instr_count.so?

ZejiaZheng commented 4 years ago

gcc -v output:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.5.0-2ubuntu1~14.04.1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --with-as=/usr/bin/x86_64-linux-gnu-as --with-ld=/usr/bin/x86_64-linux-gnu-ld --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=gcc4-compatible --disable-libstdcxx-dual-abi --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.5.0 20181026 (Ubuntu 6.5.0-2ubuntu1~14.04.1) 

ldd tools/instr_count/instr_count.so output:

    linux-vdso.so.1 =>  (0x00007fffe2bea000)
    libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f5006ce6000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5006ade000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f50068c0000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f50066bb000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f50063a5000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f500618d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5005dc3000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5005abd000)
    libnvidia-fatbinaryloader.so.410.73 => /usr/lib/nvidia-410/libnvidia-fatbinaryloader.so.410.73 (0x00007f5005870000)
    /lib64/ld-linux-x86-64.so.2 (0x0000563ff73f9000)
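
A side note on the gcc -v output above: its "Configured with" line contains --with-default-libstdcxx-abi=gcc4-compatible and --disable-libstdcxx-dual-abi, both of which concern the libstdc++ string ABI. The sketch below only pulls such flags out of gcc -v output so they are easier to spot. The helper name abi_configure_flags is made up for illustration and is not part of NVBit or GCC.

```shell
# Hypothetical helper: read `gcc -v` output on stdin and print any
# configure flags that relate to the libstdc++ ABI, one per line.
abi_configure_flags() {
    tr ' ' '\n' | grep -E '^--(with-default-libstdcxx-abi|(enable|disable)-libstdcxx-dual-abi)'
}

# gcc -v writes to stderr, hence the 2>&1.
if command -v gcc >/dev/null 2>&1; then
    gcc -v 2>&1 | abi_configure_flags || echo "no libstdc++ ABI flags in the configure line"
fi
```

If this prints --disable-libstdcxx-dual-abi together with a gcc4-compatible default, that compiler's libstdc++ was probably built without the std::__cxx11 symbols, regardless of the gcc version number.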

ovilla commented 4 years ago

I am pretty sure you are using an "old" version of libstdc++.so.6, one that only provides symbol versions older than GLIBCXX_3.4.21.

To confirm that, can you please paste the output of this?

strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX

If that is the issue, you should try updating to a newer version of libstdc++.so.6. Meanwhile, I will see if there is a way to make this less error-prone in future releases.
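
The version check above can be condensed so that only the highest GLIBCXX tag is shown. This is a minimal sketch that assumes GNU sort (for -V); the function name highest_glibcxx is hypothetical, and the library path is the one used in this thread.

```shell
# Hypothetical helper: read `strings` output on stdin and print the
# highest GLIBCXX_x.y.z version tag it contains.
highest_glibcxx() {
    grep -o 'GLIBCXX_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

lib=/usr/lib/x86_64-linux-gnu/libstdc++.so.6   # path from this thread
if [ -r "$lib" ] && command -v strings >/dev/null 2>&1; then
    strings "$lib" | highest_glibcxx
fi
```

A tag of GLIBCXX_3.4.21 or later only shows that the library is recent enough. It does not by itself prove that a particular symbol is exported.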

ZejiaZheng commented 4 years ago

Output from strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX :

GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_3.4.22
GLIBCXX_3.4.23
GLIBCXX_3.4.24
GLIBCXX_3.4.25
GLIBCXX_DEBUG_MESSAGE_LENGTH

ovilla commented 4 years ago

I am puzzled... everything seems fine. I will try to run more tests, but I am not sure when I can get to the bottom of it. If you can, please give it a try on another box or Ubuntu version.

ardhiwiratamaby commented 4 years ago

I encountered the same problem. I am using CUDA driver version 440.33.01.

./test-apps/vectoradd/vectoradd: symbol lookup error: ./tools/mem_printf/mem_printf.so: undefined symbol: _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE

I can't use an older driver version on my cluster. Any solutions?

ovilla commented 4 years ago

Can you please paste the console output you get when you run make inside the tools/mem_printf folder? One possible cause of this problem is that nvcc is using an old version of gcc/g++ underneath for that specific compilation.

So far I have tried on many machines, libraries and compiler versions and unfortunately I was not able to reproduce the issue on my side.

ardhiwiratamaby commented 4 years ago

Thanks a lot, it works. I changed the gcc version and the CUDA version.

ovilla commented 4 years ago

Great! Can you please let me know exactly which versions you were using before and after? I hope it will help other people who might stumble upon this issue. Thanks.

ardhiwiratamaby commented 4 years ago

I was using cuda-10.0 and I think the nvcc was using gcc 4.9.2. I changed it to gcc-6.2.0 and cuda-9.0. My cuda driver version is 440.33.01.

ovilla commented 4 years ago

The requirement is GCC version >= 5.3.0 (see https://github.com/NVlabs/NVBit/blob/master/README.md), which is the version used to build libnvbit.a. If an older version of GCC is used when compiling an NVBit tool, the symbols are not found and are left unresolved (because we are building a dynamic library). Then, when we run the tool, we get the "symbol lookup error" message.
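
The version requirement can be checked before building a tool. The sketch below assumes GNU sort (for -V) and a g++ on the PATH; the helper names version_at_least and check_host_gcc are hypothetical. nvcc may pick a different host compiler than the g++ on the PATH, so the make output remains the authoritative source.

```shell
# Return success if dotted version $1 is at least dotted version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare a compiler version ($1) against the 5.3.0 minimum from the README.
check_host_gcc() {
    if version_at_least "$1" 5.3.0; then
        echo "ok: gcc $1"
    else
        echo "too old: gcc $1 (need >= 5.3.0)"
    fi
}

# -dumpfullversion is understood by GCC 7 and later; older releases are
# expected to fall back to the plain -dumpversion output.
if command -v g++ >/dev/null 2>&1; then
    check_host_gcc "$(g++ -dumpfullversion -dumpversion)"
fi
```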

The CUDA driver version you are using is a bit higher than the tested one (see https://github.com/NVlabs/NVBit/blob/master/README.md), but this is a separate issue, completely unrelated to the "symbol lookup error". If it works for you, there is no need to lower the CUDA driver version; it is just that I did not test with that version.

Thanks again for the update.

ZejiaZheng commented 4 years ago

Hi @ovilla coming back to this issue, I finally had time to play around with this a little more and was able to get this working with ubuntu 16.04 and 18.04.

This issue is reproducible with ubuntu 14.04 (both on a local machine and on a docker image).

Would it be possible for you to provide a version of this library compiled on ubuntu 14.04? It's very hard for me to test my repo on an upgraded version of ubuntu due to other library dependencies.

Thank you and looking forward to hearing back from you.

x-y-z commented 4 years ago

Can you try the method below and see if it solves your issue?

When using an NVBit tool, add libstdc++.so with its full path to LD_PRELOAD as well. Something like: LD_PRELOAD=<path to nvbit tool>:/usr/lib/x86_64-linux-gnu/libstdc++.so.6 <your app> (entries are separated by a colon or a space). Basically, it will load libstdc++.so, which should have the missing symbol.

Let us know whether it works.

ZejiaZheng commented 4 years ago

@x-y-z Thanks for the reply.

LD_PRELOAD=./tools/instr_count/instr_count.so:/usr/lib/x86_64-linux-gnu/libstdc++.so.6 ./test-apps/vectoradd/vectoradd
./test-apps/vectoradd/vectoradd: symbol lookup error: ./tools/instr_count/instr_count.so: undefined symbol: _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE

Also

strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE

Returns no matches.

x-y-z commented 4 years ago

It seems that your libstdc++.so does not have that symbol. Could you copy a libstdc++.so that has the symbol (for example, from Ubuntu 16.04) to your Ubuntu 14.04 machine and try this again? NVBit is compiled with gcc 5.3.0, but Ubuntu 14.04 seems to have an older gcc. This could cause the missing symbol issue.

ZejiaZheng commented 4 years ago

Great! It worked! Thank you!

x-y-z commented 4 years ago

OK, I did a more thorough investigation of this. The issue comes from GCC's dual ABI. Ubuntu 14.04 comes with gcc 4.8.y, which does not use the new C++11 ABI, so _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE, which is a new C++11 ABI symbol (note the __cxx11 in the mangled name), is not found in your libstdc++. NVBit is compiled with gcc 5.3, which uses the new C++11 ABI.

To verify this, run strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep basic_stringstream on your Ubuntu 14.04 machine. You should see something like _ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEED4Ev, which is the old ABI.

I see you are using gcc 6.5.0 to compile the NVBit tools. That might cause the issue if your tool uses stringstream and you run it on Ubuntu 14.04.
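
The check described above can be scripted. This is a minimal sketch that assumes nm from GNU binutils is installed; the function names symbol_abi and exports_cxx11_abi are hypothetical. It relies on the fact that new-ABI names live in the namespace std::__cxx11, which appears as "7__cxx11" in a mangled name.

```shell
# Classify a mangled symbol: "cxx11" if it refers to std::__cxx11,
# "not-cxx11" otherwise (old ABI, or a symbol the ABI switch does not affect).
symbol_abi() {
    case "$1" in
        *7__cxx11*) echo "cxx11" ;;
        *)          echo "not-cxx11" ;;
    esac
}

# Return success if shared library $1 defines at least one std::__cxx11 symbol.
exports_cxx11_abi() {
    nm -D --defined-only "$1" 2>/dev/null | grep -q '7__cxx11'
}

lib=/usr/lib/x86_64-linux-gnu/libstdc++.so.6   # path from this thread
if [ -r "$lib" ]; then
    if exports_cxx11_abi "$lib"; then
        echo "$lib exports std::__cxx11 symbols"
    else
        echo "$lib has no std::__cxx11 symbols"
    fi
fi
```

Reading the dynamic symbol table with nm -D is a little stricter than strings, since it lists only symbols the library actually defines and exports.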