NVlabs / nvbio

NVBIO is a library of reusable components designed to accelerate bioinformatics applications using CUDA.
BSD 3-Clause "New" or "Revised" License

nvbio-test crashes on work_queue test #37

Open BrettDong opened 4 years ago

BrettDong commented 4 years ago

When running ./build/nvbio-test/nvbio-test, the program crashes in the work_queue test:

info    : work_queue test... started
info    :   testing multi-pass work-queue:
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  for_each: failed to synchronize: cudaErrorInvalidValue: invalid argument
Aborted (core dumped)

OS: Ubuntu 16.04.3 LTS
Arch: amd64
GPU: GeForce GTX 1080 Ti
Compiler: GCC 5.4.0
CUDA: 10.2.89

The call stack of the core dump is as follows:

Core was generated by `./nvbio-test/nvbio-test'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f6b4129c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f6b4291da40 (LWP 38962))]
(gdb) bt
#0  0x00007f6b4129c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f6b4129e02a in __GI_abort () at abort.c:89
#2  0x00007f6b41e0184d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f6b41dff6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f6b41dfe6a9 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f6b41dff005 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f6b41640f83 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007f6b416412eb in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007f6b41dff90c in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x0000000000536330 in thrust::device_ptr<nvbio::wqtest::TestWorkUnit> thrust::for_each_n<thrust::cuda_cub::tag, thrust::device_ptr<nvbio::wqtest::TestWorkUnit>, long, thrust::detail::allocator_traits_detail::gozer>(thrust::detail::execution_policy_base<thrust::cuda_cub::tag> const&, thrust::device_ptr<nvbio::wqtest::TestWorkUnit>, long, thrust::detail::allocator_traits_detail::gozer) [clone .isra.317] ()
#10 0x000000000054299b in nvbio::cuda::WorkQueue<nvbio::cuda::MultiPassQueueTag, nvbio::wqtest::TestWorkUnit, 128u>::~WorkQueue() ()
#11 0x0000000000538a8d in nvbio::work_queue_test(int, char**) ()
#12 0x00000000004514ef in main ()

delahondes commented 4 years ago

I have the same issue on Ubuntu 18.04, amd64, Tesla V100. I tried compiling with gcc 7 and gcc 8, and with -DGPU_ARCHITECTURE=sm_70 and -DGPU_ARCHITECTURE=sm_61; none of these changes anything.

cement-head commented 3 years ago

Did you guys try to upgrade the toolchain to the latest PPA? https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/ppa

My test seems to be working - how long does (should) it take?

OS: Ubuntu 18.04.5 LTS
Compiler: gcc (Ubuntu 9.3.0-11ubuntu0~18.04.1) 9.3.0
CUDA: 11.1
GPUs: dual TITAN RTX

cement-head commented 3 years ago

$ nvidia-smi
Mon Nov  2 11:03:16 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN RTX           On   | 00000000:21:00.0 Off |                  N/A |
|  0%   32C    P8     4W / 280W |      8MiB / 24220MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  TITAN RTX           On   | 00000000:31:00.0  On |                  N/A |
|  0%   34C    P2   102W / 280W |    435MiB / 24217MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2896      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      5080      G   /usr/bin/compiz                     0MiB |
|    0   N/A  N/A     76456      C   ./nvbio-test                        0MiB |
|    1   N/A  N/A      2896      G   /usr/lib/xorg/Xorg                225MiB |
|    1   N/A  N/A      5080      G   /usr/bin/compiz                    32MiB |
|    1   N/A  N/A     76456      C   ./nvbio-test                      173MiB |
+-----------------------------------------------------------------------------+