OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

CUDA DeviceAllocate segfault #1709

Open drzraf opened 1 month ago

drzraf commented 1 month ago
#0  0x00007bc0622c6554 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#1  0x00007bc05573e59a in cub::CachingDeviceAllocator::DeviceAllocate(int, void**, unsigned long, CUstream_st*) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#2  0x00007bc05573ea99 in ctranslate2::cuda::CubCachingAllocator::allocate(unsigned long, int) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#3  0x00007bc055712796 in ctranslate2::StorageView::reserve(long) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#4  0x00007bc0557127f8 in ctranslate2::StorageView::resize(std::vector<long, std::allocator<long> >) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#5  0x00007bc0556f59f2 in void ctranslate2::ops::MatMul::compute<(ctranslate2::Device)1, float>(ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView&) const ()
   from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#6  0x00007bc055660d24 in ctranslate2::layers::dot_product_attention(ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView const*, ctranslate2::StorageView const*, ctranslate2::StorageView const*, ctranslate2::StorageView const*, long, ctranslate2::StorageView&, ctranslate2::StorageView*, bool, float, bool, bool, long, ctranslate2::layers::Alibi*, ctranslate2::StorageView*) () from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4
No symbol table info available.
#7  0x00007bc05566208d in ctranslate2::layers::MultiHeadAttention::operator()(ctranslate2::StorageView const&, ctranslate2::StorageView const&, ctranslate2::StorageView const*, ctranslate2::StorageView&, ctranslate2::StorageView*, ctranslate2::StorageView*, ctranslate2::StorageView*, ctranslate2::Padder const*, ctranslate2::Padder const*, bool, ctranslate2::StorageView*, long) const ()
   from /home/.local/lib/python3.10/site-packages/ctranslate2.libs/libctranslate2.so.4

CT2_VERBOSE=3 LD_LIBRARY_PATH=/home/.local/lib/python3.10/site-packages/ctranslate2.libs whisper-ctranslate2 --language=en --verbose=true --model small -f srt --output_dir /tmp/ foo.mp4

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce 940MX           Off |   00000000:01:00.0 Off |                  N/A |
| N/A   50C    P8             N/A /  200W |    1988MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
minhthuc2502 commented 1 month ago

Hello, do you use quantization for the small model? Which compute type do you use? This looks like an out-of-memory problem: you don't have enough VRAM. nvidia-smi only shows the memory used up to the moment the program crashes; when the program tries to allocate more, it exceeds the 2 GB available.
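To see why 2 GB is tight, a rough back-of-the-envelope sketch (the ~244M parameter count for Whisper "small" is an approximation, and activation/workspace memory comes on top of the weights):

```python
# Rough VRAM footprint of just the model weights, ignoring activations,
# KV cache, and allocator overhead, which all come on top.
PARAMS_SMALL = 244_000_000  # approximate parameter count of Whisper "small"

def weight_gib(params: int, bytes_per_param: int) -> float:
    """Return the weight footprint in GiB for a given storage width."""
    return params * bytes_per_param / 1024**3

fp32 = weight_gib(PARAMS_SMALL, 4)  # float32: 4 bytes per parameter
int8 = weight_gib(PARAMS_SMALL, 1)  # int8 quantized: 1 byte per parameter
print(f"float32 weights: {fp32:.2f} GiB, int8 weights: {int8:.2f} GiB")
```

In float32 the weights alone take close to 1 GiB, so with the runtime, decoding buffers, and anything else already resident on a 2 GiB card, an allocation failure mid-inference is plausible; int8 cuts the weight footprint to roughly a quarter.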

drzraf commented 1 month ago

Both auto and default select float32:

[2024-05-27 08:57:18.106] [ctranslate2] [thread 3417167] [info]  - Allow INT8: false
[2024-05-27 08:57:18.106] [ctranslate2] [thread 3417167] [info]  - Allow FP16: false (with Tensor Cores: false)
[2024-05-27 08:57:18.106] [ctranslate2] [thread 3417167] [info]  - Allow BF16: false
[2024-05-27 08:57:19.253] [ctranslate2] [thread 3417167] [info] Using CUDA allocator: cub_caching
[2024-05-27 08:57:19.995] [ctranslate2] [thread 3417167] [info]  - Binary version: 6
[2024-05-27 08:57:19.995] [ctranslate2] [thread 3417167] [info]  - Model specification revision: 3
[2024-05-27 08:57:19.995] [ctranslate2] [thread 3417167] [info]  - Selected compute type: float32

minhthuc2502 commented 1 month ago

Try int8 or float16 quantization. Your GPU is too small to run the medium model in float32, so this is expected. bfloat16 only works on GPUs with compute capability 8.x or newer (your GPU may be older than that).
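A minimal sketch of picking a memory-frugal compute type, assuming the preference order below (the `pick_compute_type` helper is hypothetical; in practice CTranslate2 exposes `ctranslate2.get_supported_compute_types("cuda")` to query what the device supports, and falls back automatically when a requested type is unavailable):

```python
def pick_compute_type(supported, preferred=("int8_float16", "int8", "float16", "float32")):
    """Return the first type from `preferred` that the device supports.

    `supported` would typically come from
    ctranslate2.get_supported_compute_types("cuda"); here it is just a set
    of strings so the logic can be shown without a GPU.
    """
    for compute_type in preferred:
        if compute_type in supported:
            return compute_type
    return "float32"  # safe default: always supported

# Example: a card without FP16 Tensor Cores may still support int8.
print(pick_compute_type({"int8", "float32"}))
```

The chosen string would then be passed as the compute type when loading the model (for the CLI in this thread, via its compute-type option, if available in your version).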