drzraf opened 1 month ago
Hello, do you use quantization for the small model? Which compute type do you use? It seems like this is just an OOM problem because you don't have enough VRAM. `nvidia-smi` only shows you the memory used before the moment the program crashes; when the program tries to allocate more memory, it exceeds the 2 GB.
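For example, a minimal sketch of requesting a quantized compute type through the faster-whisper Python API (which whisper-ctranslate2 builds on); the model size, file name, and chosen compute type here are just illustrative:

```python
from faster_whisper import WhisperModel

# int8 quantization roughly quarters the weights' VRAM footprint vs float32;
# "float16" or "int8_float16" are other options if the GPU supports them.
model = WhisperModel("small", device="cuda", compute_type="int8")

segments, info = model.transcribe("foo.mp4", language="en")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```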
Tested all of them (with `small`) without the `CT2_*` env vars and got `ValueError: Requested XXX compute type, but the target device or backend do not support efficient XXX computation.`, except for `float32`, which triggers a segfault.

- `float32` always segfaults.
- Setting `CT2_CUDA_ALLOW_FP16=1`, only `float16` works (the others trigger `ValueError`).
- Setting `CT2_CUDA_ALLOW_BF16=1`, `bfloat16` gives `RuntimeError: cuDNN failed with status CUDNN_STATUS_ARCH_MISMATCH` (the others trigger `ValueError`).
- `auto` and `default` select `float32`:

```
[2024-05-27 08:57:18.106] [ctranslate2] [thread 3417167] [info]  - Allow INT8: false
[2024-05-27 08:57:18.106] [ctranslate2] [thread 3417167] [info]  - Allow FP16: false (with Tensor Cores: false)
[2024-05-27 08:57:18.106] [ctranslate2] [thread 3417167] [info]  - Allow BF16: false
[2024-05-27 08:57:19.253] [ctranslate2] [thread 3417167] [info] Using CUDA allocator: cub_caching
[2024-05-27 08:57:19.995] [ctranslate2] [thread 3417167] [info]  - Binary version: 6
[2024-05-27 08:57:19.995] [ctranslate2] [thread 3417167] [info]  - Model specification revision: 3
[2024-05-27 08:57:19.995] [ctranslate2] [thread 3417167] [info]  - Selected compute type: float32
```

- `medium` segfaults even with `CT2_CUDA_ALLOW_FP16=1`.
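As a diagnostic aside, a minimal sketch of listing which compute types CTranslate2 considers efficiently supported on the device, assuming the `ctranslate2` Python package is installed; requesting anything outside that set is what raises the `ValueError` above:

```python
import ctranslate2

# Compute types the CUDA device/backend reports as efficiently supported;
# requesting one outside this set raises the "Requested XXX compute type,
# but the target device or backend do not support..." ValueError.
print(ctranslate2.get_supported_compute_types("cuda", device_index=0))
# e.g. {'float32', 'int8_float32', 'int8'} on a GPU without FP16 Tensor Cores
```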
Try quantization with int8 or float16. Your GPU is too small to work with the medium model in float32; that's normal. bfloat16 only works on GPUs with compute capability 8.x or newer (your GPU may only be 7.x).
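A quick way to confirm the compute capability, assuming PyTorch is available in the same environment (a sketch, not part of the original report):

```python
import torch

# Compute capability as (major, minor); bfloat16 kernels need 8.x (Ampere) or newer.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
print("bfloat16 supported:", major >= 8)
```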
```
CT2_VERBOSE=3 LD_LIBRARY_PATH=/home/.local/lib/python3.10/site-packages/ctranslate2.libs whisper-ctranslate2 --language=en --verbose=true --model small -f srt --output_dir /tmp/ foo.mp4
```
The `small` model (sadly) doesn't fit within my 2 GB GPU, but it causes a segfault instead of failing properly... so only the `tiny` model works (no OOM).

With `CT2_CUDA_ALLOW_BF16=1 CT2_CUDA_ALLOW_FP16=1` I could get `small` to run successfully on this GPU (!)
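For completeness, a hedged sketch of the same workaround from Python; the exact point at which CTranslate2 reads these environment variables is an assumption, so they are set before the model is loaded (file name as above):

```python
import os

# Assumption: CTranslate2 reads these variables when the model is created,
# so set them before anything initializes the library.
os.environ["CT2_CUDA_ALLOW_FP16"] = "1"
os.environ["CT2_CUDA_ALLOW_BF16"] = "1"

from faster_whisper import WhisperModel

# float16 halves the weights' memory footprint vs float32, which is
# plausibly what lets `small` fit on a 2 GB GPU.
model = WhisperModel("small", device="cuda", compute_type="float16")
segments, _info = model.transcribe("foo.mp4", language="en")
for seg in segments:
    print(seg.text)
```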