Closed: jwijffels closed this issue 8 months ago
We are now at the state where the following should at least work, once we integrate https://github.com/ggerganov/whisper.cpp/blob/master/Makefile#L210-L222 into Makevars and compile the CUDA parts:
library(audio.whisper)
path <- system.file(package = "audio.whisper", "repo", "ggml-tiny.en-q5_1.bin")
model <- whisper(path, use_gpu = TRUE)
ifdef WHISPER_CUBLAS
	ifeq ($(shell expr $(NVCC_VERSION) \>= 11.6), 1)
		CUDA_ARCH_FLAG=native
	else
		CUDA_ARCH_FLAG=all
	endif

	CFLAGS      += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
	CXXFLAGS    += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
	LDFLAGS     += -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/$(UNAME_M)-linux/lib
	WHISPER_OBJ += ggml-cuda.o
	NVCC        = nvcc
	NVCCFLAGS   = --forward-unknown-to-host-compiler -arch=$(CUDA_ARCH_FLAG)

ggml-cuda.o: ggml-cuda.cu ggml-cuda.h
	$(NVCC) $(NVCCFLAGS) $(CXXFLAGS) -Wno-pedantic -c $< -o $@
endif
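The ifeq line above leans on expr's comparison operator, which deserves a caveat: expr compares numerically only when both operands are integers, and falls back to string comparison otherwise. A minimal sketch of what the Makefile is doing, with the version hardcoded for illustration:

```shell
# The Makefile picks -arch=native only for nvcc >= 11.6; `expr A \>= B`
# prints 1 (true) or 0 (false). Note: non-integer operands like "12.2"
# are compared as strings, which happens to work for these versions but
# would mis-order e.g. "9.2" against "11.6".
NVCC_VERSION=12.2   # illustration only; the Makefile derives this from nvcc
if [ "$(expr "$NVCC_VERSION" \>= 11.6)" = 1 ]; then
  echo "CUDA_ARCH_FLAG=native"
else
  echo "CUDA_ARCH_FLAG=all"
fi
```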
Just a heads up that I have access to a Windows machine with CUDA, in case it would be helpful for me to do tests and benchmarks like I did for macOS.
That would indeed be great. Could you already show the compilation trace when installing the package? Is CUDA_PATH somehow set on that machine? I see in the Makevars that it links to /usr/local/cuda/include, /opt/cuda/include and /opt/cuda/lib64; do these paths exist at all on your machine, or where are they located?
OS Name: Microsoft Windows 10 Enterprise
Version: 10.0.19044 Build 19044
Processor: Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz, 3701 MHz, 4 Core(s), 8 Logical Processor(s)
GPU: NVIDIA GeForce RTX 3050
GPU Driver: 31.0.15.4584
What do which nvcc and nvcc --version give on that machine when executed from a shell?
> which nvcc
/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2/bin/nvcc
>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:09:35_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
Can you show all files which are (recursively) at /c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2. Something like list.files("/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2", recursive=TRUE, full.names = TRUE)
Does pkg-config --cflags cuda, run from a shell, provide something? Or something named similarly to cuda, like nvidia-cuda-toolkit?
See the attached files.csv for the full list of files.
> pkg-config --cflags cuda
Package cuda was not found in the pkg-config search path.
Perhaps you should add the directory containing `cuda.pc'
to the PKG_CONFIG_PATH environment variable
Package 'cuda', required by 'virtual:world', not found
Note as well that I have system variables CUDA_PATH and CUDA_PATH_V12_2, both set to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2". That same path's "bin" and "libnvvp" subdirectories are also added to the PATH system variable.
Thanks. With that information, I think I can add something to the Makevars to compile the CUDA source code.
So does pkg-config --list-all
provide something like cuda or nvidia?
pkg-config --list-all | grep cuda
pkg-config --list-all | grep nvidia
pkg-config --libs cuda-12.2
There's nothing there with cuda, nvidia, or nv.
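The separate grep checks above can be rolled into a single probe; a sketch, where the module names cuda, cudart and cuda-12.2 are guesses (the actual .pc name, if any exists, varies by how CUDA was packaged):

```shell
# Probe pkg-config for a few plausible CUDA module names; print the cflags
# for any module that is registered, "not found" otherwise.
for mod in cuda cudart cuda-12.2; do
  if pkg-config --exists "$mod" 2>/dev/null; then
    echo "$mod: $(pkg-config --cflags "$mod")"
  else
    echo "$mod: not found"
  fi
done
```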
I've set up continuous integration with CUDA in the cuda branch: https://github.com/bnosac/audio.whisper/tree/cuda
After some trial and error, I managed to compile the code against the CUDA libraries.
You should be able to install it by setting the environment variable WHISPER_CUBLAS to 1 before doing the installation, as shown below. That seems to work at least on Linux, judging by the CI run https://github.com/bnosac/audio.whisper/actions/runs/7964875388/job/21743294472
# Install with CUDA
Sys.setenv(WHISPER_CUBLAS = "1")
remotes::install_github("bnosac/audio.whisper@cuda", force = TRUE)
# Get audio file to transcribe
library(av)
download.file(url = "https://www.ubu.com/media/sound/dec_francis/Dec-Francis-E_rant1.mp3", destfile = "rant1.mp3", mode = "wb")
av_audio_convert("rant1.mp3", output = "output.wav", format = "wav", sample_rate = 16000)
# See how long it takes - with use_gpu = TRUE
library(audio.whisper)
model <- whisper("medium", use_gpu = TRUE)
trans <- predict(model, newdata = "output.wav", language = "en", n_threads = 1)
trans$timing
trans$data
On Windows it looks like the nvcc executable from CUDA needs cl.exe, which is part of Microsoft Visual Studio and needs to be in the PATH. The CI run shows: nvcc fatal : Cannot find compiler 'cl.exe' in PATH.
I hope that by adding the visual_studio_integration in the continuous integration, the tests on Windows also turn green, but that only concerns the continuous integration tests. It will probably already install on your Windows machine.
I've now let it create a fat binary for all the GPU architectures that nvcc --list-gpu-arch lists.
Error when trying to install without Visual Studio. Trying to install MSVC now...
Yes, on Windows you need cl.exe, and the location of that file should be in the PATH (for mine, cl.exe was in C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64).
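Before kicking off the install, one can verify whether cl.exe is reachable; a sketch using command -v, which works from the Rtools/MSYS2 bash (from cmd.exe the equivalent check is where cl):

```shell
# Report whether the MSVC compiler driver is on the PATH; nvcc on Windows
# refuses to run without it ("Cannot find compiler 'cl.exe' in PATH").
if command -v cl >/dev/null 2>&1; then
  echo "cl found at: $(command -v cl)"
else
  echo "cl not in PATH; add MSVC's bin/Hostx64/x64 directory to PATH"
fi
```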
Yep, got similar errors in run https://github.com/bnosac/audio.whisper/actions/runs/7965676963/job/21745662097
These R_ext/Complex.h errors for CUDA 11.8.0 appear because I made sure printing goes to the R console instead of stderr. Two options come to mind.
When disabling printing as indicated at https://github.com/bnosac/audio.whisper/issues/27#issuecomment-1953248799, the linker complains about missing culibos and missing rt for the Windows build (log at https://github.com/bnosac/audio.whisper/actions/runs/7971359508/job/21760868576). This seems to be the same as https://forums.developer.nvidia.com/t/missing-lib-files-culibos-dl-rt-for-cublas/276707/2 (where the NVIDIA moderator also mentions that MinGW is not a host compiler you can use on Windows for CUDA - so that leaves only option 2).
g++ -shared -s -static-libgcc -o audio.whisper.dll tmp.def whisper_cpp/ggml-quants.o whisper_cpp/ggml-backend.o whisper_cpp/ggml-alloc.o whisper_cpp/ggml.o whisper_cpp/whisper.o whisper_cpp/common-ggml.o whisper_cpp/common.o rcpp_whisper.o RcppExports.o whisper_cpp/ggml-cuda.o -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8/lib/x64 -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib/x64 -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib -LC:/R/bin/x64 -lR
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: cannot find -lculibos: No such file or directory
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: cannot find -lrt: No such file or directory
Notes when spinning up a p3.2xlarge (Tesla V100, 16GB GPU RAM - NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB]) on AWS.
ubuntu@ip-172-31-39-81:~$ nvidia-smi
Thu Feb 29 15:04:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:1E.0 Off | 0 |
| N/A 30C P0 23W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
ubuntu@ip-172-31-39-81:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
Set the environment variable CUDA_PATH and indicate to use CUDA with the environment variable WHISPER_CUBLAS:
Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.3")
Sys.setenv(WHISPER_CUBLAS = "1")
remotes::install_github("bnosac/audio.whisper@cuda", force = TRUE)
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.3")
>
> Sys.setenv(WHISPER_CUBLAS = "1")
>
> remotes::install_github("bnosac/audio.whisper@cuda", force = TRUE)
Downloading GitHub repo bnosac/audio.whisper@cuda
Running `R CMD build`...
* checking for file ‘/tmp/Rtmp6kBKov/remotes11f0c55856be5/bnosac-audio.whisper-9f30a55/DESCRIPTION’ ... OK
* preparing ‘audio.whisper’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘audio.whisper_0.3.2.tar.gz’
Installing package into ‘/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3’
(as ‘lib’ is unspecified)
* installing *source* package ‘audio.whisper’ ...
** using staged installation
** libs
using C++ compiler: ‘g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
using C++11
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I PKG_CFLAGS: -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_CPPFLAGS: -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_LIBS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L"/usr/local/cuda-12.3/lib64" -L/opt/cuda/lib64 -L"/usr/local/cuda-12.3/targets/x86_64-linux/lib"
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-quants.c -o whisper_cpp/ggml-quants.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-backend.c -o whisper_cpp/ggml-backend.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-alloc.c -o whisper_cpp/ggml-alloc.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml.c -o whisper_cpp/ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/whisper.cpp -o whisper_cpp/whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common-ggml.cpp -o whisper_cpp/common-ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common.cpp -o whisper_cpp/common.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c rcpp_whisper.cpp -o rcpp_whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c RcppExports.cpp -o RcppExports.o
nvcc --forward-unknown-to-host-compiler -arch=all -O3 -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.3/include" -I"/usr/local/cuda-12.3/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fPIC -I/usr/lib/R/include -c whisper_cpp/ggml-cuda.cu -o whisper_cpp/ggml-cuda.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o audio.whisper.so whisper_cpp/ggml-quants.o whisper_cpp/ggml-backend.o whisper_cpp/ggml-alloc.o whisper_cpp/ggml.o whisper_cpp/whisper.o whisper_cpp/common-ggml.o whisper_cpp/common.o rcpp_whisper.o RcppExports.o whisper_cpp/ggml-cuda.o -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda-12.3/lib64 -L/opt/cuda/lib64 -L/usr/local/cuda-12.3/targets/x86_64-linux/lib -L/usr/lib/R/lib -lR
installing to /home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3/00LOCK-audio.whisper/00new/audio.whisper/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (audio.whisper)
Load the model with use_gpu = TRUE:
> library(audio.whisper)
>
> model <- whisper("medium", use_gpu = TRUE)
whisper_init_from_file_with_params_no_state: loading model from '/home/ubuntu/whisper.cpp/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla V100-SXM2-16GB, compute capability 7.0, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA buffer size = 1533.52 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 25.61 MB
whisper_init_state: compute buffer (encode) = 170.28 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.32 MB
>
> trans <- predict(model, newdata = "example.wav", language = "en", n_threads = 4)
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
Processing example.wav (4802560 samples, 300.16 sec), lang = en, translate = 0, timestamps = 0, beam_size = -1, best_of = 5
>
> trans$data
segment from to
1 1 00:00:00.000 00:00:02.000
2 2 00:00:02.000 00:00:21.000
3 3 00:00:21.000 00:00:30.000
4 4 00:00:30.000 00:00:40.000
5 5 00:00:40.000 00:00:54.000
6 6 00:00:54.000 00:01:10.000
7 7 00:01:10.000 00:01:18.000
8 8 00:01:18.000 00:01:37.000
9 9 00:01:37.000 00:02:00.000
10 10 00:02:00.000 00:02:20.000
11 11 00:02:20.000 00:02:36.000
12 12 00:02:36.000 00:02:49.000
13 13 00:02:49.000 00:03:12.000
14 14 00:03:12.000 00:03:30.000
15 15 00:03:30.000 00:03:51.000
16 16 00:03:51.000 00:04:10.000
17 17 00:04:10.000 00:04:32.000
18 18 00:04:32.000 00:04:52.000
19 19 00:04:52.000 00:05:00.000
text
1
Look at the picture.
2 See the skull, the part of bone removed, the master race Frankenstein radio controls, the brain thoughts broadcasting radio, the eyesight television, the Frankenstein earphone radio, the threshold brainwash radio, the latest new skull reforming to contain all Frankenstein controls.
3 Even in thin skulls of white pedigree males. Visible Frankenstein controls, the synthetic nerve radio directional antenna loop.
4 Make copies for yourself. There is no escape from this worse gangster police state using all of the deadly gangster Frankenstein controls.
5 In 1965 CIA gangster police beat me bloody, dragged me in chains from Kennedy New York airport. Since then I hide, enforce jobless poverty, isolated, alone in this low deadly nigger town old house.
6 The brazen, deadly gangster police and nigger puppet underlings spray me with poison nerve gas from automobile exhausts and even lawn mowers. Deadly assaults even in my yard with knives, even bricks and stones, even deadly touch table or electric shock flashlights.
7 Even remote electronically controlled around corners projection of deadly touch tarantulas fighters or even bloody murder accidents.
8 To shut me up forever with a sneak undetectable extermination even with trained parroting puppet assassins in maximum security insanity prisons for writing these unforgivable truths until my undetectable extermination I, Francis E. Deck Esquire, 29 Maple Avenue, Hempstead, New York.
9 I stand alone against your mad, deadly, worldwide, conspiratorial, gangster, computer god communism with wall to wall deadly gangster protection, lifelong sworn conspirators, murder incorporated organized crime, the police and judges, the deadly sneak parroting puppet gangsters using all the gangster deadly Frankenstein controls.
10 These hangman rope sneak deadly gangsters, the judges and the police trick, trap, rob, wreck, butcher and murder the people to keep them terrorized in gangster Frankenstein earphone radio slavery for the communist gangster government and con artists parroting puppet gangster playboy scum on top.
11 The secret work of all police in order to maintain a communist closed society, the same worldwide mad deadly communist gangster computer god that controls you as a terrorized gangster Frankenstein earphone radio slave parroting puppet.
12 You are a terrorized member of the master
race worldwide, four billion eyesight television camera guinea pig communist gangster computer god master race.
13 You're living thinking mad deadly worldwide communist gangster computer god secret overall plan worldwide living death Frankenstein slavery to explore and control the entire universe with the endless stairway to the stars, namely the man made inside out planets with nucleonic powered speeds much faster than the speed of light.
14 Look up and see the gangster computer god concocted new fake starry sky, the worldwide completely controlled deadly degenerative climate and atmosphere to the new world round translucent exotic gaseous envelope, which the worldwide communist gangster computer god manipulates through
15 the countless exactly positioned satellites, the new fake phony stars in the synthetic sky for ages before Frankenstein controls a poetic niggers interbreedable with eight had no alphabet
not even numerals slavery conspiracy over 300 years ago, ideally tiny brain a poetic nigger gangster government
16 TV gangster spy cameras computer god new world order degeneration with gifted with all gangster Frankenstein controls nigger deadly gangster parroting puppets or nigger
brain programmed robots deadly eight Frankenstein machines degenerative disease to eternal Frankenstein slavery
17 overall plan through one world communism top secret code word, meaning worldwide absolutely helpless and hopeless simple language mongrel mulatto a poetic niggers worldwide systematic instant plastic surgery butchery murder fake aging so all people are dead or useless by eight 70 done at night to use a Frankenstein
18 slave parroting puppet gangster slave now even you know, I am a menace to your worldwide mad deadly communist gangster computer god. Therefore, I must go to extermination before I am exterminated by this gangster computer god concocted and controlled worst mongrel organized crime murder
19
gangster communist government. I hand you the secrets to save the entire human race and the entire human race.
>
> trans$timing
$transcription_start
[1] "2024-02-29 16:47:09 UTC"
$transcription_end
[1] "2024-02-29 16:47:24 UTC"
$transcription_duration
Time difference of 0.2399666 mins
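That timing corresponds to roughly 20x real-time on the V100; a quick sketch of the arithmetic, taking the 300.16 s audio length and the 0.2399666 min transcription duration from the output above:

```shell
# Real-time factor = audio length / wall-clock transcription time.
awk 'BEGIN {
  audio   = 300.16           # seconds of audio (from the processing line)
  elapsed = 0.2399666 * 60   # transcription_duration, converted to seconds
  printf "%.1fx real-time\n", audio / elapsed
}'
```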
Don't forget to set use_gpu = TRUE.
A quick check to see if whisper.cpp itself provides the same information:
ubuntu@ip-172-31-39-81:~/whisper.cpp$ WHISPER_CUBLAS=1 make -j
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
I CC: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX: g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
nvcc --forward-unknown-to-host-compiler -arch=native -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -Wno-pedantic -c ggml-cuda.cu -o ggml-cuda.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml.c -o ggml.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml-alloc.c -o ggml-alloc.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml-backend.c -o ggml-backend.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml-quants.c -o ggml-quants.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o main -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/bench/bench.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o bench -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o quantize -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/server/server.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o server -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
./main -h
Same timing with whisper.cpp
ubuntu@ip-172-31-39-81:~/whisper.cpp$ ./main -m ggml-medium.bin --threads 4 --language en -f example.wav
whisper_init_from_file_with_params_no_state: loading model from 'ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla V100-SXM2-16GB, compute capability 7.0, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA0 total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 28.68 MB
whisper_init_state: compute buffer (encode) = 594.22 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 138.87 MB
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
main: processing 'example.wav' (4802560 samples, 300.2 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:02.560] Look at the picture.
[00:00:02.560 --> 00:00:06.120] See the skull, the part of bone removed.
[00:00:06.120 --> 00:00:09.120] The master race Frankenstein radio controls.
[00:00:09.120 --> 00:00:11.200] The brain thoughts broadcasting radio.
[00:00:11.200 --> 00:00:12.600] The eyesight television.
[00:00:12.600 --> 00:00:14.480] The Frankenstein earphone radio.
[00:00:14.480 --> 00:00:16.440] The threshold brainwash radio.
[00:00:16.440 --> 00:00:20.680] The latest new skull reforming to contain all Frankenstein
[00:00:20.680 --> 00:00:25.000] controls, even in thin skulls of white pedigree males.
[00:00:25.000 --> 00:00:27.240] Visible Frankenstein controls.
[00:00:27.240 --> 00:00:30.600] The synthetic nerve radio directional antenna loop.
[00:00:30.600 --> 00:00:32.240] Make copies for yourself.
[00:00:32.240 --> 00:00:37.000] There is no escape from this worse gangster police state
[00:00:37.000 --> 00:00:40.320] using all of the deadly gangster Frankenstein controls.
[00:00:40.320 --> 00:00:44.600] In 1965, CIA gangster police beat me bloody,
[00:00:44.600 --> 00:00:47.360] dragged me in chains from Kennedy New York Airport.
[00:00:47.360 --> 00:00:51.120] Since then, I hide in forced jobless poverty, isolated,
[00:00:51.120 --> 00:00:54.440] alone in this low deadly nigger town old house.
[00:00:54.440 --> 00:00:57.560] The brazen, deadly gangster police and nigger puppet
[00:00:57.560 --> 00:00:59.640] underlings spray me with poison nerve
[00:00:59.640 --> 00:01:02.720] gas from automobile exhausts and even lawnmowers.
[00:01:02.720 --> 00:01:06.480] Deadly assaults even in my yard with knives, even bricks
[00:01:06.480 --> 00:01:09.600] and stones, even deadly touch pavement or electric shock
[00:01:09.600 --> 00:01:12.480] flashlights, even remote electronically controlled
[00:01:12.480 --> 00:01:14.720] around corners projection of deadly touch
[00:01:14.720 --> 00:01:18.760] tarantulas fighters, or even bloody murder accidents
[00:01:18.760 --> 00:01:21.080] to shut me up forever with a sneak undetectable
[00:01:21.080 --> 00:01:24.600] extermination, even with trained parroting puppet assassins
[00:01:24.600 --> 00:01:27.560] in maximum security insanity prisons for writing
[00:01:27.560 --> 00:01:32.000] these unforgivable truths until my undetectable extermination.
[00:01:32.000 --> 00:01:37.080] I, Francis E. Deck Esquire, 29 Maple Avenue, Hempstead, New
[00:01:37.080 --> 00:01:41.880] York, I stand alone against your mad, deadly, worldwide
[00:01:41.880 --> 00:01:45.560] conspiratorial gangster computer god communism
[00:01:45.560 --> 00:01:48.760] with wall to wall deadly gangster protection,
[00:01:48.760 --> 00:01:51.520] lifelong sworn conspirators, murder
[00:01:51.520 --> 00:01:54.680] incorporated organized crime, the police and judges,
[00:01:54.680 --> 00:01:57.920] the deadly sneak parroting puppet gangsters
[00:01:57.920 --> 00:02:00.480] using all the gangster deadly Frankenstein control.
[00:02:00.480 --> 00:02:03.960] These hangman rope sneak deadly gangsters, the judges
[00:02:03.960 --> 00:02:08.720] and the police trick, trap, rob, wreck, butcher, and murder
[00:02:08.720 --> 00:02:10.680] the people that keep them terrorized
[00:02:10.680 --> 00:02:13.320] in gangster Frankenstein earphone radio
[00:02:13.320 --> 00:02:16.120] slavery for the communist gangster government
[00:02:16.120 --> 00:02:18.400] and con artist parroting puppet gangster
[00:02:18.400 --> 00:02:20.560] playboy scum on top.
[00:02:20.560 --> 00:02:23.200] The secret work of all police in order
[00:02:23.200 --> 00:02:26.040] to maintain a communist closed society,
[00:02:26.040 --> 00:02:29.920] the same worldwide mad deadly communist gangster computer
[00:02:29.920 --> 00:02:33.360] god that controls you as a terrorized gangster
[00:02:33.360 --> 00:02:36.720] Frankenstein earphone radio slave parroting puppet.
[00:02:36.720 --> 00:02:42.040] You are a terrorized member of the master race worldwide.
[00:02:42.040 --> 00:02:46.000] Four billion eyesight television camera guinea pig communist
[00:02:46.000 --> 00:02:51.160] gangster computer god master race, your living, thinking,
[00:02:51.160 --> 00:02:54.640] mad deadly worldwide communist gangster computer god
[00:02:54.640 --> 00:02:59.120] secret overall plan, worldwide living death Frankenstein
[00:02:59.120 --> 00:03:02.560] slavery to explore and control the entire universe
[00:03:02.560 --> 00:03:04.960] with the endless stairway to the stars,
[00:03:04.960 --> 00:03:07.960] namely the man-made inside out planets
[00:03:07.960 --> 00:03:10.400] with nucleonic powered speeds much faster
[00:03:10.400 --> 00:03:11.880] than the speed of light.
[00:03:11.880 --> 00:03:14.960] Look up and see the gangster computer god concocted
[00:03:14.960 --> 00:03:18.680] new fake starry sky, the worldwide completely controlled
[00:03:18.680 --> 00:03:21.040] deadly degenerative climate and atmosphere
[00:03:21.040 --> 00:03:24.680] through the new world round translucent exotic gaseous
[00:03:24.680 --> 00:03:28.160] envelope, which the worldwide communist gangster computer god
[00:03:28.160 --> 00:03:32.240] manipulates through countless exactly positioned satellites,
[00:03:32.240 --> 00:03:36.720] the new fake phony stars in the synthetic sky.
[00:03:36.720 --> 00:03:39.240] For ages before Frankenstein controls
[00:03:39.240 --> 00:03:42.600] apoetic niggers interbreedable with apes had no alphabet,
[00:03:42.600 --> 00:03:47.080] not even numerals, slavery conspiracy over 300 years ago,
[00:03:47.080 --> 00:03:50.480] ideally tiny brain apoetic nigger gangster government
[00:03:50.480 --> 00:03:54.560] eyesight TV gangster spy cameras, computer god new world order
[00:03:54.560 --> 00:03:59.280] degeneration gifted with all gangster Frankenstein controls
[00:03:59.280 --> 00:04:02.200] nigger deadly gangster parroting puppets or nigger brain
[00:04:02.200 --> 00:04:05.640] programmed robots, deadly ape Frankenstein machines,
[00:04:05.640 --> 00:04:09.720] degenerative disease to eternal Frankenstein slavery,
[00:04:09.720 --> 00:04:14.280] overall plan through one world communism, top secret code word,
[00:04:14.280 --> 00:04:17.240] meaning worldwide absolutely helpless and hopeless
[00:04:17.240 --> 00:04:21.280] simple language mongrel mulatto apoetic niggers.
[00:04:21.280 --> 00:04:25.120] Worldwide systematic instant plastic surgery butchery murder,
[00:04:25.120 --> 00:04:29.560] fake aging so all people are dead or useless by age 70,
[00:04:29.560 --> 00:04:32.440] done at night to you as a Frankenstein slave
[00:04:32.440 --> 00:04:34.120] parroting puppet gangster slave.
[00:04:34.120 --> 00:04:39.120] Now even you know I am a menace to your worldwide mad
[00:04:39.120 --> 00:04:41.560] deadly communist gangster computer god,
[00:04:41.560 --> 00:04:44.000] therefore I must go to extermination.
[00:04:44.000 --> 00:04:47.480] Before I am exterminated by this gangster computer god
[00:04:47.480 --> 00:04:51.240] concocted and controlled, worst mongrel organized crime murder
[00:04:51.240 --> 00:04:53.840] incorporated gangster communist government,
[00:04:53.840 --> 00:04:59.040] I hand you the secrets to save the entire human race
[00:04:59.040 --> 00:05:00.040] and the entire world.
whisper_print_timings: load time = 913.76 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 399.31 ms
whisper_print_timings: sample time = 3201.01 ms / 5230 runs ( 0.61 ms per run)
whisper_print_timings: encode time = 305.30 ms / 11 runs ( 27.75 ms per run)
whisper_print_timings: decode time = 68.82 ms / 11 runs ( 6.26 ms per run)
whisper_print_timings: batchd time = 9215.10 ms / 5167 runs ( 1.78 ms per run)
whisper_print_timings: prompt time = 874.47 ms / 2122 runs ( 0.41 ms per run)
whisper_print_timings: total time = 15031.71 ms
So conclusion: Tesla V100, 16 GB GPU RAM.
CUDA with R works on this Linux machine.
@jmgirard I think I'll already include the changes that allow installing and transcribing on Linux with CUDA, as that works.
Will try later to see if we can make it work on Windows. But apparently it needs to link to culibos and rt (see https://github.com/bnosac/audio.whisper/issues/27#issuecomment-1954198878) and I don't know if those are available at all when you install the CUDA drivers on Windows. At least they are not available on the continuous integration run under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8/lib/x64 (maybe they are on your machine?)
> list.files("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/lib/x64")
[1] "cublas.lib" "cublasLt.lib"
[3] "cuda.lib" "cudadevrt.lib"
[5] "cudart.lib" "cudart_static.lib"
[7] "cufft.lib" "cufftw.lib"
[9] "cufilt.lib" "curand.lib"
[11] "cusolver.lib" "cusolverMg.lib"
[13] "cusparse.lib" "nppc.lib"
[15] "nppial.lib" "nppicc.lib"
[17] "nppidei.lib" "nppif.lib"
[19] "nppig.lib" "nppim.lib"
[21] "nppist.lib" "nppisu.lib"
[23] "nppitc.lib" "npps.lib"
[25] "nvblas.lib" "nvJitLink.lib"
[27] "nvJitLink_static.lib" "nvjpeg.lib"
[29] "nvml.lib" "nvptxcompiler_static.lib"
[31] "nvrtc-builtins_static.lib" "nvrtc.lib"
[33] "nvrtc_static.lib" "OpenCL.lib"
I've enabled CUDA integration on the master branch for Linux.
I wonder if this would work on Windows via WSL2?
culibos and rt are clearly not on your machine.
The relevant part of the compilation is here: https://github.com/bnosac/audio.whisper/blob/master/src/Makevars#L152-L161
I looked at the latest changes in the Makevars of whisper.cpp and they link to -L/usr/lib/wsl/lib; I've added that on the master branch. Maybe that allows running it on WSL2. Would be cool if you could test that. I've added the installation steps which I did on that Tesla V100 machine on AWS at the end of https://github.com/bnosac/audio.whisper/issues/27#issuecomment-1971417075
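For context on why that extra -L path matters: under WSL2 the NVIDIA driver user-space libraries (libcuda.so and friends) are mounted at /usr/lib/wsl/lib rather than in the usual toolkit directories. A quick sketch to check whether that directory is populated on a given machine:

```shell
# Under WSL2 the NVIDIA driver libraries (libcuda.so, libnvidia-ml.so, ...)
# are exposed at /usr/lib/wsl/lib, which is why the Makevars now adds
# -L/usr/lib/wsl/lib. Prints the driver libs if present, or a note if not.
if ls /usr/lib/wsl/lib/libcuda* >/dev/null 2>&1; then
  ls /usr/lib/wsl/lib/libcuda*
else
  echo "no WSL2 driver libraries found (not running under WSL2?)"
fi
```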
I'm not sure I have the admin permissions required to enable WSL on my work computer (unfortunately my home PC does not have an RTX), but I will try and let you know.
Ok, I was able to get WSL going on my work computer. I installed Ubuntu Jammy Jellyfish.
1) Install NVIDIA Graphics Driver on Windows via link (but not the CUDA toolkit)
2) Install "Windows Subsystem for Linux" from Microsoft Store
3) Open CMD/PowerShell/Terminal and update WSL via wsl --update
4) Install Ubuntu via wsl --install Ubuntu
5) Set up Ubuntu user/password and start Ubuntu (e.g., via wsl if necessary)
6) Install CUDA Toolkit for WSL-Ubuntu via:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
7) Install R on Ubuntu via:
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
sudo apt install --no-install-recommends r-base
8) Install Packages Used in Many R Packages via:
sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev libudunits2-dev libgdal-dev cargo libfontconfig1-dev libcairo2-dev
9) Install Devtools from c2d4u4.0+ via:
sudo add-apt-repository ppa:c2d4u.team/c2d4u4.0+
sudo apt upgrade
sudo apt install --no-install-recommends r-cran-devtools
10) Add CUDA Toolkit to PATH via:
a) Open .bashrc for editing via nano /home/$USER/.bashrc
b) Move to bottom with arrow keys and add:
export PATH="/usr/local/cuda-12.4/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH"
c) Save changes with CTRL+O, then ENTER
d) Exit file editing with CTRL+X
e) Restart Ubuntu via sudo reboot
f) Enter Ubuntu again with wsl
g) Check that nvcc can be found via which nvcc
References:
-- https://docs.nvidia.com/cuda/wsl-user-guide/index.html
-- https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local
-- https://askubuntu.com/questions/885610/nvcc-version-command-says-nvcc-is-not-installed
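The PATH changes in step 10 can also be sketched as a small shell snippet, assuming the default deb-package install prefix /usr/local/cuda-12.4 (adjust if you installed elsewhere):

```shell
# Put the CUDA 12.4 toolkit on PATH and the loader path (step 10 above).
# /usr/local/cuda-12.4 is the default prefix for the NVIDIA deb packages.
CUDA_HOME="/usr/local/cuda-12.4"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
# nvcc should now resolve; if not, the toolkit install did not complete.
command -v nvcc || echo "nvcc still not on PATH"
```

Putting the two export lines into ~/.bashrc (as in step 10) makes them persist across shells.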
Now to try to install audio.whisper:
1) Start R within Ubuntu via sudo R
2) Configure CUDA_PATH via Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
3) Configure WHISPER_CUBLAS via Sys.setenv(WHISPER_CUBLAS = "1")
4) Force install cuda branch via remotes::install_github("bnosac/audio.whisper@cuda", force = TRUE)
Trace:
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper@cuda", force = TRUE)
Downloading GitHub repo bnosac/audio.whisper@cuda
── R CMD build ───────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/tmp/Rtmp6D5gcu/remotes4335ffaea6f/bnosac-audio.whisper-9f30a55/DESCRIPTION’ ...
─ preparing ‘audio.whisper’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘audio.whisper_0.3.2.tar.gz’
Installing package into ‘/home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3’
(as ‘lib’ is unspecified)
* installing *source* package ‘audio.whisper’ ...
** using staged installation
** libs
expr: syntax error: unexpected argument ‘11.6’
using C++ compiler: ‘g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
using C++11
expr: syntax error: unexpected argument ‘11.6’
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I PKG_CFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_CPPFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_LIBS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L"/usr/local/cuda-12.4/lib64" -L/opt/cuda/lib64 -L"/usr/local/cuda-12.4/targets/x86_64-linux/lib"
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-quants.c -o whisper_cpp/ggml-quants.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-backend.c -o whisper_cpp/ggml-backend.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-alloc.c -o whisper_cpp/ggml-alloc.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml.c -o whisper_cpp/ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/whisper.cpp -o whisper_cpp/whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common-ggml.cpp -o whisper_cpp/common-ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common.cpp -o whisper_cpp/common.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c rcpp_whisper.cpp -o rcpp_whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c RcppExports.cpp -o RcppExports.o
nvcc --forward-unknown-to-host-compiler -arch=all -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fPIC -I/usr/lib/R/include -c whisper_cpp/ggml-cuda.cu -o whisper_cpp/ggml-cuda.o
/bin/bash: line 1: nvcc: command not found
make: *** [Makevars:298: whisper_cpp/ggml-cuda.o] Error 127
ERROR: compilation failed for package ‘audio.whisper’
* removing ‘/home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3/audio.whisper’
Warning message:
In i.p(...) :
installation of package ‘/tmp/Rtmp6D5gcu/file43368c88092/audio.whisper_0.3.2.tar.gz’ had non-zero exit status
Not sure why it isn't finding nvcc... I get this back from the Terminal:
jmgirard@PSYC-7PBFM02:~$ which nvcc
/usr/local/cuda-12.4/bin/nvcc
jmgirard@PSYC-7PBFM02:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
Brave 👍 going down the rabbit hole of installing NVIDIA drivers and CUDA.
remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)
should work, or remotes::install_github("bnosac/audio.whisper", force = TRUE)
(I plan to remove the cuda branch later on).
What does Sys.getenv("PATH") give — can you show me what is in there?
# If nvcc is not in the path, you can make sure from R it is there:
Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
Sys.setenv(WHISPER_CUBLAS = "1")
remotes::install_github("bnosac/audio.whisper", force = TRUE)
I guess it's not really in the PATH:
> Sys.getenv("PATH")
[1] "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback:/usr/lib/rstudio-server/bin/postback"
Here is a successful (!) trace:
> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)
Downloading GitHub repo bnosac/audio.whisper@0.3.2
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/tmp/RtmpEPMVPm/remotes1f36f9bca10/bnosac-audio.whisper-8d57d02/DESCRIPTION’ ...
─ preparing ‘audio.whisper’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘audio.whisper_0.3.2.tar.gz’
Installing package into ‘/home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3’
(as ‘lib’ is unspecified)
* installing *source* package ‘audio.whisper’ ...
** using staged installation
** libs
using C++ compiler: ‘g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
using C++11
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I PKG_CFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_CPPFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_LIBS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L"/usr/local/cuda-12.4/lib64" -L/opt/cuda/lib64 -L"/usr/local/cuda-12.4/targets/x86_64-linux/lib" -L/usr/lib/wsl/lib
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-quants.c -o whisper_cpp/ggml-quants.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-backend.c -o whisper_cpp/ggml-backend.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-alloc.c -o whisper_cpp/ggml-alloc.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml.c -o whisper_cpp/ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/whisper.cpp -o whisper_cpp/whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common-ggml.cpp -o whisper_cpp/common-ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common.cpp -o whisper_cpp/common.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c rcpp_whisper.cpp -o rcpp_whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c RcppExports.cpp -o RcppExports.o
nvcc --forward-unknown-to-host-compiler -arch=native -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I"/usr/share/R/include" -fPIC -c whisper_cpp/ggml-cuda.cu -o whisper_cpp/ggml-cuda.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o audio.whisper.so whisper_cpp/ggml-quants.o whisper_cpp/ggml-backend.o whisper_cpp/ggml-alloc.o whisper_cpp/ggml.o whisper_cpp/whisper.o whisper_cpp/common-ggml.o whisper_cpp/common.o rcpp_whisper.o RcppExports.o whisper_cpp/ggml-cuda.o -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda-12.4/lib64 -L/opt/cuda/lib64 -L/usr/local/cuda-12.4/targets/x86_64-linux/lib -L/usr/lib/wsl/lib -L/usr/lib/R/lib -lR
installing to /home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3/00LOCK-audio.whisper/00new/audio.whisper/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (audio.whisper)
> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)
>
> library(av)
> download.file(url = "https://www.ubu.com/media/sound/dec_francis/Dec-Francis-E_rant1.mp3", destfile = "rant1.mp3", mode = "wb")
> av_audio_convert("rant1.mp3", output = "output.wav", format = "wav", sample_rate = 16000)
>
> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
> trans <- predict(model, newdata = "output.wav", language = "en", n_threads = 1)
> trans$timing
$transcription_start
[1] "2024-03-06 09:08:29 CST"
$transcription_end
[1] "2024-03-06 09:09:12 CST"
$transcription_duration
Time difference of 0.7129008 mins
0.71 min (use_gpu = TRUE, n_threads = 1)
17.71 min (use_gpu = FALSE, n_threads = 1)
0.71 min (use_gpu = TRUE, n_threads = 4)
For large-v3 with CUDA, I complete the above in 0.94 min. But for large-v3-q5_0 with CUDA, I get 34.96 min. I thought the point of quantized models was to be faster/more efficient? Do I need to install something else (e.g., ONNX) to unlock this benefit of quantized models?
Good to see that CUDA works on WSL as well, and thanks for listing exactly the steps you took.
I've asked at whisper.cpp to see if I could compile it directly alongside the Rtools toolchain (see https://github.com/ggerganov/whisper.cpp/issues/1922), but I think currently WSL is the only way, unless we somehow use cmake in the build process and rely on another compiler than R's default on Windows.
While you are at it, could you run the timings below as well and show the output of nvidia-smi?
nvidia-smi
library(audio.whisper)
model <- whisper("medium", use_gpu = TRUE)
download.file("https://github.com/jwijffels/example/raw/main/example.wav", "example.wav")
trans <- predict(model, newdata = "example.wav", language = "en", n_threads = 4)
trans$timing
trans <- predict(model, newdata = "example.wav", language = "en", n_threads = 4, n_processors = 4)
trans$timing
I can respond to the rest later today, but here is a quick answer:
jmgirard@PSYC-7PBFM02:/mnt/c/Users/j553g371$ nvidia-smi
Wed Mar 6 14:50:23 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01 Driver Version: 551.76 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 On | 00000000:02:00.0 On | N/A |
| 60% 44C P8 15W / 130W | 7803MiB / 8192MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
whisper_init_from_file_with_params_no_state: loading model from '/home/jmgirard/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3050, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA buffer size = 1533.52 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 25.61 MB
whisper_init_state: compute buffer (encode) = 170.28 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.32 MB
> download.file("https://github.com/jwijffels/example/raw/main/example.wav", "example.wav")
trying URL 'https://github.com/jwijffels/example/raw/main/example.wav'
Content type 'application/octet-stream' length 9605198 bytes (9.2 MB)
==================================================
downloaded 9.2 MB
> trans <- predict(model, newdata = "example.wav", language = "en", n_threads = 4)
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
Processing example.wav (4802560 samples, 300.16 sec), lang = en, translate = 0, timestamps = 0, beam_size = -1, best_of = 5
> trans$timing
$transcription_start
[1] "2024-03-06 14:56:13 CST"
$transcription_end
[1] "2024-03-06 14:56:34 CST"
$transcription_duration
Time difference of 0.3399141 mins
> trans <- predict(
model,
newdata = "example.wav",
language = "en",
n_threads = 4,
n_processors = 4
)
Error: C stack usage 746289549788 is too close to the limit
Yes, that should probably be n_threads = 8, n_processors = 1 or n_threads = 4, n_processors = 2, given that your system info shows system_info: n_threads = 4 / 8. But maybe this should be another GitHub issue.
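For what it's worth, a sketch of what those two suggested calls could look like (untested on my side; this assumes the same example.wav used earlier in this thread and a working GPU build):

```r
library(audio.whisper)
model <- whisper("medium", use_gpu = TRUE)

# Option 1: one processor group, all 8 logical threads
trans <- predict(model, newdata = "example.wav", language = "en",
                 n_threads = 8, n_processors = 1)
trans$timing

# Option 2: split the audio over 2 processor groups of 4 threads each
trans <- predict(model, newdata = "example.wav", language = "en",
                 n_threads = 4, n_processors = 2)
trans$timing
```

The idea is that n_threads x n_processors should not exceed the 8 logical processors reported by the system info line.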
I'll probably close this issue as CUDA integration works, and I'll write some wrap-up text here in case someone else ends up on this page.
Closing as CUDA integration is enabled on the master branch since audio.whisper version 0.3.2 and works for Linux and Windows Subsystem for Linux. As a wrap-up:
To make sure it works, install CUDA, the CUDA toolkit and the CUDA drivers for your system (visit the NVIDIA documentation). Examples are given at https://github.com/bnosac/audio.whisper/issues/27#issuecomment-1971417075 (Ubuntu) and https://github.com/bnosac/audio.whisper/issues/27#issuecomment-1979792723 (WSL).
After you have installed CUDA, make sure the NVIDIA CUDA compiler nvcc is in your PATH, and then you can install the R package with:
Sys.setenv(WHISPER_CUBLAS = "1")
remotes::install_github("bnosac/audio.whisper", force = TRUE)
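Before running that install, it may help to confirm from a shell that nvcc is actually visible; a small sketch (the CUDA path below is a common default and may differ on your machine, and have_tool is just an illustrative helper):

```shell
# Make a typical CUDA install location visible; adjust for your version/OS.
export PATH="/usr/local/cuda/bin:$PATH"

# Illustrative helper: prints "yes" if a tool is on PATH, "no" otherwise.
have_tool() {
  command -v "$1" >/dev/null 2>&1 && echo "yes" || echo "no"
}

echo "nvcc on PATH: $(have_tool nvcc)"
```

If this prints "no", the R package build will not find the CUDA compiler even if WHISPER_CUBLAS is set.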
If you want to use your GPU when doing transcriptions, don't forget to set the argument use_gpu = TRUE, otherwise your CPU will be used. E.g.
library(audio.whisper)
model <- whisper("medium", use_gpu = TRUE)
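To check end-to-end that the GPU is actually being used, one option is a quick transcription and a look at the reported timing (a sketch, reusing the example.wav from earlier in this thread):

```r
library(audio.whisper)
model <- whisper("medium", use_gpu = TRUE)

# The model load output should mention "using CUDA backend" if the GPU build works.
download.file("https://github.com/jwijffels/example/raw/main/example.wav",
              "example.wav")
trans <- predict(model, newdata = "example.wav", language = "en", n_threads = 4)
trans$timing
# On GPU, transcription_duration was ~0.34 min for this 5-minute file earlier
# in this thread, versus many minutes on CPU.
```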
TODO
Next, integrate https://github.com/ggerganov/whisper.cpp/blob/master/Makefile#L210-L222 in Makevars.