PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech
MIT License

example : add WASM example #155

Closed by PABannier 2 months ago

PABannier commented 2 months ago

Close #154

PABannier commented 2 months ago

The forward pass is taking a very long time. Generating a token with Bark-small is roughly 10x slower than the native C++ build (5 ms natively vs. 65 ms in WASM for the semantic encoder, for instance). I passed the -O3 flag when compiling with Emscripten. I expected the WASM version to be slower than C++, but not by a factor of 10-20x.
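
For reference, here is a minimal sketch of how WASM SIMD could be enabled in an Emscripten CMake build. The -msimd128 flag is a standard Emscripten/Clang option that turns on 128-bit WASM SIMD; the build layout itself is an assumption, since the actual WASM example may be wired differently:

emcmake cmake .. -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_FLAGS="-O3 -msimd128" \
    -DCMAKE_CXX_FLAGS="-O3 -msimd128"
emmake make -j

Without -msimd128, Emscripten emits scalar WASM even at -O3, so ggml's SIMD code paths (guarded by __wasm_simd128__) never kick in.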

@ggerganov Have you observed something similar with Whisper? Would you happen to have any ideas on speeding up the inference pass?

ggerganov commented 2 months ago

There are WASM_SIMD implementations only for the matrix multiplication op. If there are other ops in Bark that require significant compute, they might become a bottleneck when running under WASM.
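
To illustrate the difference this makes, below is a sketch of the kind of wasm_simd128.h kernel ggml uses where WASM SIMD is available, next to the scalar fallback that every other op is stuck with. This is illustrative, not ggml's actual code; it assumes n is a multiple of 4 and compilation with emcc -O3 -msimd128:

#include <wasm_simd128.h>

/* Scalar fallback: one multiply-add per iteration. */
static float dot_scalar(const float * a, const float * b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* WASM SIMD: four multiply-adds per iteration on 128-bit lanes.
   Assumes n is a multiple of 4 to keep the sketch short. */
static float dot_simd(const float * a, const float * b, int n) {
    v128_t acc = wasm_f32x4_splat(0.0f);
    for (int i = 0; i < n; i += 4) {
        acc = wasm_f32x4_add(acc, wasm_f32x4_mul(wasm_v128_load(a + i),
                                                 wasm_v128_load(b + i)));
    }
    return wasm_f32x4_extract_lane(acc, 0) + wasm_f32x4_extract_lane(acc, 1) +
           wasm_f32x4_extract_lane(acc, 2) + wasm_f32x4_extract_lane(acc, 3);
}

Any op that only has the scalar path runs unvectorized, which is where the extra slowdown beyond the matmuls can come from.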

Overall, WASM performance is not great. For example, I compared the whisper small encoder on my M1 Pro, using only the CPU (i.e. without Metal and without Accelerate CBLAS), and the C++ version is about 10x faster than the web version:

WHISPER_NO_METAL=1 WHISPER_NO_ACCELERATE=1 make -j
./bench -m models/ggml-small.bin -t 8

whisper_print_timings:   encode time =  1255.23 ms /     1 runs ( 1255.23 ms per run)

The web-version is here: https://whisper.ggerganov.com/bench/

whisper_print_timings:   encode time = 13947.06 ms /     1 runs (13947.06 ms per run)

PABannier commented 2 months ago

@ggerganov Understood. So running Bark under WASM is probably not the right approach, since the model is still very computationally intensive. I'll focus on supporting Metal and cuBLAS instead. Thanks!
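
For what it's worth, if bark.cpp exposes the upstream ggml build options, the accelerated backends would be toggled at configure time along these lines. The option names below follow ggml's own CMake conventions and are an assumption, not bark.cpp's documented flags:

# cuBLAS backend (NVIDIA GPUs); assumes ggml's GGML_CUBLAS option is wired up
cmake -B build -DGGML_CUBLAS=ON
cmake --build build -j

# Metal backend (Apple Silicon); assumes ggml's GGML_METAL option
cmake -B build -DGGML_METAL=ON
cmake --build build -j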