PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech
MIT License

some more issues #142

Closed · Green-Sky closed this 2 months ago

Green-Sky commented 2 months ago

The readme states:

# convert the model to ggml format
python3 convert.py --dir-model ./models --out-dir ./ggml_weights/

but the script also requires you to specify --vocab-path VOCAB_PATH, a directory in which vocab.txt can be found. Said vocab.txt got deleted from the repo, so I found this PR and downloaded the file from there.
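
For anyone else hitting this, the full invocation that worked for me is below. Note that pointing --vocab-path at ./models is just my local layout (I dropped vocab.txt in there), not documented usage:

# convert the model to ggml format, with the vocab directory spelled out
python3 convert.py --dir-model ./models --vocab-path ./models --out-dir ./ggml_weights/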

Conversion seems to go well, but while the script also downloads the encodec model, it does not seem to convert it. Also, why is the resulting ggml_weights.bin only 4 GiB when the source model is 12 GiB and presumably 32-bit floats? (I ran it without --use-f16; at 4 bytes per f32 weight, 4 GiB only holds about a billion parameters.) There also seem to be no instructions on how to convert the encodec model for encodec.cpp, though that repo has its own convert.py, which I then used.
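
For the record, the encodec conversion I ran was along these lines; I am reconstructing the flags by analogy with bark.cpp's convert.py, so treat them as a guess and check the script's --help first:

# convert the encodec weights with encodec.cpp's own script (flags assumed)
python3 encodec.cpp/convert.py --dir-model ./models --out-dir ./models_ggml/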

I ran the bark main example and it seems to work, except for encodec:

encodec_load_model_weights: loading model from '../models_ggml/ggml-model.bin'
encodec_load_model_weights: in_channels = 1
encodec_load_model_weights: hidden_dim  = 128
encodec_load_model_weights: n_filters   = 32
encodec_load_model_weights: kernel_size = 7
encodec_load_model_weights: res_kernel  = 3
encodec_load_model_weights: n_bins      = 1024
encodec_load_model_weights: bandwidth   = 24
encodec_load_model_weights: sample_rate = 24000
encodec_load_model_weights: ftype       = 0
encodec_load_model_weights: qntvr       = 0
encodec_load_model_weights: ggml tensor size    = 320 bytes
encodec_load_model_weights: backend buffer size =  82.64 MB
encodec_load_model_weights: using CPU backend
encodec_load_model_weights: model size =    72.64 MB
encodec_load_model: n_q = 32
encodec_eval: compute buffer size: 92.87 MB

GGML_ASSERT: /home/green/workspace/bark.cpp/encodec.cpp/ggml/src/ggml.c:14354: false

:sweat_smile:

Green-Sky commented 2 months ago

Backtrace of the assert:

#0  0x00007ffff79c3ddc in __pthread_kill_implementation () from /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libc.so.6
#1  0x00007ffff79749c6 in raise () from /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libc.so.6
#2  0x00007ffff795d8fa in abort () from /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libc.so.6
#3  0x00007ffff7ebb3e3 in ggml_compute_forward_conv_1d_stage_0 () from /home/green/workspace/bark.cpp/build/encodec.cpp/ggml/src/libggml.so
#4  0x00007ffff7ed7224 in ggml_graph_compute_thread () from /home/green/workspace/bark.cpp/build/encodec.cpp/ggml/src/libggml.so
#5  0x00007ffff79c20e4 in start_thread () from /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libc.so.6

Also, I noticed that threads are constantly recreated. Not sure if it's ggml itself or the way it's used here, but that's going to be slow on Windows; see the sketch below.
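
To put a number on it, here is a standalone micro-benchmark sketch (generic C + pthreads, nothing from ggml or bark.cpp) that times one fresh spawn+join per iteration, i.e. the recreate-workers-per-graph-compute pattern:

// thread_cost.c -- cost of creating and joining a thread per compute call
// build: cc -O2 thread_cost.c -pthread
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static void *noop(void *arg) { (void)arg; return NULL; }

int main(void) {
    enum { N = 10000 };
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        pthread_t th;
        pthread_create(&th, NULL, noop, NULL); // fresh worker every iteration
        pthread_join(th, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double us = (double)(t1.tv_sec - t0.tv_sec) * 1e6
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("spawn+join: ~%.1f us per cycle\n", us / N);
    return 0;
}

On Linux this lands around tens of microseconds per cycle, and thread creation is generally more expensive on Windows, so a persistent worker pool that sleeps between graph computes would avoid that overhead entirely.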

Green-Sky commented 2 months ago

I used the f16 version of the encodec model and it works :tada:
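
That also points at a likely cause for the assert. My reading of ggml.c around the asserting line (paraphrased from memory, so take it as an assumption, not a quote) is that the conv_1d stage-0 (im2col) dispatch only implements the f16 kernel path, so f32 conv weights fall through to the default branch and hit GGML_ASSERT(false):

// compilable paraphrase of the dispatch shape I believe sits at ggml.c:14354
#include <assert.h>

enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16 };

static void conv_1d_stage_0_dispatch(enum ggml_type kernel_type) {
    switch (kernel_type) {
        case GGML_TYPE_F16:
            // im2col is only implemented for f16 conv kernels
            break;
        default:
            assert(0 && "f32 conv kernels end up here"); // the GGML_ASSERT(false) I hit
            break;
    }
}

int main(void) {
    conv_1d_stage_0_dispatch(GGML_TYPE_F32); // aborts, like loading the f32 model
    return 0;
}

If that reading is right, the fix is either to convert conv kernels to f16 during conversion or to add an f32 path to that dispatch.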

Green-Sky commented 2 months ago

Using the f16 version of bark works as well :partying_face:

PABannier commented 2 months ago

Cool! Thanks for testing this :) I'll try to investigate the issue with the F32 weights.