PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech
MIT License

How to accurately set up a prompt? #116

Closed. qbc2016 closed this issue 2 months ago.

qbc2016 commented 9 months ago

Hi! I've tried different prompts, but the results are very strange. See the following examples:

  1. Precision: fp32. Prompt: "one two three four five six seven eight nine ten." The output is 9 seconds long, but only the first 3 seconds contain speech, reading out "eight nine ten"; the remaining 6 seconds are almost silent.
  2. Precision: q4. Prompt: "one two three four five six seven eight nine ten." The output is a 12-second murmur.
  3. Precision: q4. Prompt: "one two three four five six." The output only reads out "two three four five six".

Similar issues occur with different random seeds and with prompts such as "[MAN] one two three four five six" and "[happy piano music, playing for ten seconds]". Are there any solutions or suggestions for setting up prompts accurately (especially for music)? Thanks!

PABannier commented 9 months ago

Hi @qbc2016! Thanks for flagging this.

bark.cpp is not stable yet: most prompts still yield nonsense. It turns out there is a bug in the implementation of the 1D convolution, which we are working on fixing. Hopefully, after this fix, the output will be more stable.

As for music, we're looking for contributors to help accelerate support for models like Audiocraft.

qbc2016 commented 9 months ago

Thanks for your reply!

PABannier commented 2 months ago

@qbc2016 The issue should be fixed by #139.