C0untFloyd / bark-gui

🔊 Text-Prompted Generative Audio Model with Gradio
MIT License
663 stars 60 forks

Extreme background noises on almost all generations and mixed speakers #99

Open AkumaNoTsubasa opened 6 months ago

AkumaNoTsubasa commented 6 months ago

Hello,

I really love having finally found a UI for Suno Bark. It makes generating things on the fly much easier, since my Python knowledge is so bare-bones that I am happy just to get a line of text spoken. But I have some major issues:

  1. About 80% of all the text I generate has massive background noise or is just noise.
  2. Several times, whether I use plain input or SSML with only a single speaker defined, the generation has ended up switching between 2-5 voices.
  3. The chosen model often respects only the language of the premade Suno voices, not the actual chosen speaker. I often get the female voice even though I chose a male one.
  4. The length of the generation is random. It often generates 3-8 seconds of silence at the beginning and sometimes also 3-4 seconds in the middle of a line of text. It seems to try to keep the sound files at 10-15 seconds in length.
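
To put numbers on point 4, the leading silence could be measured and trimmed with something like the following. This is only a minimal sketch and not part of bark-gui: the function names are hypothetical, the 24 kHz sample rate is assumed to match Bark's output, and the 0.01 amplitude threshold is an arbitrary guess that would not cope with the noisy generations from point 1.

```python
# Minimal sketch: measure and trim leading silence in a generated clip.
# Assumptions (not from bark-gui): samples are floats in [-1.0, 1.0],
# the sample rate is 24 kHz, and "silence" means |sample| <= threshold.

SAMPLE_RATE = 24_000  # assumed output rate; adjust if your files differ


def first_loud_index(samples, threshold=0.01):
    """Index of the first sample louder than `threshold`, or len(samples) if none."""
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return i
    return len(samples)


def leading_silence_seconds(samples, sample_rate=SAMPLE_RATE, threshold=0.01):
    """Seconds of silence before the first loud sample."""
    return first_loud_index(samples, threshold) / sample_rate


def trim_leading_silence(samples, threshold=0.01):
    """Return the clip with the silent lead-in removed."""
    return samples[first_loud_index(samples, threshold):]


if __name__ == "__main__":
    # Synthetic clip: 3 s of silence followed by 1 s of constant signal.
    clip = [0.0] * (3 * SAMPLE_RATE) + [0.5] * SAMPLE_RATE
    print(leading_silence_seconds(clip))
    print(len(trim_leading_silence(clip)) / SAMPLE_RATE)
```

The helpers accept any sequence of floats, so a plain list or a NumPy array loaded from a WAV file should both work. Silence in the middle of a line would need a windowed check instead.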

I am using an AMD Ryzen 7 5800X 8-Core Processor @ 3.80 GHz and an RTX 3070 Ti.