Extreme background noises on almost all generations and mixed speakers

Hello,

I'm really glad to have finally found a UI for Suno Bark. It makes it much easier to generate things on the fly; my Python knowledge is so bare-bones that I'm happy just to get a line of text spoken. But I have some major issues:

About 80% of the text I generate has massive background noise or is nothing but noise.
Regardless of whether I use plain input or SSML with only a single speaker defined, the generation repeatedly ends up switching between 2-5 voices.
The chosen model often respects only the language of the premade Suno voices but not the actual chosen speaker. I often get a female voice even though I chose a male one.
The length of the generation is random. It often generates 3-8 seconds of silence at the beginning and sometimes another 3-4 seconds in the middle of a line of text. It seems to try to keep the sound files at 10-15 seconds in length.
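
In case it helps with reproducing the silence problem, here is a rough sketch of how the silent stretches in a generated file could be measured with only the Python standard library. It assumes the output is a 16-bit mono WAV file; the function names, the amplitude threshold of 500, and the one-second minimum are my own illustrative choices and are not part of bark-gui or Bark.

```python
import array
import sys
import wave


def read_mono_16bit_wav(path):
    """Read a WAV file and return (samples, sample_rate).

    Assumption: the file is 16-bit mono PCM; anything else is rejected.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected a 16-bit mono WAV file")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        return samples, wav.getframerate()


def silent_runs(samples, sample_rate, threshold=500, min_seconds=1.0):
    """Return (start_s, end_s) spans where |sample| stays below threshold.

    threshold and min_seconds are arbitrary illustrative defaults.
    """
    runs = []
    start = None  # index where the current quiet stretch began
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and (i - start) / sample_rate >= min_seconds:
                runs.append((start / sample_rate, i / sample_rate))
            start = None
    # Close a quiet stretch that lasts until the end of the file.
    if start is not None and (len(samples) - start) / sample_rate >= min_seconds:
        runs.append((start / sample_rate, len(samples) / sample_rate))
    return runs
```

With a hypothetical output file name, usage would look like `samples, rate = read_mono_16bit_wav("bark_output.wav")` followed by `print(silent_runs(samples, rate))`, which lists each silent span in seconds.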

I am using an AMD Ryzen 7 5800X 8-core processor @ 3.80 GHz and an RTX 3070 Ti.
