FlorianEagox / WeeaBlind

A program to dub non-english media with modern AI speech synthesis, diarization, and voice cloning!
https://tessapainter.com/project/WeeaBlind

Dropdown box to select "Speaker voices" not getting assigned any value #26

Open MonX94 opened 1 month ago

MonX94 commented 1 month ago

Hello, and first of all, thanks for developing this tool! I've run into an issue where I can't select any speaker voice from the dropdown menu besides the one that's there by default. I've attached a video demonstration. I'm not sure if the issue is reproducible, but I hope you at least have an idea of what could be causing it. The other dropdown menus seem to work fine.

https://github.com/user-attachments/assets/f37c073c-4438-43a9-bd97-236abeb6165f

FlorianEagox commented 1 month ago

Ahh goodness, that is strange! I'll try to look into this this week. May I know if there were any errors in your terminal when this happened? Sometimes managing the state of the parameters is kinda finicky with wxPython.

MonX94 commented 1 month ago

I do not get any errors when I try to change a menu element, but I do get this error, unrelated to wxPython, at program launch:

2024-07-19 01:52:26.154501: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-07-19 01:52:26.154573: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

(I do have a GPU.) I also get these, but they are probably not related, as I've seen them in your video overview as well:

(weeablind.py:2020): Gtk-CRITICAL **: 01:52:38.509: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar
(weeablind.py:2020): Gtk-CRITICAL **: 01:52:38.531: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar

I'm running this on Linux Mint 21, CUDA 12.2.
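The warning above names `libcudart.so.11.0`, while the installed toolkit is CUDA 12.2, so the TensorFlow build in use is probably looking for a CUDA 11 runtime that isn't on the loader path. A minimal sketch for checking which runtime the dynamic loader can actually open, using only the standard library (the helper name `can_load` is made up for illustration, and `libcudart.so.12` is assumed to be the CUDA 12 library name):

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the library is missing or not on the loader search path.
        return False


# First name is taken from the warning; the second is an assumed CUDA 12 name.
for name in ("libcudart.so.11.0", "libcudart.so.12"):
    print(name, "loadable" if can_load(name) else "not found")
```

This only reports whether the library can be opened; it says nothing about whether the dropdown problem is related.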

MonX94 commented 1 month ago

Is there maybe a way to work around it? Or maybe I can provide additional, more specific debug information?

MonX94 commented 1 month ago

After some fiddling I found that it is possible to run single-speaker models, like tacotron2, which is probably good enough, but I found that xTTSv2 (in the screenshot) and vctk/vits both need a voice to be selected, and I can't change it. Shouldn't xTTSv2 not even require a voice selection? What's the logic for this in the code? I tried looking it up in the code but didn't get it; sadly I have no prior wxPython experience.

MonX94 commented 1 month ago

I've figured it out: in Voice.py, the function set_voice_params had no check for whether speaker is None. The function runs multiple times after an item is picked in wx.Choice, so the speaker ends up being reset to None. I'll make a pull request soon.
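A rough sketch of the kind of guard described above. This is not the actual code from Voice.py: the class body, the `voice` parameter, and the speaker ID are hypothetical stand-ins, and only the names `set_voice_params` and `speaker` come from the thread.

```python
class Voice:
    """Hypothetical stand-in for the class in Voice.py, for illustration only."""

    def __init__(self):
        self.voice = None
        self.speaker = None

    def set_voice_params(self, voice=None, speaker=None):
        # Only overwrite a field when a value was actually passed, so that a
        # repeated call without a speaker does not clear the earlier selection.
        if voice is not None:
            self.voice = voice
        if speaker is not None:
            self.speaker = speaker


v = Voice()
v.set_voice_params(speaker="p225")  # user picks a speaker (example ID)
v.set_voice_params()                # follow-up call that passes no speaker
print(v.speaker)
```

Without the `is not None` check, the second call would assign `None` to `speaker`, which matches the behaviour described in the comment above.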