This webui is designed to train models for Beatrice v2, which is compatible with w-okada's realtime voice changing client v2: https://github.com/w-okada/voice-changer
The code to train Beatrice models is adapted from: https://huggingface.co/fierce-cats/beatrice-trainer
As of writing this readme, the latest version of w-okada's client is 2.0.61-alpha
As with the majority of my packages/repos, official support is for Windows only. Linux shouldn't be much of an issue, though some pathing changes may be necessary. Pull requests are accepted, but I won't be able to actively maintain any Linux additions.
Will be available for Youtube Channel Members at the Supporter (Package) level: https://www.youtube.com/channel/UCwNdsF7ZXOlrTKhSoGJPnlQ/join
Install FFmpeg. It is needed for the repo and is a good tool to have in general.
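If you are not sure whether FFmpeg is visible to Python after installing it, a small check like the one below can help. This is only a sketch: the helper name ffmpeg_version_line is made up for illustration and is not part of this repo. It looks for an ffmpeg executable on PATH and, if found, returns the first line of its version output.

```python
import shutil
import subprocess


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH.

    Hypothetical helper for illustration; it is not part of the webui.
    """
    exe = shutil.which("ffmpeg")  # resolves ffmpeg / ffmpeg.exe using PATH
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line if line is not None else "FFmpeg was not found on PATH")
```

If this prints that FFmpeg was not found, the usual cause on Windows is that the folder containing ffmpeg.exe has not been added to PATH, or the terminal was opened before PATH was changed.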
Clone the repository
git clone https://github.com/JarodMica/beatrice_trainer_webui.git
Navigate into the repo
cd .\beatrice_trainer_webui\
Set up a virtual environment, specifying Python 3.11
py -3.11 -m venv venv
Activate the venv. If you've never run a venv in Windows PowerShell before, you will need to change the ExecutionPolicy to RemoteSigned: https://learn.microsoft.com/en-us/answers/questions/506985/powershell-execution-setting-is-overridden-by-a-po
.\venv\Scripts\activate
Install the requirements from requirements.txt
pip install -r .\requirements.txt
Uninstall and reinstall torch manually. Other packages will install torch without CUDA; to enable CUDA, you need the prebuilt wheels.
torch 2.4.0 causes issues with CTranslate2 (which in turn breaks whisperx), so make sure you do this step
pip uninstall torch
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
Initialize submodules
git submodule init
git submodule update --remote
Install submodules into venv
pip install .\modules\beatrice_trainer\
pip install .\modules\gradio_utils\
Grab the assets from the original beatrice HuggingFace repo at this hash: https://huggingface.co/fierce-cats/beatrice-trainer/tree/be628e89d162d0d1aa038f57f19e1f578b7e6328
The easiest way is to clone the repo, check out that specific hash, then copy and paste the assets folder into the root folder of the beatrice trainer webui
git clone https://huggingface.co/fierce-cats/beatrice-trainer.git
cd beatrice-trainer
git checkout be628e89d162d0d1aa038f57f19e1f578b7e6328
cd ..
The folder structure should look like this:
beatrice_trainer_webui\assets
Run the webui
python webui.py
(Optional) Make a .bat file to run webui.py without having to activate the venv by hand each time. How to: https://www.windowscentral.com/how-create-and-run-batch-file-windows-10
call venv\Scripts\activate
python webui.py
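Before launching the webui for the first time, it can be worth confirming the two things from the steps above that are easiest to get wrong: that the assets folder sits in the repo root, and that the torch build in the venv can see CUDA. The script below is a minimal sketch under those assumptions. check_install is a hypothetical helper, not something shipped with this repo, and it only reads state without changing anything.

```python
import importlib.util
from pathlib import Path


def check_install(webui_root="."):
    """Return a list of human-readable findings about the install.

    Hypothetical helper for illustration; nothing is modified on disk.
    """
    findings = []

    # The assets folder copied from the beatrice-trainer repo should be
    # directly inside the webui root (beatrice_trainer_webui\assets).
    assets = Path(webui_root) / "assets"
    findings.append(f"assets folder present: {assets.is_dir()}")

    # Only import torch if it is installed, so the script still runs
    # (and says so) when executed outside the venv.
    if importlib.util.find_spec("torch") is None:
        findings.append("torch: not installed in this environment")
    else:
        import torch

        findings.append(f"torch version: {torch.__version__}")
        findings.append(f"CUDA available: {torch.cuda.is_available()}")
    return findings


if __name__ == "__main__":
    for finding in check_install():
        print(finding)
```

Run it from the repo root with the venv activated. If CUDA shows as unavailable, the most likely cause given the steps above is that the CPU-only torch build is still installed, so repeating the torch uninstall and reinstall step is a reasonable first thing to try.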
There are 3 tabs: Create Dataset, Train, and Settings.
This tab is where you create your dataset. Follow the steps below to get a feel for doing this.
Open the datasets folder in the file explorer. Create a new folder in here and name it whatever you want the final model to be named. Open this now empty folder.
Inside it, create one folder per speaker and place that speaker's audio files in it. For example, if the dataset is named elden_ring and you have audio files for two speakers, melina and ranni, the layout would be:
elden_ring\ranni\<many audio files of ranni>
elden_ring\melina\<many audio files of melina>
In the "Dataset to Process" dropdown, select the freshly created dataset from steps 1-3 (if you don't see it, click "Refresh Datasets Available").
Click "Begin Process" and it will start curating a dataset. The output will be placed in your training folder.
When it finishes, you should see "Dataset creation completed successfully" in the "Progress Console" window.
I haven't run into any issues at this step, so if you do, please open an issue on GitHub.
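To double-check the folder layout from the steps above before processing, something like the following can list each speaker folder and how many audio files it holds. It is a sketch only: summarize_dataset and the AUDIO_EXTENSIONS set are assumptions made for illustration, not names or rules taken from the webui, and it only counts files sitting directly inside each speaker folder.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust it to the formats you actually use.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def summarize_dataset(dataset_dir):
    """Map each speaker subfolder name to its number of audio files.

    Hypothetical helper for illustration; files placed loose in the
    dataset folder (outside a speaker folder) are ignored.
    """
    summary = {}
    for speaker_dir in sorted(Path(dataset_dir).iterdir()):
        if not speaker_dir.is_dir():
            continue
        summary[speaker_dir.name] = sum(
            1
            for f in speaker_dir.iterdir()
            if f.is_file() and f.suffix.lower() in AUDIO_EXTENSIONS
        )
    return summary


if __name__ == "__main__":
    root = Path("datasets") / "elden_ring"  # example dataset name from above
    if root.is_dir():
        for speaker, count in summarize_dataset(root).items():
            print(f"{speaker}: {count} audio file(s)")
    else:
        print(f"{root} does not exist yet")
```

A speaker that shows up with 0 files usually means the audio landed one folder too deep or too shallow, which is easy to fix in the file explorer before clicking Begin Process.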
Complete the Create Dataset step before proceeding here. If you don't see anything in the dropdown menu, click "Refresh Training Datasets Available" and then choose the dataset to train on.
You could just click "Start Training" and use the defaults, but I would adjust some of the settings based on what the webui says.
Dark Mode - Toggle on/off Dark Mode
Toggle Custom Theme - Toggle on/off custom theme
This would not be possible without w-okada and his contributors. Huge thanks to them for creating this powerful open-source tool: https://github.com/w-okada/voice-changer
Everything I've coded is MIT licensed. Check w-okada's repo for the licenses covering his tools (the voice changer client and Beatrice).
Audio files used here are taken directly from LibriTTS-R (https://www.openslr.org/141/), which is licensed under CC BY 4.0.