DanRuta / xva-trainer

UI app for training TTS/VC machine learning models for xVASynth, with several audio pre-processing tools, and dataset creation/management.

Steam release 1.2.0 seems to be broken #5

Closed stohrendorf closed 1 year ago

stohrendorf commented 1 year ago

Multiple issues here. Verifying the data files through Steam didn't show any errors, and cleaning up the dataset didn't help.

  1. I've got this stack trace after starting a new training from scratch:

         Traceback (most recent call last):
           File "server.py", line 227, in handleTrainingLoop
           File "python\xvapitch\xva_train.py", line 137, in handleTrainer
           File "python\xvapitch\xva_train.py", line 554, in start
           File "python\xvapitch\xva_train.py", line 601, in iteration
           File "python\xvapitch\xva_train.py", line 377, in init
           File "python\xvapitch\xva_train.py", line 1206, in setup_dataloaders
           File "C:\Program Files (x86)\Steam\steamapps\common\xVATrainer\.\resources\app\python\xvapitch\util.py", line 410, in get_language_weighted_sampler
             return WeightedRandomSampler(dataset_samples_weight, len(dataset_samples_weight))
           File "torch\utils\data\sampler.py", line 186, in __init__
             raise ValueError("num_samples should be a positive integer "
         ValueError: num_samples should be a positive integer value, but got num_samples=0

The line numbers there don't match up with xva_train.py, and changes I make to that file for debugging are completely ignored, whereas changes to e.g. dataset.py work fine. Throwing an exception in read_datasets shows that, at least at one point, it returns the correct dataset. (A minimal standalone repro of the sampler error is sketched after this list.)

  2. The UI is still broken when adding new trainings: it seems the list of trainings must be cleared before a new training can be added.
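
For reference, the sampler failure above reduces to PyTorch rejecting an empty weight list, which suggests the dataset list is coming back empty by the time it reaches get_language_weighted_sampler. A minimal standalone repro (not the app's actual code path):

    import torch
    from torch.utils.data import WeightedRandomSampler

    # An empty dataset yields an empty weight tensor, so len(...) == 0 and
    # WeightedRandomSampler raises exactly the error from the stack trace.
    dataset_samples_weight = torch.zeros(0, dtype=torch.double)
    WeightedRandomSampler(dataset_samples_weight, len(dataset_samples_weight))
    # ValueError: num_samples should be a positive integer value, but got num_samples=0
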
DanRuta commented 1 year ago

Hey. For the first issue, please check that you have downloaded the "priors" data from nexusmods. It is currently not shipped with the Steam build.

stohrendorf commented 1 year ago

Thanks for the clarification; this gave me a few headaches. Consider the first issue a wish for a better error message now ;)

stohrendorf commented 1 year ago

Update: sorry, but downloading the data files didn't solve the issue. I downloaded both data files and extracted them with 7zip, but I'm still getting the same error with the same stack trace. This is the directory layout after extracting:

Update: Windows and 7zip struggled so much that (for some reason) they showed files that were not actually there.

DanRuta commented 1 year ago

Is it working ok now?

If not, the other possibility is the fine-tuning dataset formatting, i.e. the app can't find the audio files. To clarify: there should be wav files inside the "wavs" folder, and next to the "wavs" folder a metadata.csv file that uses | as the column delimiter. It should look the same as any of these priors datasets.

So in the app, in the training config, the dataset path for "ar_priors_x" (if, hypothetically, that were your custom dataset) should be ...../resources/app/xvapitch/PRIORS/ar_priors_x
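
If it helps, here is a quick way to sanity-check that layout. This is a hypothetical helper, not something shipped with xVATrainer, and it assumes the first |-separated column of metadata.csv names the wav file (possibly without the extension):

    import os

    def check_dataset(root):
        # Expected layout: <root>/wavs/*.wav plus <root>/metadata.csv ("|" delimited)
        wavs_dir = os.path.join(root, "wavs")
        meta_path = os.path.join(root, "metadata.csv")
        assert os.path.isdir(wavs_dir), f"missing folder: {wavs_dir}"
        assert os.path.isfile(meta_path), f"missing file: {meta_path}"
        with open(meta_path, encoding="utf-8") as f:
            for i, line in enumerate(f, 1):
                name = line.split("|")[0].strip()
                if not name.endswith(".wav"):  # assumption: extension may be omitted
                    name += ".wav"
                if not os.path.isfile(os.path.join(wavs_dir, name)):
                    print(f"metadata.csv line {i}: no wav named {name}")

    check_dataset("my_dataset")  # hypothetical path to a custom dataset folder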

stohrendorf commented 1 year ago

The dataset itself isn't the problem; it's about adding training configurations. I have already added copies of the dataset with cleaned-up WAVs from different folders, but when training tasks are in the list and I try to add another one, it just doesn't appear in the list, no matter whether I re-use a dataset from another training task or select an unused one.

On a side note, the training stopped after a few hours with an OOM (system RAM, not VRAM), which was a bit surprising given that I have 64 GB.

DanRuta commented 1 year ago

There is currently a fair bit of data caching in the dataloader. I've removed it for the next update, but meanwhile you can reduce the number of workers in the training config, which should use less RAM.
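
For context, each dataloader worker is a separate process, so with in-memory caching every worker can end up holding its own copy of the cached data; fewer workers means fewer copies. A generic illustration of the knob (not the app's actual dataloader setup):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.zeros(100, 80))  # stand-in for the real audio dataset

    # Lowering num_workers trades loading throughput for a smaller RAM
    # footprint, since each worker keeps its own copy of any cached data.
    loader = DataLoader(dataset, batch_size=8, num_workers=2)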

For the training queue, would you be able to share the app.log file located next to xVATrainer.exe? It might contain an error stack indicating what's wrong.

stohrendorf commented 1 year ago

Hm. I can't reproduce it anymore, yet I'm certain it happened multiple times. The log file is fairly uninteresting, except for this single line, which is probably just an invalid training configuration:

    [line: 702] onerror: Uncaught TypeError: trainingAddConfigCkptPathInput.replaceAll is not a function

About the data caching: I have moved every priors dataset except the speaker's language to a different folder. Training still eats ~30 GB of RAM, but it doesn't go OOM anymore. Are there consequences to excluding most of the "priors" datasets?

DanRuta commented 1 year ago

Just fixed that error. As for not including the priors: every priors folder contains some synthetic data for a different language. This data is used during training to ensure that models you fine-tune on mono-language, mono-speaker-style data won't lose any knowledge of the other languages, nor their vocal range (useful for voice conversion and pitch/emotion/style manipulation). I recommend not messing with the priors datasets, unless you choose to add MORE data to them (e.g. your own, higher-quality, non-synthetic data).
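
Roughly, the idea behind the language-weighted sampling (a simplified sketch, not the exact implementation in util.py) is that rarer languages get proportionally larger sampling weights, so every batch keeps mixing the priors in alongside your mono-language fine-tuning data:

    import torch
    from collections import Counter
    from torch.utils.data import WeightedRandomSampler

    langs = ["en", "en", "en", "en", "de", "ar"]  # one language tag per sample
    counts = Counter(langs)

    # Inverse-frequency weights: under-represented languages are drawn
    # more often than their raw share of the dataset would suggest.
    weights = torch.tensor([1.0 / counts[l] for l in langs], dtype=torch.double)
    sampler = WeightedRandomSampler(weights, num_samples=len(weights))

Remove the priors and the weighting has nothing to rebalance, so the model only ever sees your one language and speaker.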

I'm pushing an update through today, which makes the training consume less system RAM.

stohrendorf commented 1 year ago

Oh, okay, that may explain why the voice synthesis I did is so emotionless. Thanks for the explanation. I'm wondering, though, whether re-training from scratch versus just adding the priors back later on makes a huge difference.

As a side note, I created a new base dataset consisting of basically complete sentences of the reference voice, and the UI hint about the VRAM/batch-size ratio doesn't align with my experience: training went OOM with a batch size of 12 (I have 12 GB of VRAM). With my voice samples and a batch size of 8, it uses about 10 GB of VRAM and around 30 GB of system RAM (although the latter is because I removed every other language from the priors). I'm just guessing here, but at certain points it seems to need an additional 1-2 GB just to save the checkpoints, which in turn leads to an OOM if the batch size is too large. In other words, the UI hint about the batch size seems a bit misleading.
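
If checkpoint saving really is what adds the spike, one generic mitigation would be to snapshot the weights on CPU before serializing. Just a sketch of what I mean, assuming the spike is on the GPU side; this is not xVATrainer's actual saving code:

    import torch
    import torch.nn as nn

    model = nn.Linear(80, 80)  # stand-in for the real TTS model
    if torch.cuda.is_available():
        model = model.cuda()

    # Copy the parameters to host memory first, so the save itself doesn't
    # have to work against an almost-full GPU.
    cpu_state = {k: v.detach().cpu() for k, v in model.state_dict().items()}
    torch.save(cpu_state, "checkpoint.pt")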