domesticatedviking / TextyMcSpeechy

Easily create text-to-speech models in any voice for rhasspy/piper. Make a text-to-speech model with your own voice recordings, or use thousands of RVC voices. Works offline on a Raspberry pi. Rapidly record custom datasets for any metadata.csv file and listen to your model as it is training.
MIT License

run_training.sh not working: "No utterances found" #10

Open JoeHogan opened 2 months ago

JoeHogan commented 2 months ago

I must be doing something wrong, but everything seems to go ok before running the dojo...

CHECKPOINTS/default/M_voice/medium/epoch=2164-step=1355540.ckpt

Creating symbolic links in your dojo.
Symbolic links created successfully.

Dataset linked successfully.  Press <Enter> to begin preprocessing.

      running scripts/preprocess_dataset.sh

       Auto-configured sampling rate: 22050
    Calculated value for max-workers: 4

Configuring piper for language: en-us
Running piper_train.preprocess

WARNING:preprocess:Missing p316_001_
WARNING:preprocess:Missing p316_002_
WARNING:preprocess:Missing p316_003_
WARNING:preprocess:Missing p316_004_

...

WARNING:preprocess:Missing p316_423_
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/joe/Documents/repos/TextyMcSpeechy/piper/src/python/piper_train/preprocess.py", line 502, in <module>
    main()
  File "/home/joe/Documents/repos/TextyMcSpeechy/piper/src/python/piper_train/preprocess.py", line 148, in main
    assert num_utterances > 0, "No utterances found"
AssertionError: No utterances found
piper_train.preprocess failed.  Press <enter> to exit.

This is what the first few lines of my generated metadata.csv look like (note: I'm not sure if something went wrong here, but it looks like the first letter of each phrase is being cut off):

p316_001_|0|lease call Stella
p316_002_|0|sk her to bring these things with her from the store
p316_003_|0|ix spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob
p316_004_|0|e also need a small plastic snake and a big toy frog for the kids

and these are some of the files in the wav_22050 folder:

image

This is what my generated DATASET/myvoice folder looks like:

image

JoeHogan commented 2 months ago

ok... so not a python guy, but this seems to be a bug in single_voice_from_VCTK_dataset

To fix this, I had to set the variable PREFERRED_MIC_output.

This wasn't being set anywhere, and I'm not sure if you're supposed to pass it in. I set it to: PREFERRED_MIC_output="mic1_output"

And to fix the letters being cut off, I changed this:

    LINE=${LINE:1:-1}  # remove initial and final " in p225 data

to this

    LINE=${LINE:0:-1}  # remove initial and final " in p225 data

domesticatedviking commented 2 months ago

ok... so not a python guy, but this seems to be a bug in single_voice_from_VCTK_dataset

It's okay that you're not a python guy since this project is written in shell script, not python :)

to fix this, I had to set the variable PREFERRED_MIC_output. This wasn't being set anywhere... not sure if you're supposed to pass it in. I set it to: PREFERRED_MIC_output="mic1_output"

There isn't supposed to be a variable called PREFERRED_MIC_output in this project, but I can see why you made that assumption. PREFERRED_MIC is a string constant that is hardcoded to "mic1", since in testing I found the differences between the two recordings included in the VCTK dataset to be minimal.

It does appear that there are a couple of bugs in this line in single_voice_from_VCTK_dataset.sh, which produces each line of metadata.csv:

echo "${BASE}_$PREFERRED_MIC_output|0|$LINE"

The issue is that there should be curly braces around both PREFERRED_MIC and LINE, i.e.:

echo "${BASE}_${PREFERRED_MIC}_output|0|${LINE}"
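The reason the braces matter: underscores are valid in shell variable names, so without braces bash reads `$PREFERRED_MIC_output` as one variable named `PREFERRED_MIC_output`, which is unset and expands to nothing. A minimal sketch of the difference follows; the values of `BASE` and `LINE` are illustrative stand-ins for one VCTK entry, not taken from the script.

```shell
#!/usr/bin/env bash
# Illustrative values only; they mimic a single VCTK entry.
BASE="p316_001"
PREFERRED_MIC="mic1"
LINE="Please call Stella"

# Make sure the accidental variable name really is unset for this demo.
unset PREFERRED_MIC_output

# Without braces: bash looks up PREFERRED_MIC_output, which expands to "".
broken="${BASE}_$PREFERRED_MIC_output|0|$LINE"

# With braces: the name ends at PREFERRED_MIC and "_output" is literal text.
fixed="${BASE}_${PREFERRED_MIC}_output|0|${LINE}"

echo "$broken"
echo "$fixed"
```

The first form yields an ID ending in a bare underscore, which matches the `Missing p316_001_` warnings in the preprocessing log above.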

As long as the file names (without extensions) in the first column of metadata.csv match the names of the audio files in the folder there shouldn't be any Piper preprocessing errors.
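One way to check this before running the dojo is to compare the first column of metadata.csv against the audio folder. The function below is a hypothetical helper, not part of TextyMcSpeechy or Piper, and it assumes a `|`-delimited metadata file and `.wav` files sitting directly in the given directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every ID in the first column of a metadata file
# that has no matching <id>.wav in the given directory.
# Usage: find_missing_wavs <metadata.csv> <wav_dir>
find_missing_wavs() {
  local metadata="$1" wav_dir="$2" id
  # "|| [ -n "$id" ]" also handles a final line with no trailing newline.
  while IFS='|' read -r id _ || [ -n "$id" ]; do
    [ -z "$id" ] && continue                  # skip blank lines
    [ -f "${wav_dir}/${id}.wav" ] || echo "$id"
  done < "$metadata"
}
```

If the function prints nothing, every ID in the metadata file has a matching audio file. Any ID it does print is one that preprocessing would likely report as missing.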

and to fix the letters being cut off, i changed this:

    LINE=${LINE:1:-1}  # remove initial and final " in p225 data

to this

    LINE=${LINE:0:-1}  # remove initial and final " in p225 data

The purpose of this line is to remove quotes from the beginning and end of the transcript string. I would expect that it would truncate the first letter if the string it received did not include quotes. I will need to investigate this further to remind myself why there were quotes expected there in the first place.
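Note that `${LINE:0:-1}` keeps the first character but still drops the last one, whether or not it is a quote. A more defensive option, sketched here as an assumption rather than the project's actual fix, is to strip a quote only when one is present. `strip_outer_quotes` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove one leading and one trailing double quote,
# but only if they are actually there. Unquoted input passes through unchanged.
strip_outer_quotes() {
  local s="$1"
  s="${s#\"}"   # drop a leading " if present
  s="${s%\"}"   # drop a trailing " if present
  printf '%s\n' "$s"
}

strip_outer_quotes '"Please call Stella."'
strip_outer_quotes 'Please call Stella.'
```

Both calls print the same text, so the same line works for transcripts with or without surrounding quotes.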

JoeHogan commented 2 months ago

EDIT: I've edited this comment as I better understand what you were saying...

The bug you mention makes sense, and fixing it should fix the PREFERRED_MIC bug. I think there was probably a change to the dataset which removed the quotes, so you no longer need that piece of code...

One other thing that got me was that tts_dojo/PRETRAINED_CHECKPOINTS/download_defaults.sh left a query string parameter, '?download=true', on the end of the downloaded file names, which threw off the dojo when it was checking for .ckpt files. It wouldn't find them with the query string on the end, and I had to manually remove it from the file name:

image
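The manual rename can also be scripted. This is a rough sketch, assuming the affected files end in the literal text `?download=true` and live somewhere under the directory you pass in. `strip_download_suffix` is a hypothetical helper, not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename every file under <dir> whose name ends in
# "?download=true" so that the suffix is removed.
# Usage: strip_download_suffix <dir>
strip_download_suffix() {
  local dir="$1" suffix='?download=true' f
  # [?] matches a literal question mark rather than "any single character".
  find "$dir" -type f -name '*[?]download=true' -print0 |
    while IFS= read -r -d '' f; do
      mv -- "$f" "${f%"$suffix"}"   # quoted suffix is removed literally
    done
}
```

For example, `strip_download_suffix tts_dojo/PRETRAINED_CHECKPOINTS` would turn a file named `something.ckpt?download=true` into `something.ckpt`.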