Open JoeHogan opened 2 months ago
ok... so not a python guy, but this seems to be a bug in single_voice_from_VCTK_dataset
to fix this, i had to set the variable PREFERRED_MIC_output. this wasn't being set anywhere... not sure if you're supposed to pass it in. i set it to: PREFERRED_MIC_output="mic1_output"
and to fix the letters being cut off, i changed this:
LINE=${LINE:1:-1} # remove initial and final " in p225 data
to this
LINE=${LINE:0:-1} # remove initial and final " in p225 data
> ok... so not a python guy, but this seems to be a bug in single_voice_from_VCTK_dataset
It's okay that you're not a python guy since this project is written in shell script, not python :)
> to fix this, i had to set the variable PREFERRED_MIC_output. this wasn't being set anywhere... not sure if you're supposed to pass it in. i set it to: PREFERRED_MIC_output="mic1_output"
There isn't supposed to be a variable called PREFERRED_MIC_output in this project, but I can see why you made that assumption. PREFERRED_MIC is a string constant that is hardcoded to "mic1" since I found the differences between the two recordings included in the VCTK dataset to be minimal in testing.
It does appear that there are a couple of bugs in this line in single_voice_from_VCTK_dataset.sh, which produces each line of metadata.csv:
echo "${BASE}_$PREFERRED_MIC_output|0|$LINE"
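The issue is that there should be curly braces around both PREFERRED_MIC and LINE. Without them, the shell reads $PREFERRED_MIC_output as one variable name, which is unset. The corrected line is: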
echo "${BASE}_${PREFERRED_MIC}_output|0|${LINE}"
As long as the file names (without extensions) in the first column of metadata.csv match the names of the audio files in the folder, there shouldn't be any Piper preprocessing errors.
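To make the brace issue concrete, here is a minimal sketch of how bash expands the two forms. Only PREFERRED_MIC="mic1" comes from this thread; the BASE and LINE values are made-up placeholders, and the script is a standalone illustration rather than an excerpt from single_voice_from_VCTK_dataset.sh.

```bash
#!/usr/bin/env bash
# Illustration only: BASE and LINE are hypothetical sample values.
PREFERRED_MIC="mic1"
BASE="p225_001"
LINE="Please call Stella."

# Underscores are valid in variable names, so without braces bash looks up
# a variable called PREFERRED_MIC_output, which is unset and expands to "".
unbraced="${BASE}_$PREFERRED_MIC_output|0|$LINE"

# With braces the name ends at "}", so "_output" is kept as literal text.
braced="${BASE}_${PREFERRED_MIC}_output|0|${LINE}"

echo "$unbraced"   # p225_001_|0|Please call Stella.
echo "$braced"     # p225_001_mic1_output|0|Please call Stella.
```

The braces around LINE are not strictly needed here because the closing quote already ends the name, but using them everywhere keeps the line consistent.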
> and to fix the letters being cut off, i changed this:
> LINE=${LINE:1:-1} # remove initial and final " in p225 data
> to this
> LINE=${LINE:0:-1} # remove initial and final " in p225 data
The purpose of this line is to remove quotes from the beginning and end of the transcript string. I would expect that it would truncate the first letter if the string it received did not include quotes. I will need to investigate this further to remind myself why there were quotes expected there in the first place.
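One way to make this step safe for both quoted and unquoted transcripts is to strip the quotes only when they are actually present. The sketch below is an assumption about how that could look, not the project's code: strip_outer_quotes is a hypothetical helper name and the sample transcript is made up. Negative lengths in substring expansion need bash 4.2 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove one pair of surrounding double quotes,
# but leave the string untouched if it is not wrapped in quotes.
strip_outer_quotes() {
    local s="$1"
    # The pattern matches only strings that start AND end with a double quote.
    if [[ $s == \"*\" ]]; then
        s="${s:1:-1}"
    fi
    printf '%s\n' "$s"
}

# What the unconditional slice does to an unquoted transcript:
LINE='Please call Stella.'
sliced="${LINE:1:-1}"
echo "$sliced"                                   # lease call Stella

# The conditional version handles both inputs:
strip_outer_quotes '"Please call Stella."'       # Please call Stella.
strip_outer_quotes 'Please call Stella.'         # Please call Stella.
```

Note that ${LINE:0:-1} keeps the first letter but still drops the last character, so on unquoted input it would remove the final punctuation mark.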
EDIT: I've edited this comment as I better understand what you were saying...
The bug you mention makes sense, and fixing it should fix the PREFERRED_MIC bug. I think there was probably a change to the dataset which removed the quotes, so you no longer need that piece of code...
One other thing that got me was that tts_dojo/PRETRAINED_CHECKPOINTS/download_defaults.sh added a query string param to the downloads, '?download=true', which threw off the dojo when it was checking for .ckpt files... it wouldn't find them with the query string on the end and i had to manually remove it from the file name:
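For anyone hitting the same thing, the manual rename can be scripted. This is only a sketch of a workaround: strip_query_suffix is a hypothetical helper, the example file name is made up, and it does not change how download_defaults.sh fetches the files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename every file in a directory whose name ends
# with the literal suffix "?download=true" so that the suffix is removed.
strip_query_suffix() {
    local dir="$1" suffix='?download=true' f
    # Quoting "$suffix" makes the "?" literal instead of a glob wildcard.
    for f in "$dir"/*"$suffix"; do
        [ -e "$f" ] || continue          # no matches: the glob stays unexpanded
        mv -- "$f" "${f%"$suffix"}"
    done
}

# Demo in a throwaway directory with a made-up checkpoint name.
tmp="$(mktemp -d)"
touch "$tmp/example.ckpt?download=true"
strip_query_suffix "$tmp"
ls "$tmp"
```

Pointing the function at the folder that holds the downloaded checkpoints should leave plain .ckpt names behind.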
I must be doing something wrong, but everything seems to go ok before running the dojo...
This is what the first few lines of my generated metadata.csv look like (note: I'm not sure if something went wrong here, but it looks like the first letter of each phrase is being cut off):
and these are some of the files in the wav_22050 folder:
This is what my generated DATASET/myvoice folder looks like: