Closed prakharpbuf closed 1 year ago
I went through some GitHub issues and concluded that I don't need that SV2TTS/ directory. It will be created fresh when I run synthesizer_preprocess_audio.py.
I have downloaded the LibriSpeech train-clean-100 dataset, and I'll practice finetuning the model with one of its speakers before I create my own dataset and finetune on that. The train-clean-100 dataset contains .flac audio files, with all the transcripts for a directory in a single <speaker>-<book>.trans.txt file. This program instead requires one transcript file per utterance, i.e. utterance-xx.flac alongside utterance-xx.txt.
Does anyone have any code written to get the transcripts in the required format?
I found a script for preprocessing these transcripts for the Mozilla datasets but nothing for LibriSpeech. I will wait until tomorrow; if nothing turns up, I will write my own, share it here for anyone who needs it, and close this issue.
import os

# Root of the extracted dataset; adjust to your own path.
dataset_root = r'C:\LibriSpeech\train-clean-100'

for root, dirs, files in os.walk(dataset_root):
    if len(files) == 0:
        continue
    try:
        # Layout is <speaker>/<book>/, so the last two path parts name the transcript file.
        head, book = os.path.split(root)
        head, speaker = os.path.split(head)
        transFilePath = os.path.join(root, f"{speaker}-{book}.trans.txt")
        with open(transFilePath) as transFile:
            for line in transFile:
                if not line.strip():
                    continue  # skip blank lines
                # Each line is "<utterance-id> <transcript text>".
                utterance, _, text = line.partition(" ")
                utteranceFilePath = os.path.join(root, f"{utterance}.txt")
                # Mode 'w' overwrites any transcript left over from an earlier run.
                with open(utteranceFilePath, 'w') as utteranceFile:
                    utteranceFile.write(text)
        # os.remove(transFilePath)
    except Exception as e:
        # Directories without a matching .trans.txt end up here; report and move on.
        print(e)
        continue
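For anyone who wants to double-check the result before running preprocessing, here is a rough sketch of a sanity check. It assumes the layout described above (each .flac should have a .txt with the same name next to it). The function name find_missing_transcripts is just something I made up for this example, and the path is a placeholder.

```python
import os


def find_missing_transcripts(dataset_root):
    """Return paths of .flac files that have no sibling .txt transcript."""
    missing = []
    for root, _dirs, files in os.walk(dataset_root):
        names = set(files)
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            # Only audio files need a matching per-utterance transcript.
            if ext.lower() == ".flac" and f"{stem}.txt" not in names:
                missing.append(os.path.join(root, name))
    return missing


if __name__ == "__main__":
    # Placeholder path; point this at your own copy of the dataset.
    for path in find_missing_transcripts(r"C:\LibriSpeech\train-clean-100"):
        print("no transcript for", path)
```

If it prints nothing, every audio file found under that directory has a transcript next to it.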
Hey, I'm trying to finetune the pretrained model but it looks like I am missing the SV2TTS/ directory which contains train.txt, etc. I have a saved_models/ directory which has three *.pt files for the three components of this TTS architecture.