build_path_to_transcript_dict_ljspeech doesn't match official ljspeech dataset

DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

Apache License 2.0

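When you download the official LJSpeech dataset from https://keithito.com/LJ-Speech-Dataset/, you do not get any per-utterance text files. The archive contains a wavs folder and a single metadata.csv that holds all transcripts.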

def build_path_to_transcript_dict_ljspeech():
    path_to_transcript = dict()
    for transcript_file in os.listdir("/mount/resources/speech/corpora/LJSpeech/16kHz/txt"):
        with open("/mount/resources/speech/corpora/LJSpeech/16kHz/txt/" + transcript_file, 'r', encoding='utf8') as tf:
            transcript = tf.read()
        wav_path = "/mount/resources/speech/corpora/LJSpeech/16kHz/wav/" + transcript_file.rstrip(".txt") + ".wav"
        path_to_transcript[wav_path] = transcript
    return limit_to_n(path_to_transcript)

This dict builder expects a folder structure that does not exist in the original dataset as you can download it today.
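
For illustration, here is a minimal sketch of a dict builder that reads the layout of the official download instead. It assumes the extracted archive has a metadata.csv whose lines are pipe-separated (file id, raw transcript, normalized transcript) next to a wavs folder. The function name and the `root` parameter are hypothetical and not part of the repository, and the `limit_to_n` call from the original is left out to keep the sketch self-contained.

```python
import os


def build_path_to_transcript_dict_ljspeech_official(root):
    """Map each wav path to its transcript using the official metadata.csv.

    `root` (hypothetical parameter) is assumed to be the extracted archive
    folder, i.e. the one containing metadata.csv and the wavs/ directory.
    """
    path_to_transcript = dict()
    with open(os.path.join(root, "metadata.csv"), "r", encoding="utf8") as metadata:
        for line in metadata:
            line = line.rstrip("\n")
            # Assumed line layout: file id | raw transcript | normalized transcript
            fields = line.split("|")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            file_id = fields[0]
            # Prefer the normalized transcript; fall back to the raw one if it is missing or empty
            transcript = fields[2] if len(fields) > 2 and fields[2] else fields[1]
            wav_path = os.path.join(root, "wavs", file_id + ".wav")
            path_to_transcript[wav_path] = transcript
    return path_to_transcript
```

Splitting on the pipe character by hand, rather than using the csv module with default settings, avoids misparsing transcripts that contain quotation marks. To match the existing builders, the return value could be wrapped in `limit_to_n` as in the function above.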

build_path_to_transcript_dict_ljspeech doesn't match official ljspeech dataset #136