IBM / watson-tts-python

TTS Python tools to assist customers in experimentation and configuration
Apache License 2.0

Bootstrap a corpus file from Watson Assistant training data #22

Open andrewrfreed opened 2 years ago

andrewrfreed commented 2 years ago

In https://github.com/IBM/watson-tts-python#extract_skill_textpy we extract Watson Assistant (WA) data to a two-column file for listening to the prompts. If the second column is extracted to a separate file (without the header row), that new file is suitable as a language model (LM) corpus file for STT.

Either this tool or https://github.com/IBM/watson-stt-wer-python should be able to create (or append to) a corpus file from WA training data. I prefer append because we frequently want to duplicate utterances in the LM corpus file.
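
A rough sketch of the append behavior, for discussion only. It assumes the two-column file is a comma-separated CSV with one header row and the text in the second column; the function name `append_corpus`, the `copies` parameter, and the file names are hypothetical and do not exist in either repository.

```python
import csv


def append_corpus(two_column_csv, corpus_path, copies=1):
    """Append the second column of a two-column CSV to a corpus file.

    Assumes (not confirmed by the tool's docs) a comma-delimited file
    with a single header row. Returns the number of lines written.
    """
    written = 0
    with open(two_column_csv, newline="", encoding="utf-8") as src, \
            open(corpus_path, "a", encoding="utf-8") as dst:
        reader = csv.reader(src)
        next(reader, None)  # drop the header row
        for row in reader:
            if len(row) < 2:
                continue  # skip malformed or short rows
            # Collapse internal whitespace so each utterance is one line.
            text = " ".join(row[1].split())
            if not text:
                continue  # skip rows with an empty second column
            # Opening in "a" mode keeps existing corpus lines; `copies`
            # lets the caller deliberately duplicate utterances.
            for _ in range(copies):
                dst.write(text + "\n")
                written += 1
    return written
```

Calling `append_corpus("prompts.csv", "corpus.txt")` more than once on the same input would add the utterances again each time, which matches the preference for append over create.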

lmazzoli commented 1 year ago

May want to consider #30 when making this change.