MiniXC / alignments

Automatically creates or downloads alignments for multiple speech datasets, using pre-existing alignments where possible.

alignments

This tool wraps the Montreal Forced Aligner (MFA) so that aligned speech datasets can be used as PyTorch datasets. For example, to create (or reuse) an aligned version of LibriTTS train-clean-100:

libritts_100 = LibrittsDataset(
  target_directory="../data/libritts-train-clean-100-aligned",
  source_directory="../data/LibriTTS/train-clean-100",
  source_url="https://www.openslr.org/resources/60/train-clean-100.tar.gz",
  chunk_size=10_000,
)

The dataset can then be used as follows:

for item in libritts_100:
  item["wav"] # the audio
  item["speaker"] # speaker key
  item["transcript"] # normalized transcript
  item["phones"] # a list of triples (start_time_in_seconds, end_time_in_seconds, phone)
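Many TTS models expect per-phone durations in spectrogram frames rather than timestamps. The sketch below converts the `(start_time_in_seconds, end_time_in_seconds, phone)` triples from `item["phones"]` into `(phone, n_frames)` pairs. The helper `phones_to_frame_durations` and its `sample_rate`/`hop_length` defaults are hypothetical and not part of this library; set them to match your own audio and feature extraction.

```python
from typing import List, Sequence, Tuple

# (start_time_in_seconds, end_time_in_seconds, phone), as in item["phones"]
Phone = Tuple[float, float, str]


def phones_to_frame_durations(
    phones: Sequence[Phone],
    sample_rate: int = 22050,  # hypothetical default; use your audio's actual rate
    hop_length: int = 256,     # hypothetical spectrogram hop size in samples
) -> List[Tuple[str, int]]:
    """Convert (start, end, phone) triples into (phone, n_frames) pairs.

    Start and end times are each rounded to the nearest frame boundary, so
    adjacent phones that share a boundary neither overlap nor leave gaps,
    and rounding errors do not accumulate over the utterance.
    """
    frames_per_second = sample_rate / hop_length
    return [
        (phone, round(end * frames_per_second) - round(start * frames_per_second))
        for start, end, phone in phones
    ]


phones = [(0.0, 0.0, "[SILENCE]"), (0.0, 0.1, "HH"), (0.1, 0.25, "AH0")]
print(phones_to_frame_durations(phones, sample_rate=16000, hop_length=160))
# → [('[SILENCE]', 0), ('HH', 10), ('AH0', 15)]
```

Because each boundary is rounded once, the frame counts of contiguous phones always sum to the rounded length of the whole span, and zero-length silence tokens map to 0 frames.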

The "phones" list also includes [SILENCE] tokens between words; these have a length of 0 if no silence is present. Where punctuation occurs, the silence token is replaced with the corresponding punctuation token.
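Since every word boundary carries either a [SILENCE] token or a punctuation token, the phone list can be post-processed without the transcript. The sketch below drops zero-length silences and groups phones into words. The function names are hypothetical, and the exact spelling of punctuation tokens is not documented here, so the caller passes them in; `"[.]"` in the usage example is a placeholder.

```python
from typing import Iterable, List, Sequence, Tuple

# (start_time_in_seconds, end_time_in_seconds, phone), as in item["phones"]
Phone = Tuple[float, float, str]
SILENCE = "[SILENCE]"


def drop_empty_silences(phones: Sequence[Phone]) -> List[Phone]:
    """Remove [SILENCE] tokens whose end time is not after their start time."""
    return [p for p in phones if not (p[2] == SILENCE and p[1] <= p[0])]


def split_into_words(
    phones: Sequence[Phone], punctuation: Iterable[str] = ()
) -> List[List[Phone]]:
    """Group phones into words, treating [SILENCE] and the given punctuation
    tokens as word boundaries. Boundary tokens are not included in any word."""
    boundaries = {SILENCE, *punctuation}
    words: List[List[Phone]] = []
    current: List[Phone] = []
    for triple in phones:
        if triple[2] in boundaries:
            if current:  # close the word that just ended
                words.append(current)
                current = []
        else:
            current.append(triple)
    if current:  # the utterance may not end with a boundary token
        words.append(current)
    return words


phones = [
    (0.0, 0.0, "[SILENCE]"),
    (0.0, 0.1, "HH"), (0.1, 0.2, "AY1"),
    (0.2, 0.2, "[SILENCE]"),
    (0.2, 0.3, "DH"), (0.3, 0.4, "EH1"), (0.4, 0.5, "R"),
    (0.5, 0.7, "[.]"),  # placeholder punctuation token
]
print(len(drop_empty_silences(phones)))  # → 6
print([[p[2] for p in w] for w in split_into_words(phones, punctuation={"[.]"})])
# → [['HH', 'AY1'], ['DH', 'EH1', 'R']]
```

If a punctuation token is not listed in `punctuation`, it is treated as an ordinary phone, so the set should match whatever tokens your alignments actually contain.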

Supported Datasets

Features

Planned Features

The following features are planned for future releases. Please feel free to open an issue if you have further ideas.