Tomiinek / Blizzard2013_Segmentation

Transcripts and segmentation for the Blizzard 2013 audiobooks, also known as the Lessac or Blizzard 2013 dataset.

The splitting of the data #3

Open jinhonglu opened 3 years ago

jinhonglu commented 3 years ago

Hi, I was wondering whether the splitting you implemented comes from one of the papers in the references?

Tomiinek commented 3 years ago

Hello, no it is not.

jinhonglu commented 3 years ago

Another question: is there a dev and test set that I can get access to? Or do I have to split the dev set from the training set myself?

Tomiinek commented 3 years ago

No, you have to split it on your own.
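
For reference, a minimal sketch of one way to do such a split. It is not part of this repository: the function name `split_utterances`, the 5% dev and test fractions, and the utterance ID format are all assumptions made for illustration. It shuffles the utterance IDs with a fixed seed so that the same split can be reproduced later.

```python
import random


def split_utterances(utterance_ids, dev_fraction=0.05, test_fraction=0.05, seed=0):
    """Split utterance IDs into train/dev/test lists (hypothetical helper)."""
    # Sort first so the result does not depend on the input order.
    ids = sorted(utterance_ids)
    # Seeded shuffle: the same seed always yields the same split.
    random.Random(seed).shuffle(ids)

    n_dev = int(len(ids) * dev_fraction)
    n_test = int(len(ids) * test_fraction)

    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test


if __name__ == "__main__":
    # Hypothetical IDs; in practice these would come from the segment list.
    example_ids = [f"utt_{i:04d}" for i in range(1000)]
    train, dev, test = split_utterances(example_ids)
    print(len(train), len(dev), len(test))
```

Since the data comes from audiobooks, holding out whole chapters or books instead of random utterances may be a better choice if overlap between sets is a concern.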

jinhonglu commented 3 years ago

I tried the alignment tool and noticed that each audio segment is about 15-25 s long. Is it possible to modify the script so that it produces shorter segments (<10 s)?