BUTSpeechFIT / huggingface_asr

Extensions of the huggingface library for end-to-end (e2e) speech recognition.

Refine VAD segmentation in short silences #20

Open ISzoke opened 9 months ago

ISzoke commented 9 months ago

Currently, the dataset splitter segments the data according to the VAD settings, which can produce long segments (for example, longer than 30 s). Postprocessing then cuts these segments at exactly 30 s, so a cut can fall in the middle of speech.

We need an update that splits each long segment at a short silence close to the 30 s boundary instead.

This can be done either at the data builder level (GPU-accelerated) or as a transformation in the trainer.
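A minimal sketch of the proposed split rule follows. It assumes that each long segment already has a per-frame score where a lower value means "more likely silence" (for example, a VAD speech probability or a frame energy). Instead of cutting at exactly `max_len`, it looks back over a short search window and cuts at the quietest frame it finds. `split_at_silences`, `frame_shift` and `search_window` are hypothetical names used only for illustration. They are not part of huggingface_asr, and the sketch does not cover the GPU-accelerated data builder variant.

```python
from typing import List, Sequence, Tuple


def split_at_silences(
    frame_scores: Sequence[float],
    frame_shift: float = 0.01,   # hypothetical: seconds per frame
    max_len: float = 30.0,       # maximum piece length in seconds
    search_window: float = 3.0,  # hypothetical: how far back from max_len to look for silence
) -> List[Tuple[int, int]]:
    """Split one long segment into pieces of at most max_len seconds.

    frame_scores[i] is a per-frame speech score (lower = quieter).
    Returns (start_frame, end_frame) pairs, end exclusive, relative to
    the segment start; multiply by frame_shift to get seconds.
    """
    max_frames = int(round(max_len / frame_shift))
    window_frames = int(round(search_window / frame_shift))
    n = len(frame_scores)
    pieces: List[Tuple[int, int]] = []
    start = 0
    while n - start > max_frames:
        # Candidate cut points lie in [start + max_frames - window, start + max_frames],
        # so every piece is at most max_frames long and at least 1 frame long.
        lo = start + max(1, max_frames - window_frames)
        hi = start + max_frames
        # Pick the quietest frame; on ties prefer the latest one, so a segment
        # with no silence at all falls back to the current hard max_len cut.
        cut = min(range(lo, hi + 1), key=lambda i: (frame_scores[i], -i))
        pieces.append((start, cut))
        start = cut
    pieces.append((start, n))
    return pieces
```

For example, with 1 s frames, a 70-frame segment with silences at frames 27 and 55 is cut at those silences, giving `[(0, 27), (27, 55), (55, 70)]`. A segment with no silence is cut at 30 and 60, as in the current behaviour. Cutting at the start of the quietest frame is a simplification. A refinement could cut in the middle of the longest low-score run inside the window.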