festvox / festival

Festival Speech Synthesis System

How to extract prosody features of wavs using festival for indic voices (Hindi)? #24

Closed skmalviya closed 5 years ago

skmalviya commented 5 years ago

How can intonation (F0, etc.) and duration (syllable accent) feature files be generated for both training and testing wavs using a trained "Hindi" Clustergen model?

Thanks in advance.

saikrishnarallabandi commented 5 years ago

The utterance data structure holds this information. Utterance structures are stored in the directory 'festival/utts' as part of the build process. The information you want can be dumped at multiple levels. For instance, to dump phone-level duration information for a sample file A.utt, here is the command:

$FESTDIR/examples/dumpfeats -feats dur.feats -relation "Segment" festival/utts/A.utt -output dest_dir/desired_fname.dur

Here's a breakdown:
$FESTDIR/examples/dumpfeats -> utility to dump features from an utterance structure
-feats -> argument to specify a file that has the desired features
dur.feats -> file that has the desired features, one per line. Example:

name
segment_start
segment_end

-relation -> the level at which to dump the features. In Festival, 'Segment' refers to the phoneme level.
festival/utts/A.utt -> the file we want to dump info about
-output dest_dir/desired_fname.dur -> destination location
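Putting those pieces together, here is a minimal sketch of the single-file case. It assumes FESTDIR points at your Festival installation and that you run it from the voice build directory. The guard around the call is my own addition so the script does nothing harmful when Festival or the utterance file is missing.

```shell
# Write the feature list, one feature name per line.
feats_file="dur.feats"
printf '%s\n' name segment_start segment_end > "$feats_file"

# Assumption: FESTDIR is set and festival/utts/A.utt exists.
dumpfeats="${FESTDIR:-}/examples/dumpfeats"
if [ -x "$dumpfeats" ] && [ -f festival/utts/A.utt ]; then
    mkdir -p dest_dir
    "$dumpfeats" -feats "$feats_file" -relation Segment \
        -output dest_dir/desired_fname.dur festival/utts/A.utt
else
    echo "dumpfeats or festival/utts/A.utt not found; skipping" >&2
fi
```

Adding or removing lines in dur.feats changes which features are dumped for each segment.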

To dump info about all files, pass all the utterance files and put %s in the output name, so that each utterance gets its own output file:

$FESTDIR/examples/dumpfeats -feats dur.feats -relation "Segment" festival/utts/*.utt -output dest_dir/%s.dur
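Once the .dur files exist, per-phone durations are just end minus start. A rough sketch follows, assuming each dumped line holds the three features from dur.feats in order, separated by whitespace. The helper name phone_durations and the sample values are made up for illustration and are not real Festival output.

```shell
# Print "<phone> <duration>" for each line of a dumped .dur file.
# Assumes each line is: name segment_start segment_end (whitespace-separated).
phone_durations() {
    awk 'NF >= 3 { printf "%s %.3f\n", $1, $3 - $2 }' "$1"
}

# Illustrative input only; these values are invented for the example.
sample="$(mktemp)"
printf '%s\n' 'pau 0.000 0.250' 'k 0.250 0.310' 'a 0.310 0.450' > "$sample"
phone_durations "$sample"
rm -f "$sample"
```

To process a whole directory, you could loop over dest_dir/*.dur and call phone_durations on each file.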

Explanation here too: http://festvox.org/bsv/x689.html

Hope this isn't too late :)

Let me know if you run into any issues.