Open HandsomeDevilv112 opened 6 years ago
I faced the same problem. Apparently, HVite is not able to align all wav files properly, so the output MLF is incomplete. To solve it, I changed the pruning coefficients in forced_alignment.py:
PRUNING = [str(i) for i in (250., 150., 5000.)]
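For readers unsure what the three numbers mean: the HTK Book describes HVite's `-t f [i l]` option as a beam width `f` that is raised by `i` and retried after a pruning error, until the alignment succeeds or `f` exceeds `l`. The sketch below only illustrates that reading. The helper names `hvite_pruning_args` and `beam_schedule` are hypothetical, and it is an assumption that forced_alignment.py passes PRUNING to HVite after `-t`.

```python
# Values taken from the post above: initial beam, increment, upper limit.
PRUNING = [str(i) for i in (250., 150., 5000.)]


def hvite_pruning_args(pruning):
    """Return the '-t f i l' fragment of an HVite command line.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    if len(pruning) != 3:
        raise ValueError("expected three values: beam, increment, limit")
    return ["-t"] + list(pruning)


def beam_schedule(start, step, limit):
    """List the beam widths that would be tried, in order, up to the limit.

    Assumption: a retry happens only while the beam does not exceed `limit`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    beams = []
    beam = start
    while beam <= limit:
        beams.append(beam)
        beam += step
    return beams


if __name__ == "__main__":
    print(" ".join(hvite_pruning_args(PRUNING)))
    widths = beam_schedule(*(float(p) for p in PRUNING))
    print(len(widths), "beam widths, from", widths[0], "to", widths[-1])
```

Under this reading, raising only the last value gives HVite more retries on difficult utterances before it gives up on them.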
Thanks! I'll give that a spin as soon as I am able. So, what did you do later on for train/valid/test?
Have a look at your wav files, especially those that were culled from the data set. It's possible they don't have sufficient silence at the start and/or end. HTK may be assigning a silence "phoneme" label to a segment with no duration (zero length). If that is the case, try adding some silence to those files (or all of your files, for a quick test).
I could see that being an issue. How much silence should I be looking to add? A second or so?
I would say 0.02 second should be plenty. I have had success with this procedure in the ~five times I've seen it happen. I'm no Merlin expert, however :-)
I'll tell ya, I appreciate it. I got big dreams, but I sometimes struggle with my secret illiteracy.
After trying the first solution mentioned, I did not appear to gain any files. So I've placed a .5 second silence before each wav file and am currently re-running prepare_labels. First thing to note: while running this batch, the number of lines with "WARNING [-2637] HeaviestMix:" is down from 97 to 86, and there are 5 errors ending in "s1/labels/mono_phone.list". I'll let you know how it turns out in a couple of days.
Edit: I did some looking around about the HERest warning and found http://www.pamanyungan.net/2017/12/htk-error-list/, the relevant portion being: "Trying to increase the number of Gaussian mixtures for each hmm at the end of training, incrementing by 2 each time. From htkbook: 'Defunct mixture components can be prevented by setting the -w option in HERest so that all mixture weights are floored to some level above MINMIX.'"
Looking up HERest -w, I find http://www1.icsi.berkeley.edu/Speech/docs/HTKBook/node194_mn.html:
"-w: Any mixture weight which falls below the global constant MINMIX is treated as being zero. When this parameter is set, all mixture weights are floored to f * MINMIX."
Poking around in the Merlin code base, I don't immediately see a MINMIX setting. Has anyone else tried this? If so, did you get better results from it?
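To make the quoted description concrete, here is a minimal sketch of the flooring arithmetic only. It is not HTK's implementation: the function name `floor_mixture_weights` is hypothetical, the default MINMIX of 1.0e-5 is an assumption to check against your HTK build, and any renormalisation HTK may apply afterwards is left out.

```python
def floor_mixture_weights(weights, f, minmix=1.0e-5):
    """Raise every mixture weight below f * minmix up to that floor.

    Assumptions: minmix=1.0e-5 stands in for HTK's MINMIX constant, and no
    renormalisation is performed after flooring.
    """
    if f <= 0:
        raise ValueError("f must be positive")
    floor = f * minmix
    return [max(w, floor) for w in weights]


if __name__ == "__main__":
    # A component whose weight has collapsed to zero is kept alive at the floor.
    print(floor_mixture_weights([0.5, 0.0, 0.5], f=2.0))
```

The point of the floor is that a component with a tiny but non-zero weight is still counted, instead of being treated as zero and becoming defunct.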
HandsomeDevilv112 wrote:
> I've placed a .5 second silence before each wav file
A tip when doing such silence padding: use a section of recorded silence taken from one of your audio files, and not pure digital silence (i.e., a sequence of zero-valued waveform samples) because the latter can cause numerical issues in MFCC extraction.
Recorded silence will also be a good acoustic match to the within-utterance silences, remembering that the models for utterance-initial/final and within-utterance silences have shared parameters.
Simon
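The padding tip above could be sketched as follows, using only Python's standard `wave` module. The function name and the file paths are hypothetical, and the sketch assumes the silence clip has already been cut out of one of your recordings and has the same channel count, sample width and sample rate as the target file.

```python
import wave


def pad_wav_with_recorded_silence(in_path, silence_path, out_path, pad_seconds=0.5):
    """Prepend and append `pad_seconds` of recorded silence to a wav file.

    Returns the number of frames added at each end. The silence clip is
    repeated if it is shorter than the requested padding.
    """
    with wave.open(silence_path, "rb") as sil:
        sil_params = sil.getparams()
        sil_frames = sil.readframes(sil.getnframes())
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        speech = src.readframes(src.getnframes())

    src_fmt = (params.nchannels, params.sampwidth, params.framerate)
    sil_fmt = (sil_params.nchannels, sil_params.sampwidth, sil_params.framerate)
    if src_fmt != sil_fmt:
        raise ValueError("silence clip and input file must share one format")
    if not sil_frames:
        raise ValueError("silence clip is empty")

    frame_bytes = params.nchannels * params.sampwidth
    n_pad = int(round(pad_seconds * params.framerate))
    if n_pad <= 0:
        raise ValueError("pad_seconds is too short for this sample rate")

    # Repeat the clip often enough, then cut it to exactly n_pad frames.
    needed = n_pad * frame_bytes
    reps = (needed + len(sil_frames) - 1) // len(sil_frames)
    pad = (sil_frames * reps)[:needed]

    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is fixed on close
        out.writeframes(pad + speech + pad)
    return n_pad
```

A call such as `pad_wav_with_recorded_silence("wav/utt001.wav", "silence.wav", "wav_padded/utt001.wav")` (all three paths made up) writes a new file and leaves the original untouched. Repeating a very short clip can introduce audible periodicity, so a clip at least as long as the padding is preferable.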
Thank you! I can swap that out without too much trouble (I think)
Okay, so, I've done the above, and unfortunately the result did not change. I'd like to try one more thing before I modify my dataset. How does HERest -w work? I might be looking in all the wrong places, but I haven't come across information that gives a working example. The things I have tried so far have produced syntax errors and not much more than that.
DrSangramsings-MacBook-Air:own_voice_generation sangramsing$ ./02_prepare_labels.sh database/wav database/txt.done.data database/labels
Step 2: Preparing labels...
Please configure paths to speech_tools, festival and festvox in config.cfg !!
Copying labels to duration and acoustic data directories...
sed: 1: "experiments/sing_iiit/d ...": invalid command code e
sed: 1: "experiments/sing_iiit/a ...": invalid command code e
done...!
To start, I have 154 txt files, and I have 154 wav files. I ran bash "02_prepare_labels.sh wav txt labels" while attempting to create my own voice, and when that action is complete, label_state_align has only 16 .lab files.
This is different than the rest of the folder, including: label_no_align, mfc, prompt-utt, and mono_no_align which all contain 154 items each.
It should also be noted that I'm using the updated forced_alignment.py, and while processing, I received 97 warnings that read very similar to:
"WARNING [-2637] HeaviestMix: mix 15 in sil has v.small gConst [-330000105472.000000] in /root/merlin/tools/bin/htk/HHEd"
and 4 errors reading something very similar to:
/root/merlin/tools/bin/htk/HHEd -A -H /root/merlin/egs/build_your_own_voice/s1/labels/model/hmm_mix_16_iter_7/macros -H /root/merlin/egs/build_your_own_voice/s1/labels/model/hmm_mix_16_iter_7/hmmdefs -M /root/merlin/egs/build_your_own_voice/s1/labels/model/hmm_mix_32_iter_0 /root/merlin/egs/build_your_own_voice/s1/labels/config/mix_32.hed /root/merlin/egs/build_your_own_voice/s1/labels/mono_phone.list
I'm not quite sure what to do with it. Any and all help will be appreciated.
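When only some utterances survive alignment (16 of 154 here), it helps to list exactly which ones were dropped so those wav files can be inspected. A minimal sketch, assuming the wav files end in `.wav`, the aligned labels end in `.lab`, and both share the same base names; the function name and the example paths are hypothetical.

```python
import os


def missing_alignments(wav_dir, lab_dir):
    """Return the sorted base names that have a .wav file but no .lab file."""
    wav_ids = {
        os.path.splitext(name)[0]
        for name in os.listdir(wav_dir)
        if name.endswith(".wav")
    }
    lab_ids = {
        os.path.splitext(name)[0]
        for name in os.listdir(lab_dir)
        if name.endswith(".lab")
    }
    return sorted(wav_ids - lab_ids)


if __name__ == "__main__":
    # Example paths only; adjust them to your own experiment layout.
    for utt in missing_alignments("database/wav", "database/labels/label_state_align"):
        print(utt)
```

Listening to a few of the listed files, and checking their leading and trailing silence, is a quick way to test the suggestions made earlier in the thread.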