CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

inconsistency between number of label_state_align files against all other labels #317

Open HandsomeDevilv112 opened 6 years ago

HandsomeDevilv112 commented 6 years ago

To start, I have 154 txt files and 154 wav files. While attempting to create my own voice, I ran "bash 02_prepare_labels.sh wav txt labels", and when it finished, label_state_align contained only 16 .lab files.

This differs from the other folders (label_no_align, mfc, prompt-utt, and mono_no_align), which contain 154 items each.
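To see exactly which utterances were dropped, a small script can compare file names across the two folders. This is a minimal sketch, not part of Merlin: the function name missing_alignments is hypothetical, and it assumes that each .lab file shares its base name with the corresponding .wav file.

```python
from pathlib import Path


def missing_alignments(wav_dir, lab_dir):
    """Return sorted utterance IDs that have a .wav file but no .lab file.

    Assumption: a label file is named after its wav file (same stem).
    """
    wav_ids = {p.stem for p in Path(wav_dir).glob("*.wav")}
    lab_ids = {p.stem for p in Path(lab_dir).glob("*.lab")}
    return sorted(wav_ids - lab_ids)
```

Calling missing_alignments("wav", "labels/label_state_align") (paths are illustrative) returns the utterances that the aligner skipped, which are the ones worth inspecting first.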

It should also be noted that I'm using the updated forced_alignment.py. While processing, I received 97 warnings that read very similar to: "WARNING [-2637] HeaviestMix: mix 15 in sil has v.small gConst [-330000105472.000000] in /root/merlin/tools/bin/htk/HHEd" and 4 errors that read very similar to:

/root/merlin/tools/bin/htk/HHEd -A -H /root/merlin/egs/build_your_own_voice/s1/labels/model/hmm_mix_16_iter_7/macros -H /root/merlin/egs/build_your_own_voice/s1/labels/model/hmm_mix_16_iter_7/hmmdefs -M /root/merlin/egs/build_your_own_voice/s1/labels/model/hmm_mix_32_iter_0 /root/merlin/egs/build_your_own_voice/s1/labels/config/mix_32.hed /root/merlin/egs/build_your_own_voice/s1/labels/mono_phone.list

I'm not quite sure what to do about this. Any and all help will be appreciated.

fosimoes commented 6 years ago

I faced the same problem. Apparently, HVite is not able to align all of the wav files properly, so the output MLF is incomplete. What I did to solve it was to change the pruning coefficients in forced_alignment.py:

PRUNING = [str(i) for i in (250., 150., 5000.)]
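For readers unfamiliar with this setting, here is a short sketch of what the line produces. The interpretation of the three numbers as HVite's beam width, retry increment, and upper limit (the "-t f [i l]" option in the HTK book) is an assumption about how forced_alignment.py uses the list, so check it against your copy of the script.

```python
# The values from the comment above, converted to strings for a command line.
PRUNING = [str(i) for i in (250., 150., 5000.)]

# Assumed meaning, following HVite's "-t f [i l]" option:
# start with a beam of 250, widen by 150 on failure, give up above 5000.
beam, increment, limit = PRUNING

# Assumed usage: the list is appended after the -t flag.
hvite_pruning_args = ["-t"] + PRUNING
print(hvite_pruning_args)
```

A larger limit gives the aligner more retries with a wider beam before it abandons an utterance, which is presumably why raising it can recover files that were dropped.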

HandsomeDevilv112 commented 6 years ago

Thanks! I'll give that a spin as soon as I am able. So, what did you do later on for train/valid/test?

fuzeller commented 6 years ago

Have a look at your wav files, especially those that were culled from the data set. It's possible they don't have sufficient silence at the start and/or end. HTK may be assigning a silence "phoneme" label to a segment with no duration (zero length). If that is the case, try adding some silence to those files (or all of your files, for a quick test).
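One way to check this without listening to every file is to measure how much quiet audio sits at each end. The sketch below uses only the Python standard library; the function name edge_silence_seconds and the amplitude threshold of 500 are illustrative assumptions, and it only handles 16-bit PCM wav files.

```python
import array
import sys
import wave


def edge_silence_seconds(path, threshold=500):
    """Return (leading, trailing) seconds of audio quieter than threshold.

    Assumptions: 16-bit PCM input; only the first channel is inspected;
    threshold is an arbitrary amplitude that should be tuned per recording.
    """
    with wave.open(str(path), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        rate = w.getframerate()
        channels = w.getnchannels()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian
    mono = samples[::channels]
    loud = [i for i, s in enumerate(mono) if abs(s) >= threshold]
    if not loud:
        # The whole file is below the threshold.
        total = len(mono) / rate
        return (total, total)
    leading = loud[0] / rate
    trailing = (len(mono) - 1 - loud[-1]) / rate
    return (leading, trailing)
```

Running this over the utterances that were culled and comparing against those that aligned should show whether short edge silence is the common factor.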

HandsomeDevilv112 commented 6 years ago

I could see that being an issue. How much silence should I be looking to add? A second or so?

fuzeller commented 6 years ago

I would say 0.02 seconds should be plenty. I have had success with this procedure the five or so times I've seen it happen. I'm no Merlin expert, however :-)

HandsomeDevilv112 commented 6 years ago

I'll tell ya, I appreciate it. I got big dreams, but I sometimes struggle with my secret illiteracy.

HandsomeDevilv112 commented 6 years ago

After trying the first suggested solution, I did not appear to gain any files. So I've placed a .5 second silence before each wav file and am currently re-running prepare_labels. First thing to note: while running this batch, the number of lines with "WARNING [-2637] HeaviestMix:" is down from 97 to 86, and there are 5 errors that end with "s1/labels/mono_phone.list". I'll let you know how it turns out in a couple of days.

Edit: I did some looking around about this warning and found http://www.pamanyungan.net/2017/12/htk-error-list/, the relevant portion being: "Trying to increase the number of Gaussian mixtures for each hmm at the end of training, incrementing by 2 each time. From htkbook: 'Defunct mixture components can be prevented by setting the -w option in HERest so that all mixture weights are floored to some level above MINMIX.'"

Looking up HERest -w, I find http://www1.icsi.berkeley.edu/Speech/docs/HTKBook/node194_mn.html, which says: "-w Any mixture weight which falls below the global constant MINMIX is treated as being zero. When this parameter is set, all mixture weights are floored to f * MINMIX."
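As a rough illustration of what "floored to f * MINMIX" means numerically, here is a minimal sketch. It is not HTK code: the function name is hypothetical, the value 1.0e-5 for MINMIX is an assumption that should be checked against your HTK sources, and no renormalisation of the weights is attempted.

```python
MINMIX = 1.0e-5  # assumed value of HTK's global constant; verify locally


def floor_mixture_weights(weights, f, minmix=MINMIX):
    """Raise every weight below f * minmix up to that floor.

    Illustration only: weights are not renormalised afterwards.
    """
    floor = f * minmix
    return [max(w, floor) for w in weights]
```

With f = 2.0, a component whose weight has collapsed to zero would be lifted to 2 * MINMIX, so it stays above the level at which it would be treated as defunct.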

Poking around in the Merlin code base, I don't immediately see a MINMIX setting. Has anyone else tried this? If so, did you get better results from it?

simonkingedinburgh commented 6 years ago

HandsomeDevilv112 wrote:

"I've placed a .5 second silence before each wav file"

A tip when doing such silence padding: use a section of recorded silence taken from one of your audio files, and not pure digital silence (i.e., a sequence of zero-valued waveform samples), because the latter can cause numerical issues in MFCC extraction.

Recorded silence will also be a good acoustic match to the within-utterance silences, remembering that the models for utterance-initial/final and within-utterance silences have shared parameters.
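The padding described above could be done along these lines. This is a sketch under stated assumptions, not a Merlin utility: the function name and its parameters are hypothetical, the silent region (start_s, dur_s) has to be located by hand in a recording from the same session, and all files are assumed to be uncompressed PCM wav with matching format.

```python
import wave


def pad_with_recorded_silence(src, dst, silence_src, start_s, dur_s):
    """Write dst as silence + src + silence.

    The silence is cut from silence_src, starting at start_s seconds and
    lasting dur_s seconds, so it is real room noise rather than zeros.
    """
    with wave.open(str(silence_src), "rb") as s:
        sil_params = s.getparams()
        s.setpos(round(start_s * s.getframerate()))
        silence = s.readframes(round(dur_s * s.getframerate()))
    with wave.open(str(src), "rb") as w:
        params = w.getparams()
        speech = w.readframes(w.getnframes())
    # Channels, sample width and sample rate must agree before concatenating.
    if params[:3] != sil_params[:3]:
        raise ValueError("channel count, sample width and rate must match")
    with wave.open(str(dst), "wb") as out:
        out.setparams(params)
        out.writeframes(silence + speech + silence)
```

Writing to a separate dst keeps the original recordings untouched, which makes it easy to revert if the padding does not help.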

Simon


HandsomeDevilv112 commented 6 years ago

Thank you! I can swap that out without too much trouble (I think)

HandsomeDevilv112 commented 6 years ago

Okay, so I've done the above, and unfortunately the result did not change. I'd like to try one more thing before I modify my dataset. How does HERest -w work? I might be looking in all the wrong places, but I haven't come across any information that gives a working example. The things I have tried so far have produced syntax errors and not much more than that.

Sangramsingkayte commented 6 years ago

DrSangramsings-MacBook-Air:own_voice_generation sangramsing$ ./02_prepare_labels.sh database/wav database/txt.done.data database/labels
Step 2: Preparing labels...
Please configure paths to speech_tools, festival and festvox in config.cfg !!
Copying labels to duration and acoustic data directories...
sed: 1: "experiments/sing_iiit/d ...": invalid command code e
sed: 1: "experiments/sing_iiit/a ...": invalid command code e
done...!