Closed ligz07 closed 7 years ago
Hi,
Are you running one of the tts_dnn_arctic recipes, or are you running steps/make_bndap.sh with your own data?
Can you also let me know on which platform you are running idlak?
Hi bpotard,
Thanks for your response. I am running the tts_dnn_arctic recipe for Mandarin on Linux (CentOS).
I use my own data and lexicon for Chinese.
It failed in the make_bndap.sh step.
Maybe your audio has a large level of noise? Or no noise at all in some regions? If silences have been replaced with 0 in the wave file I think it could create some issues. If you attach the sample that made the bndap extraction crash, I can have a look.
If there are big problems in the audio, then the mcep extraction will probably fail too.
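One quick way to check for the "silences replaced with 0" case is to look for long runs of samples that are exactly zero. The following is only a minimal sketch, assuming the audio has already been decoded into 16-bit PCM samples in memory; LongestZeroRun is a hypothetical helper and not part of Idlak or Kaldi.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the length (in samples) of the longest run of
// samples that are exactly zero. A run spanning a whole analysis window would
// give an all-zero power spectrum for that frame.
std::size_t LongestZeroRun(const std::vector<std::int16_t> &samples) {
  std::size_t longest = 0;
  std::size_t current = 0;
  for (std::int16_t s : samples) {
    if (s == 0) {
      ++current;
      if (current > longest) longest = current;
    } else {
      current = 0;  // a non-zero sample ends the current run
    }
  }
  return longest;
}
```

If the longest run is at least as long as the analysis window (in samples), digital silence is a plausible cause of the failure and worth ruling out first.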
Did you get any issues with the pitch extraction? You can look at the extracted pitch values using e.g.:
copy-feats scp:data/train_slt/pitch_feats.scp ark,t:- | less
As a last resort, you can try to disable the assertion with the following patch:
--- a/src/feat/feature-functions.cc
+++ b/src/feat/feature-functions.cc
@@ -291,8 +291,8 @@ void RealCepsToMagnitudeSpec(VectorBase<Real> *real_cepstrum, bool apply_exp) {
Real real = (*real_cepstrum)(i*2),
im = (*real_cepstrum)(i*2 + 1);
(*real_cepstrum)(i) = real;
- KALDI_ASSERT(std::abs(im) <= 1e-4 &&
- "FFT of real cepstrum not expected to have imaginary value.");
+ //KALDI_ASSERT(std::abs(im) <= 1e-4 &&
+ // "FFT of real cepstrum not expected to have imaginary value.");
}
(*real_cepstrum)(half_dim) = last_spectrum;
//real_cepstrum->Scale(dim);
Hi @ligz07.
I had the same issue and was able to trace the error to the call
// Compute Real Cepstrum
PowerSpecToRealCeps(&noise_spectrum);
in feature-aperiodic.cc. As @bpotard mentioned, the error was caused by the variable noise_spectrum containing zero values, which give -inf after the log inside the PowerSpecToRealCeps function. This eventually leads to nan values for im during the function call
RealCepsToMagnitudeSpec(&noise_spectrum, false /* get log spectrum*/);
which triggers the KALDI_ASSERT failure.
In my case I was able to fix the issue by simply adding a small floor value to noise_spectrum before that call:
noise_spectrum.Add(1e-40);
PowerSpecToRealCeps(&noise_spectrum);
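To illustrate the mechanism described above, here is a small standalone sketch. It is not Kaldi code: it uses a plain std::vector<double> in place of Kaldi's vector type, and OffsetAndLog and CountNonFinite are hypothetical names. It shows that the log of a zero bin is -inf, that arithmetic on such values can yield NaN, and that a small offset keeps every value finite.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Adds a constant offset to every bin, then takes the natural log in place.
// This mirrors the idea of adding a floor before the log step; it is only an
// illustration and does not reproduce PowerSpecToRealCeps().
void OffsetAndLog(std::vector<double> *spectrum, double offset) {
  for (std::size_t i = 0; i < spectrum->size(); ++i) {
    (*spectrum)[i] = std::log((*spectrum)[i] + offset);
  }
}

// Counts the values that are NaN or +/-inf.
std::size_t CountNonFinite(const std::vector<double> &values) {
  std::size_t count = 0;
  for (double x : values) {
    if (!std::isfinite(x)) ++count;
  }
  return count;
}
```

With an offset of 0, a spectrum such as {1.0, 0.0, 0.5} ends up with one -inf entry, and subtracting that entry from itself gives NaN, which is how a single zero bin can contaminate later results. With an offset of 1e-40 all entries stay finite. Note that this sketch uses double precision; in single precision 1e-40 is below the smallest normal value, so a larger floor may be a safer choice there.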
Hi @bpotard, thanks for your analysis. I just uploaded the wav file that caused the bndap failure to GitHub. Here is the link: https://github.com/ligz07/idlak/tree/import-svn-idlak/issues/bnap
Hi @algoseer, thanks for your help. Unfortunately, this solution did not work in my case. Maybe the root cause in my case is not the same as yours.
I did not manage to reproduce your issue with these two files. Are you using an up to date git repository?
I have a few other questions for troubleshooting:
Let's assume you have created a wav.scp file that contains:
a /path/to/001047.wav
b /path/to/001391.wav
You can try to run the following in a terminal:
compute-aperiodic-feats --config=conf/bndap.conf --frame-length=100 scp:wav.scp ark:'compute-kaldi-pitch-feats --frame-length=50 --config=conf/pitch.conf scp:wav.scp ark:- |' ark,t:-
Do you still get the same error?
Hi @bpotard,
Are you sure the pitch extraction worked correctly?
-- The pitch extraction did not report any error, so I think it worked correctly.
What frame length did you use for the pitch extraction?
-- Sorry, I made a mistake: the frame length is actually 45. The command line is as follows:
**+ steps/make_bndap.sh --frame_length 45 data/dev_female exp/make_bndap/dev_female data/tmp/dnn_feats/arctic**
Why did you use your own custom frame length of 100 for bndap extraction? For a female speaker, I suspect the automagic window should be around 40 instead; a window of 100 will probably smooth things too much.
-- Actually, I have another database for a male speaker; the frame length of 100 is used for that one.
I added the following code to work around the failure. Does this make sense?
In feature-aperiodic.cc (around line 226):
for (int i = 0; i <= padded_windowsize/2; ++i) {
  if (power_spectrum(i) < 0.0000000001) {
    power_spectrum(i) += 1e-4;
  }
}
The reason is that I checked harmonic_spectrum, and some of its values are -INF.
Hi @ligz07
Thresholding the power spectrum as you do is not a bad option, but for consistency you should probably do it this way instead:
for (int i = 0; i <= padded_window_size_/2; ++i) {
  if (power_spectrum(i) < 1e-12) {
    power_spectrum(i) += 1e-12;
  }
}
I suspect you ran into that problem because you have some extremely quiet silences. You may run into similar issues when you do the mcep extraction. If that is the case, you will probably have to modify the compute_mcep_feats.sh script to use the -e or -E options of the SPTK "mcep" tool, e.g. "-e 1e-12".
Hope that fixes the issues!
Hi @bpotard, thank you very much :) I have trained my own TTS model successfully.
When I ran steps/make_bndap.sh, I got the following error. My frame length is 100.
ASSERTION_FAILED (compute-aperiodic-feats:RealCepsToMagnitudeSpec():feature-functions.cc:295) : 'std::abs(im) <= 1e-4 && "FFT of real cepstrum not expected to have imaginary value."'