bpotard / idlak

This repository is now obsolete. Please go to https://github.com/idlak/idlak instead.

compute-aperiodic-feats failed #8

Closed ligz07 closed 7 years ago

ligz07 commented 7 years ago

When I ran steps/make_bndap.sh, I got the following error. My frame length is 100.

ASSERTION_FAILED (compute-aperiodic-feats:RealCepsToMagnitudeSpec():feature-functions.cc:295) : 'std::abs(im) <= 1e-4 && "FFT of real cepstrum not expected to have imaginary value."'

bpotard commented 7 years ago

Hi,

Are you running one of the tts_dnn_arctic recipes? Or are you running steps/make_bndap.sh with your own data?

Can you also let me know on which platform you are running idlak?

ligz07 commented 7 years ago

Hi bpotard, thanks for your response. I am running the tts_dnn_arctic recipe for Mandarin on Linux (CentOS).
I use my own data and lexicon for Chinese. It failed at the make_bndap.sh step.

bpotard commented 7 years ago

Maybe your audio has a high level of noise? Or no noise at all in some regions? If silences have been replaced with 0 in the wave file, I think it could create some issues. If you attach the sample that made the bndap extraction crash, I can have a look.

If there are big problems in the audio, then the mcep extraction will probably fail too.

Did you have any issues with the pitch extraction? You can look at the extracted pitch values using e.g.:

copy-feats scp:data/train_slt/pitch_feats.scp ark,t:- | less

As a last resort, you can try to disable the assertion with the following patch:

--- a/src/feat/feature-functions.cc
+++ b/src/feat/feature-functions.cc
@@ -291,8 +291,8 @@ void RealCepsToMagnitudeSpec(VectorBase<Real> *real_cepstrum, bool apply_exp) {
     Real real = (*real_cepstrum)(i*2),
       im = (*real_cepstrum)(i*2 + 1);
     (*real_cepstrum)(i) = real;
-    KALDI_ASSERT(std::abs(im) <= 1e-4 &&
-                 "FFT of real cepstrum not expected to have imaginary value.");
+    //KALDI_ASSERT(std::abs(im) <= 1e-4 &&
+    //             "FFT of real cepstrum not expected to have imaginary value.");
   }
   (*real_cepstrum)(half_dim) = last_spectrum;
   //real_cepstrum->Scale(dim);

algoseer commented 7 years ago

Hi @ligz07.

I had the same issue, and I was able to trace the error to this call at line 297 of feature-aperiodic.cc:

// Compute Real Cepstrum
PowerSpecToRealCeps(&noise_spectrum);

As @bpotard mentioned, the error was caused by noise_spectrum containing zero values, which become -inf after the log taken inside PowerSpecToRealCeps. This eventually leads to NaN values for im during the call at line 389:

RealCepsToMagnitudeSpec(&noise_spectrum, false /* get log spectrum*/);

which triggers the KALDI_ASSERT failure.

In my case I was able to fix the issue by simply adding a small floor value to noise_spectrum

noise_spectrum.Add(1e-40);
PowerSpecToRealCeps(&noise_spectrum);
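
To illustrate the failure chain described above, here is a minimal standalone sketch. It does not use Kaldi: LogInPlace, AddConstant and ImaginaryPartIsSmall are hypothetical helpers on std::vector<float>, and LogInPlace only mimics the log step that PowerSpecToRealCeps is assumed to perform. It shows that a zero bin becomes -inf, that arithmetic on -inf can produce NaN, and that a NaN never passes a check of the form std::abs(im) <= 1e-4.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not a Kaldi function): natural log of every bin,
// standing in for the log step of a power-spectrum-to-cepstrum conversion.
inline void LogInPlace(std::vector<float> *spectrum) {
  for (float &v : *spectrum) v = std::log(v);  // log(0) yields -inf
}

// Hypothetical helper: add a constant to every bin, in the spirit of
// noise_spectrum.Add(...), so that no bin is exactly zero before the log.
inline void AddConstant(std::vector<float> *spectrum, float value) {
  for (float &v : *spectrum) v += value;
}

// Same form as the failing assertion. Any comparison with NaN is false,
// so a NaN imaginary part always fails this check.
inline bool ImaginaryPartIsSmall(float im) {
  return std::abs(im) <= 1e-4f;
}
```

Under these assumptions, taking the log of a spectrum containing 0 gives -inf in that bin, and an expression such as (-inf) - (-inf) in later processing gives NaN, for which ImaginaryPartIsSmall returns false. Calling AddConstant with a small positive value before LogInPlace keeps every bin finite.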

ligz07 commented 7 years ago

Hi @bpotard, thanks for your analysis. I just uploaded the wav files that caused the bndap failure to GitHub. Here is the link: https://github.com/ligz07/idlak/tree/import-svn-idlak/issues/bnap

ligz07 commented 7 years ago

Hi @algoseer, thanks for your help. Unfortunately, this solution did not work in my case :(. Maybe the root cause in my case is not the same as yours.

bpotard commented 7 years ago

I did not manage to reproduce your issue with these two files. Are you using an up-to-date git repository?

I have a few other questions for troubleshooting:

Let's assume you have created a wav.scp file that contains:

a /path/to/001047.wav
b /path/to/001391.wav

You can try to run the following in a terminal:

compute-aperiodic-feats --config=conf/bndap.conf --frame-length=100 scp:wav.scp ark:'compute-kaldi-pitch-feats --frame-length=50 --config=conf/pitch.conf scp:wav.scp ark:- |' ark,t:-

Do you still get the same error?

ligz07 commented 7 years ago

Hi @bpotard

Are you sure the pitch extraction worked correctly?
-- The pitch extraction did not report any errors, so I think it worked correctly.

What frame length did you use for the pitch extraction?
-- Sorry, I made a mistake: the frame length is actually 45. The command line is as follows:
+ steps/make_bndap.sh --frame_length 45 data/dev_female exp/make_bndap/dev_female data/tmp/dnn_feats/arctic

Why did you use your own custom frame length of 100 for bndap extraction? For a female speaker, I suspect the automagic window should be around 40 instead; a window of 100 will probably smooth things too much.
-- I also have another database for a male speaker; the frame length of 100 was used for the male speaker.

I added the following code to work around the failure. Does this make sense?

In feature-aperiodic.cc, around line 226:

    for (int i = 0; i <= padded_windowsize/2; ++i) {
      if (power_spectrum(i) < 0.0000000001) {
        power_spectrum(i) += 1e-4;
      }
    }

I did this because I checked harmonic_spectrum and found that some of its values are -INF.

bpotard commented 7 years ago

Hi @ligz07

Thresholding the power spectrum as you do is not a bad option, but for consistency you should probably use the same value for the threshold and the offset, like this (same place in feature-aperiodic.cc, around line 226):

    for (int i = 0; i <= padded_window_size_/2; ++i) {
      if (power_spectrum(i) < 1e-12) {
        power_spectrum(i) += 1e-12;
      }
    }
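
For experimenting with this flooring outside of Idlak, the loop above can be sketched as a standalone function. This is only an illustration under stated assumptions: FloorLowBins is a hypothetical name, it works on a std::vector<float> rather than a Kaldi vector, and it walks the whole vector instead of stopping at padded_window_size_/2. It returns the number of bins it touched, which may help to see how many near-zero bins a given file produces.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop above (not part of Idlak or Kaldi).
// Every bin below `floor_value` gets `floor_value` added to it, so the
// result is always >= floor_value for non-negative power values.
// Returns how many bins were modified.
inline int FloorLowBins(std::vector<float> *power_spectrum,
                        float floor_value = 1e-12f) {
  int num_changed = 0;
  for (std::size_t i = 0; i < power_spectrum->size(); ++i) {
    if ((*power_spectrum)[i] < floor_value) {
      (*power_spectrum)[i] += floor_value;
      ++num_changed;
    }
  }
  return num_changed;
}
```

A non-zero return value on a frame would suggest that the frame contains bins that are (nearly) digital silence, which is the situation suspected in this thread.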

I suspect you ran into that problem because some of your silences are extremely quiet. You may run into similar issues when you do the mcep extraction. If that is the case, you will probably have to modify the compute_mcep_feats.sh script to use the -e or -E options of the SPTK "mcep" tool, e.g. "-e 1e-12".

Hope that fixes the issues!

ligz07 commented 7 years ago

Hi @bpotard, thank you very much :) I have trained my own TTS model successfully.