Several errors raised when running ./misc/scripts/vocoder/magphase/extract_features_for_merlin.py

felipeespic commented 6 years ago

By @dreamk73: (about magphase_integration branch)

@felipeespic When I run the script to extract MagPhase acoustic features from my 48000 Hz audio data, I get an error, and the same happens with 16000 Hz. On line 392 of tools/magphase/src/magphase.py the check reads: if (fs != 48000) or (fs != 16000). The 'or' should be 'and'. Then, on line 1657, I get another error, but now fs is a list of numbers instead of the single number it was before, so fs is being reassigned somewhere.
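
To show why the 'or' version always fires: no value of fs can equal both 48000 and 16000 at once, so at least one of the two inequalities is always true. A minimal sketch comparing both conditions (the function names are hypothetical, not taken from magphase.py):

```python
def buggy_rejects(fs):
    # Original condition: True for every fs, since no value equals both rates.
    return (fs != 48000) or (fs != 16000)

def fixed_rejects(fs):
    # Corrected condition: reject only when fs matches neither supported rate.
    return (fs != 48000) and (fs != 16000)

for fs in (16000, 44100, 48000):
    print(fs, "buggy:", buggy_rejects(fs), "fixed:", fixed_rejects(fs))
# 16000 buggy: True fixed: False
# 44100 buggy: True fixed: True
# 48000 buggy: True fixed: False
```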

In tools/magphase/src/magphase.py, line 155:
m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)
should be:
m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, fs, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)
Also, that function only returns two variables, but this line unpacks five?
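
A function that returns two values cannot be unpacked into five names; Python raises a ValueError at the call site. A toy reproduction, using a hypothetical stand-in rather than the real analysis_with_del_comp_from_pm:

```python
def analysis_stub(v_in_sig, fs):
    """Hypothetical stand-in that, like the deprecated function, returns two values."""
    m_sp = [abs(x) for x in v_in_sig]
    m_ph = [0.0 for _ in v_in_sig]
    return m_sp, m_ph

try:
    # The call site expects five outputs, so unpacking fails immediately.
    m_sp, m_ph, v_shift, m_frms, m_fft = analysis_stub([0.1, -0.2], 48000)
except ValueError as err:
    print("ValueError:", err)
```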

The return statement of analysis_with_del_comp_from_pm should be changed to return those five variables. The next issue is that the function get_fft_params_from_complex_data is not defined in magphase.py. I did have it in the previous version you made, but after copy-pasting it in, running the analysis gave many instances of the same error: fft : m must be a integer of power of 2!
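
Assuming that error means the FFT length handed to the FFT routine was not a power of two (for example, a length derived from a window size at a different fs), one hedged workaround is to round the length up before the call. A hypothetical helper, not part of MagPhase:

```python
def is_pow2(n):
    # A positive power of two has exactly one bit set.
    return n > 0 and (n & (n - 1)) == 0

def next_pow2(n):
    """Smallest power of two >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # (n - 1).bit_length() is the exponent of the next power of two.
    return 1 << (n - 1).bit_length()

# e.g. a 20 ms window at 48 kHz is 960 samples, which is not a power of two
print(next_pow2(960))  # 1024
```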

felipeespic commented 6 years ago

Hi, those functions don't work because they are deprecated. My mistake was not updating the extract_features_for_merlin.py script included in the branch. In the meantime, you can use the 0_batch_feature_extraction_for_merlin.py script provided in MagPhase to extract features for Merlin. I will fix this as soon as I have time.

Thanks!

dreamk73 commented 6 years ago

Sorry for posting in the wrong thread. I used the 0_batch... script you mentioned to extract the features instead.

Then I had to add an acoustic_feats directory to my acoustic_model/data directory, pointing to where I put the newly created features, so that I could create the variable-rate label files.
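
One way to point acoustic_feats at the extracted features without copying them is a symbolic link. A minimal sketch with Python's standard library; voice_dir and features_dir are hypothetical placeholders, and on Windows creating symlinks may require extra privileges:

```python
import os

def link_acoustic_feats(voice_dir, features_dir):
    """Create <voice_dir>/acoustic_model/data/acoustic_feats pointing at features_dir."""
    data_dir = os.path.join(voice_dir, "acoustic_model", "data")
    os.makedirs(data_dir, exist_ok=True)
    link_path = os.path.join(data_dir, "acoustic_feats")
    if os.path.lexists(link_path):
        # Refuse to overwrite an existing directory or link.
        raise FileExistsError(link_path)
    # Absolute target so the link still resolves from any working directory.
    os.symlink(os.path.abspath(features_dir), link_path, target_is_directory=True)
    return link_path
```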

felipeespic commented 6 years ago

No problem. Did it work?

dreamk73 commented 6 years ago

Yes, it works now. If I turn off the postfilter, the speech sounds intelligible. I still think the quality is not as good as it could be, given how good the copy synthesis sounds and how good the Merlin voice with the WORLD vocoder sounds. I am not sure what can be done to improve it, though.

ex.zip

felipeespic commented 6 years ago

Could you send me:

10 (20 is better) wav files with natural speech.
10 wav files and their acoustic features synthesised with MagPhase without post-filter.
10 wav files and their acoustic features synthesised with World without post-filter.

I think I will have time today to look at this in more detail, so I can inspect the data and come out with a solution.

dreamk73 commented 6 years ago

Sure, I can do that. Do you want the acoustic features extracted from the audio or generated with merlin or both?

m-toman commented 6 years ago

After I did the pull request for the SLT full demo I also tried it with some other data. Except that the label converter sorted out about 1000 of 3000 wavs, I ended up with the following results: magphase_merlin.zip (with and without postfilter).

Do you have any idea what could cause this? It's from a relatively deep male voice.

felipeespic commented 6 years ago

Hi @m-toman,

> After I did the pull request for the SLT full demo I also tried it with some other data. Except that the label converter sorted out about 1000 of 3000 wavs

The rejection rate shouldn't be that high.

Unfortunately, the label converter is a workaround for Merlin's inability to work at a variable frame rate. In fact, the converter could convert all the labels if we wanted, but then Merlin would crash.

The real solution for this is to implement variable-frame-rate support natively in Merlin.
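
To make the variable-frame-rate issue concrete: with constant-rate features each labelled segment covers a fixed number of frames, but with pitch-synchronous frames the count depends on how many pitch marks fall inside the segment, and a short segment can end up with none. A hypothetical sketch of that mapping (not the actual converter's code), assuming HTK-style label times in 100 ns units and frame centres in seconds. Whether empty segments are what triggers the rejections is an assumption, not something stated above:

```python
from bisect import bisect_left

HTK_UNITS_PER_SEC = 10000000  # HTK label times are in 100 ns units

def frames_per_segment(segments, frame_times_sec):
    """Count variable-rate frames whose centre falls in each [start, end) segment.

    segments: list of (start, end) tuples in HTK 100 ns units.
    frame_times_sec: sorted frame-centre times in seconds (e.g. pitch marks).
    """
    counts = []
    for start, end in segments:
        t0 = start / HTK_UNITS_PER_SEC
        t1 = end / HTK_UNITS_PER_SEC
        # bisect gives the index range of frame centres inside [t0, t1).
        counts.append(bisect_left(frame_times_sec, t1) - bisect_left(frame_times_sec, t0))
    return counts

segments = [(0, 50000), (50000, 60000), (60000, 150000)]  # 5 ms, 1 ms, 9 ms
frames = [0.001, 0.004, 0.0065, 0.010, 0.013]
counts = frames_per_segment(segments, frames)
print(counts)  # [2, 0, 3]
print("segments with no frames:", [i for i, c in enumerate(counts) if c == 0])  # [1]
```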

> Do you have any idea what could cause this?

I think there is a mismatch in sample rate. Check the value of the fs variable in MagPhase, especially during synthesis.
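
Before digging into MagPhase itself, a quick way to rule out a file-level mismatch is to read the rate from every input WAV header and compare it with the fs the pipeline is configured for. A minimal sketch using Python's standard wave module, which handles uncompressed PCM WAV only; expected_fs is whatever rate you intend to use:

```python
import wave

def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV header."""
    with wave.open(path, "rb") as w:
        return w.getframerate()

def find_rate_mismatches(paths, expected_fs):
    """List (path, fs) for every file whose header rate differs from expected_fs."""
    mismatches = []
    for p in paths:
        fs = wav_sample_rate(p)
        if fs != expected_fs:
            mismatches.append((p, fs))
    return mismatches
```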

m-toman commented 6 years ago

For this test I downsampled 44.1kHz wavs to 16kHz so I could use the SLT demo without modification. That's why I assumed there shouldn't be any issues with the sampling rate. But I will check all the settings again.

EDIT: I checked the copy-synthesis, and the downsampled wav resulted in a more or less unvoiced sample. I tried the version upsampled to 48kHz and there was also a problem at the beginning. I then removed the min/max setting for REAPER in libaudio, and now it works fine. The samples are attached in case you are interested: wavs_syn.zip. I'll continue with the 48kHz version now and see what happens.
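
For reference, REAPER's command-line tool takes the F0 search range through its -m (minimum F0) and -x (maximum F0) options; leaving them out makes it fall back to its built-in defaults. A hedged sketch that builds the call with an optional range; the binary name "reaper" and the paths are assumptions, and this is not how libaudio constructs its command:

```python
import subprocess

def reaper_command(wav_path, f0_path, pm_path, min_f0=None, max_f0=None,
                   reaper_bin="reaper"):
    """Build a REAPER command line; F0 range flags are added only if given."""
    cmd = [reaper_bin, "-i", wav_path, "-f", f0_path, "-p", pm_path]
    if min_f0 is not None:
        cmd += ["-m", str(min_f0)]
    if max_f0 is not None:
        cmd += ["-x", str(max_f0)]
    return cmd

def run_reaper(*args, **kwargs):
    # Requires the REAPER binary to be installed and on PATH.
    subprocess.run(reaper_command(*args, **kwargs), check=True)
```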

m-toman commented 6 years ago

Hi again. In the meantime, with the latest version and 48kHz recordings, I also got another dataset working with Merlin. The label converter filtered out about 100 of the 3.5k samples, so not too bad. However, I had to run test_nan.sh, which filtered out another 1k sentences. The results are quite similar to those of @dreamk73: not bad, but clearly worse than the same voice with WORLD. That was surprising, because the SLT "full demo" turned out pretty good. Unfortunately, I don't think I'll find much time to investigate further in the next week.

felipeespic commented 6 years ago

Hi @m-toman, I'm not sure I understand:

Did it work well with the 48kHz data?
Was the voice that performs worse than WORLD built with a different dataset? What is its sample rate?
So, at the end, how many utterances were rejected from training (from the total of 3.5k)?
I don't know what the script test_nan.sh does. Why does it reject utterances?

Sorry for asking so many questions. Thanks!

m-toman commented 6 years ago

@felipeespic I can see if I can get you some samples, recordings, features etc. in case you are interested.

For my last experiment I used 48kHz recordings (previously I had another dataset that was upsampled from 44.1kHz; I haven't tried that one again), and in the end it worked out.
It's a female voice with 3.5k sentences, also trained at 48kHz. The result was worse than WORLD (with REAPER for F0).
test_nan.sh has been moved here: https://github.com/CSTR-Edinburgh/merlin/blob/master/egs/build_your_own_voice/s1/scripts/test_nan.sh It basically just checks the files for NaNs, and I use it to filter out the affected files.
About 1k sentences were sorted out by test_nan. About 100 by the label converter.
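
The same kind of check can be sketched in Python, assuming headerless float32 feature files in native byte order (the dtype is a parameter). This is not a port of test_nan.sh, whose exact logic isn't shown in this thread:

```python
import numpy as np

def has_nan(path, dtype=np.float32):
    """True if a raw binary feature file contains any NaN."""
    data = np.fromfile(path, dtype=dtype)
    return bool(np.isnan(data).any())

def split_clean(paths, dtype=np.float32):
    """Partition feature files into (clean, with_nan) lists."""
    clean, bad = [], []
    for p in paths:
        (bad if has_nan(p, dtype) else clean).append(p)
    return clean, bad
```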

Thanks

felipeespic commented 6 years ago

Hi @m-toman, since Merlin+MagPhase was trained on much less data than Merlin+WORLD, it is expected that the result was worse. So I need to check why so many utterances are rejected, as well as other possible causes of degradation.

Could you send me some data (original audio files + label files), so I can build basic voices with MagPhase and WORLD, compare them, and see what's happening?

Specifically, could you send me 50 samples (wavs + labs) of utterances rejected by test_nan.sh, plus 50 samples (wavs + labs) of utterances that passed the filters, please?

I need to figure out why MagPhase's performance is suboptimal with some voices.

Thanks

felipeespic commented 6 years ago

@m-toman, I just committed a fix for the NaN problem (https://github.com/CSTR-Edinburgh/magphase/commit/5a3b32b4698fc608f44f112f84bc2f8c21713cb9). Hopefully, that will reduce the number of rejected utterances for you (it should now be zero). Please let me know if it works.

Thanks

m-toman commented 6 years ago

Hi, thanks. I won't have access to the machine I ran it on for a week, but I'll try to run a new experiment with the latest version then.

dreamk73 commented 6 years ago

We will try it with the new script. I am also going to try to train a version with 16kHz wav data to see if that makes any difference in the output quality.

m-toman commented 6 years ago

Finally found the time for some experiments, describing the process here:

(Labels were aligned using the HTK forced_alignment.py coming with Merlin)

Got the current Merlin version from your fork
I used the current version of MagPhase and demos/demo_run_for_merlin/0_batch_feature_extraction_for_merlin.py to extract the features.
test_nan.sh didn't report any NaNs this time
Copied the features and corresponding labels into the slt_arctic full MagPhase demo experiments folder, generated file lists, etc.
Adapted 01_setup.sh by adding:
    echo "Train=4084" >> $global_config_file
    echo "Valid=300" >> $global_config_file
    echo "Test=100" >> $global_config_file
Ran run_full_voice.sh
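
Since the Train/Valid/Test values above add up to 4084 + 300 + 100 = 4484 utterances, it can help to generate the file list and check it against those counts before a long run. A hypothetical sketch; the output file name is a placeholder, and treating an exact match as required is an assumption to verify against the demo scripts:

```python
import os

def write_file_id_list(wav_dir, out_path):
    """Write one utterance id (wav basename without extension) per line, sorted."""
    ids = sorted(os.path.splitext(f)[0] for f in os.listdir(wav_dir) if f.endswith(".wav"))
    with open(out_path, "w") as fh:
        fh.write("".join(i + "\n" for i in ids))
    return ids

def check_split(n_ids, train, valid, test):
    """Raise if Train + Valid + Test does not match the number of ids in the list."""
    total = train + valid + test
    if total != n_ids:
        raise ValueError(f"Train+Valid+Test = {total}, but the file list has {n_ids} ids")
```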

It seems now that it couldn't convert a single label file (all files ended up in the crashlist)

felipeespic commented 6 years ago

@m-toman

Hi, I'm sorry for the delay, I have been very busy and I forgot to answer this. I hope you are still testing with MagPhase. Thank you for describing the process, and I have three comments on this:

I think that Merlin is looking for the files in the wrong places. The original labels should be placed in the directory ./acoustic_model/data/label_state_align and the features in ./acoustic_model/data/acoustic_feats.
I am currently working on a constant-frame-rate version of MagPhase, which will be pushed soon. Then label conversion will no longer be needed.
The recordings sound strange (pre-ringing), as if they had been passed through a comb filter or resynthesised with OLA, for example.
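
To confirm the layout described in the first comment, one can check that every label in acoustic_model/data/label_state_align has a matching feature file in acoustic_model/data/acoustic_feats, and vice versa. A minimal sketch; the extensions are assumptions (.lab for labels, and .lf0 standing in for whichever MagPhase feature stream you want to check), so adjust them to your files:

```python
import os

def unmatched_ids(label_dir, feat_dir, label_ext=".lab", feat_ext=".lf0"):
    """Return (labels_without_features, features_without_labels) as sorted id lists."""
    def ids(directory, ext):
        return {os.path.splitext(f)[0] for f in os.listdir(directory) if f.endswith(ext)}
    labs = ids(label_dir, label_ext)
    feats = ids(feat_dir, feat_ext)
    return sorted(labs - feats), sorted(feats - labs)
```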

m-toman commented 6 years ago

@felipeespic Completely understand. Unfortunately I didn't find the time for further testing either, but I'll certainly try the whole process again once you push the new version.

Regarding the recordings: they were made at home with consumer hardware, so they're not studio quality: no professional equipment and no voice talent. Unfortunately, I can't remember whether this was a version dereverberated and denoised with postfish.

dreamk73 commented 6 years ago

I have let this rest for a bit as well. But once you push the new version I'll be happy to test it again with our professional studio quality recordings.

(In reply to Markus Toman's comment of Wed, Feb 7, 2018, 9:39 AM.)

felipeespic / merlin, issue #1