CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

alternative vocoders? #261

Open DabiaoMa opened 6 years ago

DabiaoMa commented 6 years ago

Hi,

I am using the World vocoder to reproduce wavs, but it seems that World may not be a good choice for producing high-quality voices. I extracted acoustic features from the original wavs with World and then synthesized wavs directly from those features. Most of the time the generated wavs are not good.

Maybe I need to modify some parameters of the World vocoder, such as the frame shift or frame length, but would that increase the synthesis time?
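As a rough illustration of the cost side of that question: the number of analysis frames grows inversely with the frame shift, so halving the shift roughly doubles the frames that must be extracted, predicted and synthesized. The sketch below is plain arithmetic, not part of World or Merlin; `num_frames` is a hypothetical helper, and the 3-second utterance and the shift values are only example numbers. It says nothing about whether a smaller shift actually improves quality.

```python
import math


def num_frames(duration_s, frame_shift_ms):
    """Approximate number of analysis frames for an utterance.

    Hypothetical helper for illustration; real vocoders may add or drop
    a frame at the edges depending on their padding convention.
    """
    return math.ceil(duration_s * 1000.0 / frame_shift_ms)


# Example values only: a 3-second utterance at three different frame shifts.
for shift_ms in (5.0, 2.5, 1.0):
    print(f"{shift_ms} ms shift -> {num_frames(3.0, shift_ms)} frames")
```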

Can we use a vocoder other than World or Straight?

DeepMind announced that they have created a new version of WaveNet that is 1000 times faster than the previous one. Maybe the accelerated WaveNet is a good choice?

Tacotron uses the Griffin-Lim algorithm as its vocoder, but I do not know whether Griffin-Lim performs better.

Best,

ljuvela commented 6 years ago

I recommend trying https://github.com/gillesdegottex/pulsemodel It should be relatively straightforward to integrate into Merlin.

As for WaveNet, the latest DeepMind announcement contained virtually no information on what they have done. Let's wait for a paper. Anyway, most recent papers with WaveNet waveform generator (e.g. Baidu's stuff) use some acoustic features for local conditioning, so let's not give up on the old vocoders yet.

Griffin-Lim is not really a vocoder, but rather a method to create consistent phase information for a magnitude spectrogram. So the quality there depends entirely on how well you can predict magnitude spectra (which is not too easy).
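To make the Griffin-Lim point concrete, here is a minimal, dependency-free sketch of the iteration: start from random phase, invert to a waveform, re-analyse it, keep the new phase and re-impose the target magnitudes. It uses a naive DFT on tiny 16-sample frames purely for illustration (a real implementation would use an FFT library), and the frame size, hop, toy sinusoid and all function names are assumptions chosen for the example, not anything from Tacotron or Merlin.

```python
import cmath
import math
import random

N_FFT, HOP = 16, 4
# Periodic Hann window; WIN[0] is 0, every other tap is positive.
WIN = [0.5 - 0.5 * math.cos(2 * math.pi * n / N_FFT) for n in range(N_FFT)]
# Precomputed DFT twiddle factors exp(-2*pi*j*k*n/N).
TW = [[cmath.exp(-2j * math.pi * k * n / N_FFT) for n in range(N_FFT)]
      for k in range(N_FFT)]


def stft(x):
    """Windowed naive DFT of each frame (only sensible for tiny frames)."""
    frames = []
    for start in range(0, len(x) - N_FFT + 1, HOP):
        seg = [x[start + n] * WIN[n] for n in range(N_FFT)]
        frames.append([sum(seg[n] * TW[k][n] for n in range(N_FFT))
                       for k in range(N_FFT)])
    return frames


def istft(frames, length):
    """Least-squares inverse: windowed overlap-add divided by sum of WIN^2."""
    y, norm = [0.0] * length, [0.0] * length
    for i, spec in enumerate(frames):
        for n in range(N_FFT):
            # Inverse DFT sample; the conjugate twiddle is exp(+2*pi*j*k*n/N).
            val = sum(spec[k] * TW[k][n].conjugate()
                      for k in range(N_FFT)).real / N_FFT
            y[i * HOP + n] += val * WIN[n]
            norm[i * HOP + n] += WIN[n] ** 2
    return [a / b if b > 1e-8 else 0.0 for a, b in zip(y, norm)]


def magnitude_error(x, mag):
    """Squared distance between |STFT(x)| and the target magnitudes."""
    return sum((abs(c) - m) ** 2
               for fx, fm in zip(stft(x), mag) for c, m in zip(fx, fm))


def griffin_lim(mag, length, n_iter=30, seed=0):
    """Alternate between imposing target magnitudes and re-estimating phase."""
    rng = random.Random(seed)
    phase = [[rng.uniform(-math.pi, math.pi) for _ in f] for f in mag]
    errors = []
    for _ in range(n_iter):
        spec = [[m * cmath.exp(1j * p) for m, p in zip(fm, fp)]
                for fm, fp in zip(mag, phase)]
        x = istft(spec, length)
        errors.append(magnitude_error(x, mag))
        phase = [[cmath.phase(c) for c in f] for f in stft(x)]
    return x, errors


# Toy target: the magnitude spectrogram of a 128-sample sinusoid.
signal = [math.sin(2 * math.pi * 0.1 * n) for n in range(128)]
target_mag = [[abs(c) for c in f] for f in stft(signal)]
estimate, errors = griffin_lim(target_mag, len(signal))
print("first / last magnitude error:", errors[0], errors[-1])
```

Note that the procedure only ever sees the magnitudes: if a model predicts those poorly, no amount of phase iteration recovers the lost quality.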

DabiaoMa commented 6 years ago

Thanks. I checked the paper 'Pulse Model in Log-domain for a Uniform Synthesizer'; the author concludes that its comparative mean opinion score is a little worse than Straight's.

Hopefully DeepMind will publish the details soon...

gillesdegottex commented 6 years ago

For PML, using an RNN and a proper output layer, the results are eventually better, as shown in the journal article: http://gillesdegottex.eu/wp-content/papercite-data/pdf/DegottexG2017pmlj_acceptedversion.pdf (published only a week ago). Do not hesitate to ask me for new features in the code (feature extraction and synthesis options); I'll be happy to implement them.

Waveform synthesis is definitely a great solution for quality. The only problem is the technical "details" needed to run it fast, and the feedback I have got at conferences so far is: "We can't speak about this". So you might have to wait quite a bit before getting details :(

Happy to read any other solution you find @DabiaoMa

(Thanks @ljuvela !)

fosimoes commented 6 years ago

I was considering using MagPhase, which was recently presented at Interspeech: https://github.com/CSTR-Edinburgh/magphase The MagPhase documentation has some guidelines on how to use it with Merlin. It suggests that running some scripts and changing Merlin's config file should be enough. I believe, however, that some modification of the source code is necessary, since references to WORLD features (mgc, bap and lf0) are hard-coded into Merlin. Has anyone tried to do it?

dreamk73 commented 6 years ago

Interesting suggestions to check out. @fosimoes, you only need to add some code to merlin/src/configuration/configuration.py, where you will see settings for STRAIGHT and WORLD. You can define a different vocoder there in the same manner. You can define there what kind of parameter directories / files are used by the particular vocoder. I have tried it with GlottHMM and AHOcoder in the past.

DabiaoMa commented 6 years ago

@gillesdegottex Thanks and I will wait for the details

felipeespic commented 6 years ago

@fosimoes Yes, I have tried it and it works! You just need to add the MagPhase parameters (mag, real, imag and their deltas) to the configuration.py file with the dimensions: mag: 60, dmag: 180, real: 45, dreal: 135, imag: 45, dimag: 135.
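The quoted dimensions follow a simple pattern: each "d" entry is three times its static stream, which is consistent with stacking static, delta and delta-delta features (an assumption about how the numbers arise, not something stated in this thread). A minimal sketch of that bookkeeping, using a hypothetical `with_deltas` helper rather than Merlin's actual configuration code:

```python
# Static stream sizes as quoted in the comment above.
STATIC_DIMS = {"mag": 60, "real": 45, "imag": 45}


def with_deltas(static_dims, n_windows=3):
    """Return static sizes plus 'd<name>' sizes for the stacked streams.

    n_windows=3 assumes static + delta + delta-delta; this is a
    hypothetical helper, not a function from Merlin or MagPhase.
    """
    dims = {}
    for name, dim in static_dims.items():
        dims[name] = dim
        dims["d" + name] = dim * n_windows
    return dims


for name, dim in with_deltas(STATIC_DIMS).items():
    print(f"{name}: {dim}")
```

A check like this can catch a mistyped dimension before a long training run starts.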

I am currently working on the MagPhase-Merlin integration, but for now you can run the scripts manually as you mentioned. That should work.

m-toman commented 6 years ago

@felipeespic Do you need help with the integration? I also planned to try it out.

Do you think MagPhase can be ported to C++ easily? And does it require access to the whole feature vector or can you run it streaming i.e. on chunks/windows?

Great to see so many vocoders coming out now, for years we've been mostly stuck with hts_engine and STRAIGHT ;)

RasmusD commented 6 years ago

MagPhase can be ported to C++ - whether it's easy to do is up to you ;-)

It can be run on individual frame chunks as well, so you can use it in a streaming fashion. If you're doing a C++ implementation, simply design that in from the beginning :-)


felipeespic commented 6 years ago

Hi @m-toman, yes, I think I will need help with testing it. Could you help me with that?

Also, as @RasmusD mentioned, you can implement MagPhase in C++ for streaming. I think that it should be quite simple if you are proficient in C++.

m-toman commented 6 years ago

@RasmusD @felipeespic
Well :)... let's see. I've briefly checked the source and the paper and guess it should be possible for me. I'm not very proficient with signal processing, but guess my C++ is OK (at least I ported the synthesis part of Merlin to C++).

@felipeespic Sure, would be glad to. I'm currently looking at the code and stepping through the merlin-preprocessing steps. Also started adapting the configuration.py. I should be able to test it with a couple of voices with very different levels of quality.

felipeespic commented 6 years ago

Hi @m-toman , I just pushed the slt_arctic demo using MagPhase to the branch in my fork: https://github.com/felipeespic/merlin/tree/magphase_integration It should work out of the box.

Could you test that everything works OK? Also, just if you have time, could you implement the slt_arctic full voice with MagPhase, please?

m-toman commented 6 years ago

@felipeespic OK thanks, should be able to check it out later today and guess should have the time to adapt the "full" version.

EDIT: Could you upload http://felipeespic.com/depot/databases/merlin_demos/slt_arctic_full_data_magphase.zip ?

The demo ran without problems, I'll start to integrate everything into my own scripts, which do an "out of source" build from scratch (so for a given folder of wavs and orthographic transcription). Makes it easier to test it on a dozen voices or so.

dreamk73 commented 6 years ago

@felipeespic when running the script to extract magphase acoustic features with my 48000 Hz audio data, I get an error. Same with 16000.

on line 392 of tools/magphase/src/magphase.py

it says: if (fs != 48000) or (fs != 16000)

The 'or' should be 'and'
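A small sketch of why the quoted condition misfires: no single value can equal both 48000 and 16000, so at least one of the two inequalities is always true and the `or` version rejects every sample rate. The function names below are hypothetical and only wrap the two variants of the condition for comparison.

```python
def rejects_with_or(fs):
    """Condition as quoted: true for every fs, including the supported ones."""
    return (fs != 48000) or (fs != 16000)


def rejects_with_and(fs):
    """Proposed fix: true only when fs is neither supported rate."""
    return (fs != 48000) and (fs != 16000)


for fs in (16000, 44100, 48000):
    print(fs, rejects_with_or(fs), rejects_with_and(fs))
```

An equivalent and arguably clearer spelling of the fixed check is `fs not in (16000, 48000)`.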

And then on line 1657 I get another error but now fs is a list of numbers instead of the single number it was before, so somewhere fs is assigned something else.

dreamk73 commented 6 years ago

tools/magphase/src/magphase.py line 155

m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)

should be

m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, fs, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)

Also, that function only returns two variables and this line has five?

dreamk73 commented 6 years ago

Change the return statement of analysis_with_del_comp_from_pm to return those five variables. The next issue is that the function get_fft_params_from_complex_data is not defined in magphase.py. It was in the previous version that you made, but when I copied and pasted it in and ran the analysis, I got the same error many times:

fft : m must be a integer of power of 2!
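The message indicates that whichever FFT routine is being called only accepts lengths that are powers of two. A minimal sketch for checking a length, or rounding it up, before calling such a routine; both helpers are hypothetical and are not taken from magphase.py:

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; false for zero, negatives and other values."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    return 1 if n < 1 else 1 << (n - 1).bit_length()


for length in (1000, 1024, 1025):
    print(length, is_power_of_two(length), next_power_of_two(length))
```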

felipeespic commented 6 years ago

Hi @dreamk73, thank you for pointing that out. Those functions do not work because they are deprecated.

I have moved this to the "Issues" section in my fork (https://github.com/felipeespic/merlin/issues/1), since that code is not part of the Merlin repo yet. Thanks!

PS: Could I remove these comments from here?

ljuvela commented 6 years ago

Details are out on Google's new production WaveNet: https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet/

felipeespic commented 6 years ago

MagPhase vocoder integration with Merlin done in PR #281