CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

speaker adaptation demo #246

Open ecooper7 opened 6 years ago

ecooper7 commented 6 years ago

First, the download_data.sh script referenced in the README is not present, but I was able to obtain the VCTK corpus from this page: http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html

Then, when I got to step ./05_train_duration_model.sh, I got the following error:

2017-09-20 18:50:15,687 INFO     main.train_DNN: building the model
2017-09-20 18:50:16,082 INFO     main.train_DNN: load parameters from existing model: '_'
2017-09-20 18:50:16,082 CRITICAL       main    : train_DNN threw an exception
Model file '_' does not exist
Lock freed

This appears to be the same error that people were having a few days ago with the basic SLT demo. Did I miss editing something in a config file, or do you have any other pointers for getting this to work?

ronanki commented 6 years ago

Did you update the recipe directory egs/speaker_adaptation?

The last couple of commits fixed this bug and updated the conf files as well. The init_model, which is set to a pre-trained model, should be used only when it is not equal to '_', as shown here.

So, while preparing the average model, we expect that if statement to return False.
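The guard described above can be sketched roughly as follows (a hypothetical simplification, not Merlin's exact code; the function name is made up for illustration): the pre-trained model is loaded only when the configured path is not the '_' placeholder, which explains the "Model file '_' does not exist" error when the check was missing.

```python
import os

def maybe_load_pretrained(init_dnn_model_file):
    """Return True if an existing pre-trained model should be loaded."""
    if init_dnn_model_file != '_':  # '_' is the placeholder meaning "train from scratch"
        if not os.path.isfile(init_dnn_model_file):
            raise IOError("Model file '%s' does not exist" % init_dnn_model_file)
        return True
    return False
```

When preparing the average model, the config leaves init_model as '_', so the branch is skipped and training starts from scratch.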

ecooper7 commented 6 years ago

I am able to step through the entire adaptation demo now, thanks!!


ronanki commented 6 years ago

@ecooper7 Thank you for testing the pipeline...did you use LHUC as well?

ecooper7 commented 6 years ago

I've only tried it so far with the default settings (fine_tune adaptation method).


ecooper7 commented 6 years ago

hi again,

I tried it again just now specifying lhuc as the adaptation method in step 8, and then in step 12_adapt_duration_model.sh I get an error:

2017-10-05 15:26:35,348 CRITICAL main : train_DNN threw an exception
Traceback (most recent call last):
  File "/local/users/ecooper/merlin/src/run_merlin.py", line 1224, in <module>
    main_function(cfg)
  File "/local/users/ecooper/merlin/src/run_merlin.py", line 820, in main_function
    cmp_mean_vector = cmp_mean_vector, cmp_std_vector = cmp_std_vector, init_dnn_model_file=cfg.start_from_trained_model)
  File "/local/users/ecooper/merlin/src/run_merlin.py", line 245, in train_DNN
    dropout_rate = dropout_rate, optimizer = cfg.optimizer, rnn_batch_training = cfg.rnn_batch_training)
  File "/local/users/ecooper/merlin/src/models/deep_rnn.py", line 93, in __init__
    hidden_layer = SigmoidLayer_LHUC(rng, layer_input, input_size, hidden_layer_size[i], activation='tanh', p=self.dropout_rate, training=self.is_train)
  File "/local/users/ecooper/merlin/src/layers/lhuc_layer.py", line 54, in __init__
    self.output = activation(self.output)
TypeError: 'str' object is not callable
Lock freed

The only thing I did differently was specifying lhuc in step 8; is there anything else I should be changing along the way?

bajibabu commented 6 years ago

Thanks for trying out the speaker adaptation demo. You don't need to change anything else to run the LHUC adaptation method; just specify lhuc in step 8. This error is caused by this line: https://github.com/CSTR-Edinburgh/merlin/blob/master/src/models/deep_rnn.py#L93. Please replace the string argument with `activation=T.tanh` in that line. I will change this and submit a PR.
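The root cause is visible in the traceback: the layer receives the string 'tanh' and later tries to call it as a function. A minimal illustration of why the fix works (plain NumPy standing in for Theano's T.tanh; not Merlin's actual code):

```python
import numpy as np

x = np.array([0.5, -1.0])

activation = 'tanh'      # a string, as in the buggy SigmoidLayer_LHUC call
try:
    activation(x)        # raises TypeError: 'str' object is not callable
except TypeError as err:
    print("error:", err)

activation = np.tanh     # a callable, mirroring the activation=T.tanh fix
print(activation(x))     # now applies tanh elementwise
```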

ecooper7 commented 6 years ago

Thanks! With that fix I was able to run the LHUC adaptation successfully.

It looks like there is also an "aux" adaptation method available. Do this and "fine_tune" correspond to the i-vector and feature transformation methods described in the paper?

ronanki commented 6 years ago

The paper presented adaptation at three different levels.

As of now, fine_tune retrains all layers in the network. It may be better to freeze some bottom layers and adapt/retrain only the top one or two layers.
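The layer-freezing idea suggested above can be sketched as follows (a hypothetical NumPy sketch with made-up names, not Merlin's actual training loop): only the top n_adapt layers receive gradient updates, while the bottom layers keep the average-voice parameters.

```python
import numpy as np

def finetune_step(weights, grads, lr=0.01, n_adapt=2):
    """One SGD step that updates only the top n_adapt layers.

    weights, grads: per-layer parameter and gradient arrays, bottom first.
    Bottom layers are frozen (returned unchanged)."""
    n_layers = len(weights)
    return [w - lr * g if i >= n_layers - n_adapt else w
            for i, (w, g) in enumerate(zip(weights, grads))]
```

With n_adapt equal to the total number of layers this reduces to the current fine_tune behaviour of retraining the whole network.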

bajibabu commented 6 years ago

The conventions I followed were based on this paper https://users.aalto.fi/~bollepb1/papers/icassp_2017a.pdf

martedva commented 6 years ago

@bajibabu can you tell me, if the LHUC model-based adaptation is constrained?

As I understand it, it is constrained if its amplitudes are re-parameterised; however, I'm having a hard time finding this in the code. I have mainly been trying to understand, and find it in, the lhuc_layer.py file.

Thanks in advance,

bajibabu commented 6 years ago

In LHUC model-based adaptation we scale the values of the hidden nodes by a new set of parameters. These new parameters are learned from the adaptation data only, while all other parameters are initialised from a pre-trained model.
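The scaling can be sketched like this (a minimal NumPy sketch, not Merlin's Theano implementation; the re-parameterisation through 2*sigmoid, which keeps each amplitude in (0, 2), follows the Swietojanski and Renals LHUC paper and is what the constraint question above refers to):

```python
import numpy as np

def lhuc_scale(hidden, c):
    """Scale hidden activations by 2*sigmoid(c), the LHUC amplitude function.

    hidden: hidden-layer outputs from the pre-trained average model.
    c: speaker-dependent parameters, learned from adaptation data only."""
    return hidden * (2.0 / (1.0 + np.exp(-c)))

h = np.array([0.5, -0.3, 0.8])
c = np.zeros(3)            # c = 0 gives a scaler of exactly 1.0
print(lhuc_scale(h, c))    # unchanged hidden activations
```

Because the amplitudes pass through 2*sigmoid(c) rather than being free parameters, the adaptation is constrained even though c itself is unbounded.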


simonkingedinburgh commented 6 years ago

Bajibabu Bollepalli wrote:

In the LHUC model-based adaptation we scale the values of hidden nodes by a new set of parameters. The new parameters were learned from the adaptation data only and all other parameters were initialized by a pre-trained model.

the journal paper about LHUC:

P. Swietojanski, J. Li, and S. Renals. Learning hidden unit contributions for unsupervised acoustic model adaptation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(8):1450-1463, August 2016.

and its application to speech synthesis:

Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, and Simon King. A study of speaker adaptation for DNN-based speech synthesis. In Interspeech, 2015.

open access versions of both papers are available from http://www.cstr.ed.ac.uk/publications/


martedva commented 6 years ago

Perfect, thanks guys!

Quick question: @bajibabu and @simonkingedinburgh, do you have an overview of which files you edited for each speaker adaptation method (LHUC, i-vector, and FST)?

This would be very helpful, as I am implementing a new speaker adaptation technique for my bachelor's thesis.

Thanks in advance!

LabaBL commented 6 years ago

Hi, I'm currently writing my bachelor's thesis on speaker adaptation, and as such I'm looking into the speaker adaptation techniques implemented in Merlin. I have been looking at the implementation of the fine_tune method, but have some questions. @ronanki, an earlier post suggested that the fine_tune method corresponded to the i-vector method described in "A study of speaker adaptation for DNN-based speech synthesis". However, when I look through the code I can't find where the input is augmented with i-vectors when training an average voice model. Am I overlooking something? Or does the fine_tune method actually not correspond to an implementation of i-vectors? In that case, is there any literature describing the fine_tune implementation in further detail?

Best regards, Lauritz