Wendison / FCL-taco2

Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021
MIT License

'Encoder' object has no attribute 'embed_proj' #1

Closed LYH96 closed 3 years ago

LYH96 commented 3 years ago

In nets/knowledge_distillation/e2e_tts_tacotron2_sa_kd_student.py, line 648, in __init__: when I run student_model_training.sh, it tells me 'Encoder' object has no attribute 'embed_proj'. Hoping for an answer.

Wendison commented 3 years ago

Hi, in "nets/knowledge_distillation/e2e_tts_tacotron2_sa_kd_student.py" there are typos in the import section:

    from nets.modules.decoder_sa import Decoder
    from nets.modules.encoder_sa import Encoder

should be:

    from nets.modules.decoder_sa_kd import Decoder
    from nets.modules.encoder_sa_kd import Encoder

I have uploaded modified file, please try again.

LYH96 commented 3 years ago

There is a new error, "'Decoder' object has no attribute 'lstm_proj'", at line 482 of decoder_sa_kd.py; also, at line 531 of e2e_tts_tacotron2_sa_kd_student.py, 'condition' is misspelled.

gongchenghhu commented 3 years ago

@Wendison Thanks for your share of this nice job. I also meet the same problem, looking forward to your reply. Thanks!

Wendison commented 3 years ago

@LYH96, @gongchenghhu Thank you for your feedback. I have corrected the spelling errors and updated 'e2e_tts_tacotron2_sa_kd_student.py' and 'e2e_tts_tacotron2_sa_kd_teacher.py' under './nets/knowledge_distillation', and 'decoder_sa_kd.py' under './nets/modules'. Please try again. In case the error "'Decoder' object has no attribute 'lstm_proj'" still exists, please add 'print(self.is_student, self.share_proj)' after line 478 of 'decoder_sa_kd.py' and check the printed values; if both are True, this error should not occur.

gongchenghhu commented 3 years ago

@Wendison Thanks for your quick reply. There are still some spelling errors, like "items --> item" in https://github.com/Wendison/FCL-taco2/blob/3b25cf47fd952c860949d31af492e4c51328fcdb/nets/knowledge_distillation/e2e_tts_tacotron2_sa_kd_student.py#L785. I corrected some spelling errors by myself (I hope what I did is right), but now I hit a new problem, shown in the attached screenshot. I am really interested in your work and would be grateful if you could solve this problem.

Wendison commented 3 years ago


@gongchenghhu, yes, you're right! 'loss.items()' should be 'loss.item()'; I have fixed the corresponding faults in e2e_tts_tacotron2_sa_kd_student.py. As for the error: the issue is in the loss calculation between the durations (i.e., d_outs) from the student and teacher models. 'd_outs' in e2e_tts_tacotron2_sa_kd_student.py has dimension (batch-size, max-time-steps, 1), while 'd_outs' in e2e_tts_tacotron2_sa_kd_teacher.py has dimension (batch-size, max-time-steps). I have added one line to make their dimensions match. Please use the updated files, and feel free to report any other problems. Thanks!
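The dimension fix described above can be sketched as follows. This is an illustrative snippet, not the repository's actual code: the variable names mirror the discussion, and the loss function is assumed to be an elementwise regression loss between the two duration tensors.

```python
import torch
import torch.nn.functional as F

batch, tmax = 2, 5
# Student duration predictions: (batch-size, max-time-steps, 1)
d_outs_student = torch.randn(batch, tmax, 1)
# Teacher duration predictions: (batch-size, max-time-steps)
d_outs_teacher = torch.randn(batch, tmax)

# Squeeze the trailing singleton dimension so both tensors are
# (batch-size, max-time-steps) before computing the distillation loss;
# otherwise broadcasting silently produces a (batch, tmax, tmax) comparison.
duration_kd_loss = F.l1_loss(d_outs_student.squeeze(-1), d_outs_teacher)
```

Without the squeeze, PyTorch broadcasting would not raise an error here, which is why the mismatch only showed up as a wrong loss value rather than an immediate shape exception.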

gongchenghhu commented 3 years ago

@Wendison Hi, I have trained the teacher and student models, but the result is unexpected. The teacher model and the w/o-KD model sound good, but the result of FCL-taco2-S is really bad. I have attached the FCL-taco2-S result: result.zip. Could you tell me what I might have done wrong? Do I have to install the matching version of apex?

Wendison commented 3 years ago

@gongchenghhu Thanks for your feedback. I can't download your 'result.zip'; could you upload the file again? One way to judge whether the student model is well trained is to compare the training losses (e.g., mel loss and pitch/energy/duration losses, excluding the distillation losses) of the student and teacher models. If they're close (the student model will have somewhat higher losses than the teacher), the student model should work. BTW, apex is only used to speed up training; I didn't notice any difference in speech quality with or without it.

gongchenghhu commented 3 years ago

@Wendison Thanks for your feedback. I have uploaded the result samples and loss plots again: result.zip loss.zip

Wendison commented 3 years ago

@gongchenghhu Sorry for the late response, I'm busy working on other things these days, I will look into your problems very soon.

Wendison commented 3 years ago

@gongchenghhu After checking the code, I found an error in encoder_sa_kd.py: https://github.com/Wendison/FCL-taco2/blob/main/nets/modules/encoder_sa_kd.py#L163-L171. The original code is:

if self.use_residual:
    xs_conv1 = self.convs[0](xs_conv0) + xs_conv0
else:
    xs_conv1 = self.convs[0](xs_conv0) # B x econv_chans x Tmax

if self.use_residual:
    xs_conv2 = self.convs[0](xs_conv1) + xs_conv1
else:
    xs_conv2 = self.convs[0](xs_conv1) # B x econv_chans x Tmax

where the first convolution layer is applied repeatedly while the 2nd and 3rd layers are never used. It should be:

if self.use_residual:
    xs_conv1 = self.convs[1](xs_conv0) + xs_conv0
else:
    xs_conv1 = self.convs[1](xs_conv0) # B x econv_chans x Tmax

if self.use_residual:
    xs_conv2 = self.convs[2](xs_conv1) + xs_conv1
else:
    xs_conv2 = self.convs[2](xs_conv1) # B x econv_chans x Tmax

I have fixed this error; please try again. Sorry for the inconvenience.
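This class of index typo can be avoided by iterating over the layer list instead of indexing each layer by hand. The sketch below is a simplified stand-in for the encoder's convolution stack (names like `ConvStack` and the layer hyperparameters are illustrative, and it drops the intermediate per-layer outputs the real module returns for distillation):

```python
import torch
import torch.nn as nn

class ConvStack(nn.Module):
    """Simplified encoder-style 1-D convolution stack with optional residuals."""

    def __init__(self, econv_chans=4, elayers=3, use_residual=True):
        super().__init__()
        self.use_residual = use_residual
        # One Conv1d per layer; iterating over the ModuleList guarantees
        # each layer is applied exactly once, so no layer can be skipped
        # or reused by a copy-paste index mistake.
        self.convs = nn.ModuleList(
            nn.Conv1d(econv_chans, econv_chans, kernel_size=5, padding=2)
            for _ in range(elayers)
        )

    def forward(self, xs):
        # xs: B x econv_chans x Tmax; shape is preserved by the padding.
        for conv in self.convs:
            xs = conv(xs) + xs if self.use_residual else conv(xs)
        return xs
```

Under this structure the `convs[0]` / `convs[1]` / `convs[2]` indices never appear in the forward pass, so the bug above cannot recur.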