KunZhou9646 / Emovox

This is the implementation of the paper "Emotion Intensity and its Control for Emotional Voice Conversion".

questions about dataloader and relative attributes. #1

Open Kanraaaaa opened 2 years ago

Kanraaaaa commented 2 years ago

Hi Kun, I have read the paper and tried to train this network, but I have run into a few questions:

  1. I can't find the implementation of TextMelIDLoader and TextMelIDCollate. Also, is the strength_embedding the output of the learned ranking function, i.e. a value in [0, 1] that indicates the emotion intensity?
  2. In Formula (6), why is an upper bound set rather than a lower bound?
  3. About DDUR: how can we calculate the DDUR between two sentences with different linguistic content? Is it duration / num_of_words?

Looking forward to your kind reply! Thank you :)

KunZhou9646 commented 2 years ago

Hi, thanks for your questions!

  1. Yes, the strength_embedding is predicted by the learned ranking function: 0 is the weakest and 1 is the strongest.
  2. Thanks a lot! I have corrected the upper bound to a lower bound in the published version.
  3. We are using ESD, which is a parallel emotional speech database. We use the reference speech with the same linguistic content for evaluation, and we calculate the duration of voiced speech.
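The duration comparison described in the reply above could be sketched roughly as follows. This is only an illustration, not the code used for the paper. It assumes an F0 contour has already been extracted for each utterance, that a frame with F0 > 0 counts as voiced, and that the difference is taken as an absolute value in seconds. The function names `voiced_duration` and `ddur` and the 5 ms frame shift are hypothetical choices made for this example.

```python
from typing import Sequence


def voiced_duration(f0: Sequence[float], frame_shift_s: float = 0.005) -> float:
    """Return the total duration in seconds of the voiced frames.

    Assumption: a frame is voiced when its F0 value is greater than zero.
    """
    voiced_frames = sum(1 for value in f0 if value > 0.0)
    return voiced_frames * frame_shift_s


def ddur(
    converted_f0: Sequence[float],
    reference_f0: Sequence[float],
    frame_shift_s: float = 0.005,
) -> float:
    """Absolute difference in voiced duration between two utterances.

    Assumption: both utterances share the same linguistic content, as with
    parallel data, so no normalisation by word count is applied.
    """
    converted = voiced_duration(converted_f0, frame_shift_s)
    reference = voiced_duration(reference_f0, frame_shift_s)
    return abs(converted - reference)


if __name__ == "__main__":
    # Toy F0 contours in Hz; zeros mark unvoiced frames.
    converted = [0.0, 120.0, 130.0, 0.0, 0.0]
    reference = [110.0, 115.0, 120.0, 125.0, 0.0]
    print(ddur(converted, reference, frame_shift_s=0.01))
```

Because the converted and reference utterances contain the same sentence, the two durations are directly comparable without dividing by the number of words.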

hamgcho commented 1 year ago

@KunZhou9646

Hello, sorry for the interruption, but I would like to know where I can find the implementation of TextMelIDLoader and TextMelIDCollate. They are supposed to be located under codes/reader, but they don't exist there.

And thank you for sharing your great work! I am really enjoying it :)

hamgcho commented 1 year ago

Can I use the implementation in nonparaSeq2seqVC_code directly, or does it require any modification?