Moon-sung-woo / MIST_Tacotron

BSD 3-Clause "New" or "Revised" License

Is a spectrogram of each audio file needed during training? #3

Closed freshwindy closed 1 year ago

freshwindy commented 1 year ago

I see that a style image (style_img) is required during training, but when I looked at your data processing code I could not find where that image comes from. There is only code that derives the image path from the audio path. Does this mean that a mel spectrogram must be generated for each audio file?
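For illustration, the path-derivation step described above might look roughly like the sketch below. The function names, the `style_img` directory name, and the `.png` extension are assumptions made for this example, not the repository's actual layout; the second helper lists the audio files that would still need an image generated.

```python
from pathlib import Path


def style_image_path(audio_path, image_dir="style_img", image_ext=".png"):
    """Map an audio file path to the path of its style image.

    Assumed (hypothetical) layout: images live in a sibling directory of
    the audio directory and share the audio file's stem, e.g.
    data/wavs/a001.wav -> data/style_img/a001.png
    """
    audio = Path(audio_path)
    return audio.parent.parent / image_dir / (audio.stem + image_ext)


def missing_style_images(audio_paths):
    """Return the audio paths whose style image does not exist on disk."""
    return [p for p in audio_paths if not style_image_path(p).exists()]
```

Under this assumed layout, every audio file returned by `missing_style_images` would need a spectrogram image generated before training.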

Moon-sung-woo commented 1 year ago

Oh, sorry... I will update it soon.

freshwindy commented 1 year ago

Looking forward to your update. Could you give me an email address so we can communicate? Thanks.

Moon-sung-woo commented 1 year ago

Hi @freshwindy. I'm so sorry for the delay. I have updated the code; see make_mel_style.py. I also updated the model; see /MIST_tacotron. Delete /tacotron2, then rename the directory /MIST_tacotron to /tacotron2. This should make training easier.
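The directory swap described above could be scripted along these lines. This is only a sketch: `swap_model_dirs` is a hypothetical helper, and it assumes both directories sit directly under the repository root. Note that it permanently deletes the old /tacotron2 directory.

```python
import shutil
from pathlib import Path


def swap_model_dirs(repo_root):
    """Replace <repo_root>/tacotron2 with <repo_root>/MIST_tacotron."""
    root = Path(repo_root)
    old_dir = root / "tacotron2"
    new_dir = root / "MIST_tacotron"
    if not new_dir.is_dir():
        raise FileNotFoundError(f"{new_dir} not found; nothing to rename")
    if old_dir.exists():
        # Destructive step: back up any local changes before running this.
        shutil.rmtree(old_dir)
    new_dir.rename(old_dir)
    return old_dir
```

Calling `swap_model_dirs(".")` from the repository root would perform the delete-and-rename in one step.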

freshwindy commented 1 year ago

Hi, thank you for sharing. I have checked your updated code, but I still have a question: I found that your updated make_mel_style.py also includes the image style transfer model. Is this necessary for image processing? I ask because your model also has a style_encoder, which seems to mean that image style transfer is applied again.

Moon-sung-woo commented 1 year ago

Hi, @freshwindy. I treat the mel-spectrogram feature like an image, so image processing is needed. GST Tacotron uses tokens extracted from the mel-spectrogram as the feature, whereas MIST Tacotron uses an image-style-transferred mel-spectrogram, produced from the mel-spectrogram by make_mel_style.py, as the feature.
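To make "treating a mel-spectrogram like an image" concrete, here is a minimal sketch that min-max scales a 2D mel-spectrogram to 8-bit grayscale pixel values. It is not the code in make_mel_style.py and performs no style transfer; the function name and the plain list-of-lists input are assumptions chosen to keep the example dependency-free.

```python
def mel_to_grayscale(mel):
    """Scale a 2D mel-spectrogram (rows of floats) to integers in 0..255.

    Illustrative only: a real pipeline would likely use NumPy arrays and
    then pass the resulting image through a style transfer model.
    """
    flat = [value for row in mel for value in row]
    lowest, highest = min(flat), max(flat)
    span = highest - lowest
    if span == 0:
        # Constant input: map every bin to black instead of dividing by zero.
        return [[0 for _ in row] for row in mel]
    return [[round(255 * (value - lowest) / span) for value in row]
            for row in mel]
```

Each output row can then be handled as a row of grayscale pixels by any image-processing step that follows.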

Moon-sung-woo commented 1 year ago

Closing due to inactivity.