Pre-trained Model and Inference Procedure

Aria-K-Alethia / laughter-synthesis

Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" accepted by INTERSPEECH 2023.

MIT License

Pre-trained Model and Inference Procedure #1

Closed: ucciicci closed this issue 1 year ago

ucciicci commented 1 year ago

Thank you for publishing such an intriguing piece of code. It's truly an admirable effort.

I have followed the instructions in this repository and managed to train a model on my end. However, the results are not as expected - the output from the model sounds more like noise than a coherent synthesized voice.

To compare the results and further understand the process, I'm interested in a pre-trained model if one is available for public use. Can you provide any details about this?

Moreover, could you share any code or guidelines on how to perform inference using such a pre-trained model? Any information you can provide would be tremendously helpful for refining my understanding and use of this project.

Thank you for your time and consideration. I look forward to your response.

Best,

Aria-K-Alethia commented 1 year ago

Hi,

Could you provide some details about your training, e.g. the loss values when training stopped? I would also like to know whether you modified any parameters, because I need to figure out whether there is a problem in the code.

As for the pretrained model, I will check whether the old checkpoints can still be used; otherwise we can just train a new one.

Thank you.

ucciicci commented 1 year ago

I'm so sorry!!!

I want to express my sincere apologies for the confusion caused by my previous post. I realized my mistake - I had not correctly loaded the trained model, which is quite an elementary error on my part. I've since corrected this issue and the model is working as expected now.
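The thread does not say what exactly went wrong when loading the model. One common failure mode, offered here only as an assumption, is that the checkpoint's parameter names do not match the model's, so the weights are never restored and the randomly initialized network produces noise. A minimal sketch of a key check is below; the function name, the `strip_prefix` argument, and the example parameter names are all hypothetical and not taken from this repository.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys, strip_prefix=""):
    """Return (missing, unexpected) parameter names as sorted lists.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint that the model does not have
    strip_prefix is removed from checkpoint names that start with it.
    """
    cleaned = set()
    for key in checkpoint_keys:
        if strip_prefix and key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        cleaned.add(key)
    expected = set(model_keys)
    return sorted(expected - cleaned), sorted(cleaned - expected)


# Hypothetical parameter names, for illustration only.
model_keys = ["encoder.weight", "decoder.weight"]
checkpoint_keys = ["model.encoder.weight", "model.decoder.weight"]

# Without prefix handling nothing matches, so no weights would be restored.
missing_raw, unexpected_raw = compare_state_dict_keys(model_keys, checkpoint_keys)

# After stripping the (assumed) wrapper prefix, every name lines up.
missing_ok, unexpected_ok = compare_state_dict_keys(
    model_keys, checkpoint_keys, strip_prefix="model."
)

print("raw mismatch:", missing_raw, unexpected_raw)
print("after strip: ", missing_ok, unexpected_ok)
```

With a real PyTorch model, the two key collections would come from `model.state_dict().keys()` and from the loaded checkpoint dictionary. Both lists being empty is a quick sign that the weights line up.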

I appreciate the hard work that has gone into this project and I regret any inconvenience my oversight might have caused.

By the way, after 20,000 steps of training, the losses are as follows. Can I consider these results to be as expected?

val_loss: 2.906731606
val_mel_loss: 0.757005692
val_mel_postnet_loss: 0.757447839
val_pitch_loss: 0.99861455
val_energy_loss: 0.291485846
val_duration_loss: 0.102177404
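As a quick sanity check on these numbers, the five component losses can be summed and compared with the reported total. The sketch below assumes that `val_loss` is an unweighted sum of the components; that is an inference from the figures themselves, not something stated in this thread. Under that assumption, the sum agrees with the reported total to within about 1e-6.

```python
# Validation losses reported above after 20,000 training steps.
reported = {
    "val_loss": 2.906731606,
    "val_mel_loss": 0.757005692,
    "val_mel_postnet_loss": 0.757447839,
    "val_pitch_loss": 0.99861455,
    "val_energy_loss": 0.291485846,
    "val_duration_loss": 0.102177404,
}

# Assumption: the total is an unweighted sum of the five component losses.
component_sum = sum(v for k, v in reported.items() if k != "val_loss")
gap = abs(component_sum - reported["val_loss"])

print(f"sum of components: {component_sum:.9f}")
print(f"reported val_loss: {reported['val_loss']:.9f}")
print(f"absolute gap:      {gap:.2e}")
```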

Aria-K-Alethia commented 1 year ago

OK. I'm happy to hear you have solved the problem. The loss value looks as expected.