LSimon95 / megatts2

Unofficial implementation of Megatts2
MIT License

Don't see GE mentioned in the paper #11

Open · skysbird opened this issue 7 months ago

skysbird commented 7 months ago

https://github.com/LSimon95/megatts2/blob/c9ca2a88febf9db2cf4d8da0860efc9948db2b76/modules/mrte.py#L63

[image]

Paper: https://arxiv.org/abs/2307.07218

Reference: [image]

LSimon95 commented 7 months ago

GE was removed from the current version to test the core MRTE's performance, and I couldn't find the exact structure of GE. I may add it to the newer 24k version for comparison.

skysbird commented 7 months ago


I found a description of the timbre encoder in the MegaTTS paper: https://arxiv.org/pdf/2306.03509.pdf.

I think this could serve as a reference:

[image]
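A minimal sketch of what a "global" timbre vector means, assuming the global encoder works like the timbre encoder described in the MegaTTS paper: every frame of a reference mel-spectrogram is encoded, and the results are averaged over time into one utterance-level vector. The names `frame_encoder` and `global_timbre_vector` are hypothetical. A single linear layer with ReLU stands in for the real convolutional stack, so this is not the repository's or the paper's implementation.

```python
from typing import List

Frame = List[float]


def frame_encoder(frame: Frame, weights: List[List[float]]) -> Frame:
    """Hypothetical per-frame encoder: linear projection followed by ReLU.

    Stands in for the convolutional stack of a real timbre encoder.
    """
    return [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in weights]


def global_timbre_vector(mel: List[Frame], weights: List[List[float]]) -> Frame:
    """Encode every frame, then average over time (temporal average pooling)."""
    if not mel:
        raise ValueError("reference mel-spectrogram must have at least one frame")
    encoded = [frame_encoder(f, weights) for f in mel]
    dim = len(encoded[0])
    # One value per channel, independent of the number of frames.
    return [sum(e[d] for e in encoded) / len(encoded) for d in range(dim)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # toy 2 -> 3 projection
    mel = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames, 2 mel bins each
    print(global_timbre_vector(mel, W))
```

Because the frames are averaged, the output size does not depend on the length of the reference audio, which is the property that makes the encoder "global".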

fighting-zeng commented 3 months ago

Does anyone know the difference between version 1 and version 4 of the MEGATTS2 paper on arXiv? I am really confused, because the architecture differs between v1 and v4. In v4, the prompt conditions of the PLLM use only Zc, whereas in v1 they use Hct. Does this mean timbre information is no longer needed? Also, v4 does not mention GE at all. Does this mean GE is not important?

v1: [image] [image]

v4: [image] [image]