Open KakaruHayate opened 4 weeks ago
See branch: https://github.com/KakaruHayate/DiffSinger/tree/ConvFFT
Code: https://github.com/KakaruHayate/DiffSinger/blob/ConvFFT/modules/commons/common_layers.py
Source: Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
Reference implementation: https://github.com/CODEJIN/XiaoiceSing2/blob/master/Modules/Modules_Paper.py#L391
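1. The fastspeech2 encoder implementations in DiffSinger and Xiaoicesing already differ (the former uses pre-norm, the latter uses post-norm) (1102: post-norm produces NaN), so the overall ConvFFT implementation here differs from the one in the cited paper. The Conv Block has a norm at both ends, so the difference should be small. Whether the sum of the Conv Block and MHSA outputs needs a norm remains to be tested.
2. In CODEJIN's implementation the Conv Block is connected in series, but according to the paper's description it should run in parallel with MHSA. The official implementation does not provide this part.
3. The current default parameters will very likely overfit.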
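To make the serial versus parallel distinction in point 2 concrete, here is a minimal wiring sketch. It is not the code from the branch or from CODEJIN's repository: the function names, the placement of the residual connections, and the optional `post_norm` argument (standing in for what `conv_block_add_norm` would toggle) are all assumptions made for illustration. The sub-layers are passed in as plain callables, so the same functions work with PyTorch modules or with simple numbers.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")            # any type supporting "+", e.g. a tensor or a float
Sublayer = Callable[[T], T]  # hypothetical alias for norm / MHSA / Conv Block


def parallel_prenorm_layer(
    x: T,
    norm: Sublayer,
    mhsa: Sublayer,
    conv_block: Sublayer,
    post_norm: Optional[Sublayer] = None,
) -> T:
    """Parallel wiring (assumed reading of the paper): both branches see the
    same pre-normed input, and their sum is added back to the residual."""
    h = norm(x)                          # pre-norm
    branch_sum = mhsa(h) + conv_block(h)  # MHSA and Conv Block side by side
    if post_norm is not None:             # optional norm on the sum
        branch_sum = post_norm(branch_sum)
    return x + branch_sum                 # residual connection


def serial_prenorm_layer(
    x: T,
    norm_attn: Sublayer,
    mhsa: Sublayer,
    norm_conv: Sublayer,
    conv_block: Sublayer,
) -> T:
    """Serial wiring (assumed): the Conv Block consumes the output of the
    attention sub-layer instead of sharing its input."""
    x = x + mhsa(norm_attn(x))
    return x + conv_block(norm_conv(x))
```

With an identity norm and toy branches `mhsa = lambda h: 2 * h` and `conv_block = lambda h: 3 * h`, the two wirings give different results for the same input, which is the structural difference under discussion.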
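# ConvFFT args:
use_conv_block: true # whether to use ConvFFT
conv_block_kernel_size: 5 # Conv Block convolution kernel size, tunable
conv_block_dropout_rate: 0.1 # Conv Block dropout rate, tunable
conv_block_dilate: 4 # Conv Block dim expansion multiplier, tunable; probably needs to be reduced
conv_block_layer: 2 # number of Conv Blocks, tunable; the paper mentions that the encoder needs two
conv_block_add_norm: false # whether to apply norm to the result of the sum with MHSA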
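As a rough illustration of how these options could be consumed, the sketch below collects them in a dataclass with the same defaults as listed above. The class name `ConvBlockConfig`, the `from_dict` helper, and the `inner_channels` method are hypothetical and do not come from the branch. The sketch also assumes that `conv_block_dilate` multiplies the encoder hidden size to get the Conv Block's inner width.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class ConvBlockConfig:
    """Hypothetical container mirroring the ConvFFT args and their defaults."""
    use_conv_block: bool = True
    conv_block_kernel_size: int = 5
    conv_block_dropout_rate: float = 0.1
    conv_block_dilate: int = 4
    conv_block_layer: int = 2
    conv_block_add_norm: bool = False

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "ConvBlockConfig":
        # Keep only the keys this dataclass knows; other config keys are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})

    def inner_channels(self, hidden_size: int) -> int:
        # Assumption: the "dim expansion multiplier" scales the hidden size.
        return hidden_size * self.conv_block_dilate


# Example: halve the expansion, as the comment on conv_block_dilate suggests.
# The hidden size of 256 is only an illustrative value.
cfg = ConvBlockConfig.from_dict({"conv_block_dilate": 2, "hidden_size": 256})
print(cfg.inner_channels(256))  # 512
```

Under this assumption, the default `conv_block_dilate: 4` with two Conv Blocks adds a fairly wide extra branch to every encoder layer, which would fit the concern that the defaults may overfit.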
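1. Post-norm produces NaN.
2. The original implementation (pre-norm) produces severely slurred pronunciation.
3. DiffSinger Encoder + Conv Block appears to improve naturalness; this still needs testing.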