MasayaKawamura / MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Apache License 2.0

Single frequency noise #9

Open MaxMax2016 opened 1 year ago

MaxMax2016 commented 1 year ago

[Image: istft_mb]

MaxMax2016 commented 1 year ago

There are lines in the spectrum. Have you run into this problem?

MasayaKawamura commented 1 year ago

Hi @MaxMax2016, I don't think this phenomenon appeared under the experimental conditions in the paper (see the audio demo page). Are your dataset and experimental conditions the same as in the paper?

MaxMax2016 commented 1 year ago

I use 16 kHz WAV files, and this is the only difference. Do you know why?

Jackiexiao commented 1 year ago

@MaxMax2016 I had the same issue when I changed to 16 kHz, but after enough steps (140k steps for me, with fp16 enabled, batch size 64, and mini mbistft), this noise disappeared.

MaxMax2016 commented 1 year ago

@Jackiexiao Thanks

atozto9 commented 1 year ago

> @MaxMax2016 I had the same issue when I changed to 16 kHz, but after enough steps (140k steps for me, with fp16 enabled, batch size 64, and mini mbistft), this noise disappeared.

Hi @Jackiexiao, could you share your configuration? I have the same problem.

JohnHerry commented 10 months ago

> @MaxMax2016 I had the same issue when I changed to 16 kHz, but after enough steps (140k steps for me, with fp16 enabled, batch size 64, and mini mbistft), this noise disappeared.

Hi @Jackiexiao, have you removed the weight_norm in the code? I have tried to enable fp16 too, but the training process stopped at loss backup on the weight norm node.
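
Editorial note: the single-frequency line reported at the top of this thread can be checked numerically instead of by eye. The sketch below is not part of the MB-iSTFT-VITS code base. It assumes the synthesized audio is already loaded as a list of float samples, and the function names (`fft`, `mean_magnitude_spectrum`, `tonal_bins`) and the threshold `ratio=8.0` are hypothetical choices made for illustration. It averages the STFT magnitude over time and flags frequency bins that stay far above the median level, which is roughly what a horizontal line in a spectrogram looks like.

```python
import cmath
import math
import random
from statistics import median


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mean_magnitude_spectrum(signal, n_fft=256, hop=128):
    """Average the Hann-windowed STFT magnitude over all frames, per bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    n_bins = n_fft // 2 + 1
    total = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(n_fft)]
        spec = fft(frame)
        for k in range(n_bins):
            total[k] += abs(spec[k])
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [t / n_frames for t in total]


def tonal_bins(mean_mag, ratio=8.0):
    """Return bins whose time-averaged magnitude exceeds ratio * median."""
    floor = median(mean_mag)
    return [k for k, m in enumerate(mean_mag) if m > ratio * floor]


# Synthetic stand-in for a noisy output: broadband noise plus a 3 kHz tone.
random.seed(0)
sample_rate = 16000
n_fft = 256
signal = [
    random.gauss(0.0, 0.05) + 0.2 * math.sin(2 * math.pi * 3000 * n / sample_rate)
    for n in range(sample_rate // 2)
]

mean_mag = mean_magnitude_spectrum(signal, n_fft=n_fft, hop=128)
bins = tonal_bins(mean_mag)
if bins:
    peak = max(bins, key=lambda k: mean_mag[k])
    print("strongest persistent tone near", peak * sample_rate / n_fft, "Hz")
else:
    print("no persistent narrowband component found")
```

Running the same two functions on audio from an early checkpoint and a later one would show whether the flagged bin fades with more training steps, as described above. The threshold is a rough heuristic and may need tuning for real speech, whose harmonics can also stand out in short clips.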