Multi-voice singing voice synthesis

Two problems about this repo. #36

Open YingjieSong1 opened 3 years ago

YingjieSong1 commented 3 years ago


I have two questions about this repo and would appreciate your help:

(1) The model weights you provided are from epoch 950. However, the network described in the paper was trained for 3000 epochs. Could you please provide the model that generates the same audio as the samples from ""? I used the 950-epoch model for inference, and the generated audio for "JLEE" and "MCUR" is worse than the samples. Also, could you please tell me which target identities "JLEE" and "MCUR" were converted to on the webpage?

(2) In the WGANSing paper, you propose adjusting the input f0 by an octave to account for the different pitch ranges of the genders. However, I can't find the corresponding code in the inference path. I tried the model you provided, and I think the quality of the gender conversion is poor. I have attached the relevant audio: "SAMF_original.wav" is the audio from the dataset ("\nus-smc-corpus_48\SAMF\01.wav"), and "SAMF_output.wav" is the audio generated with "nus_MCUR_sing_04.hdf5". Could you please tell me how to adjust the input f0, or provide the corresponding code?
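
To make the question concrete, here is a minimal sketch of what I assume the octave adjustment means. It is not taken from this repo: the function names `shift_f0_octave_hz` and `shift_f0_octave_midi` are hypothetical, and I am assuming that unvoiced frames are marked with a value of 0 (or less) and should be left unchanged. Whether the model expects f0 in Hz or in MIDI note numbers is also an assumption on my side, so both variants are shown.

```python
import numpy as np


def shift_f0_octave_hz(f0_hz, octaves):
    """Shift an f0 contour given in Hz by a whole number of octaves.

    Assumption: frames with f0 <= 0 are unvoiced and are left untouched.
    One octave up doubles the frequency, one octave down halves it.
    """
    f0 = np.asarray(f0_hz, dtype=np.float64)
    shifted = f0.copy()
    voiced = f0 > 0
    shifted[voiced] = f0[voiced] * (2.0 ** octaves)
    return shifted


def shift_f0_octave_midi(f0_midi, octaves):
    """Shift an f0 contour given in MIDI note numbers by whole octaves.

    Assumption: frames with a value <= 0 are unvoiced and are left untouched.
    One octave corresponds to 12 semitones.
    """
    f0 = np.asarray(f0_midi, dtype=np.float64)
    shifted = f0.copy()
    voiced = f0 > 0
    shifted[voiced] = f0[voiced] + 12.0 * octaves
    return shifted


if __name__ == "__main__":
    # Hypothetical female-to-male conversion: lower the contour by one octave.
    contour_hz = [220.0, 0.0, 440.0]
    print(shift_f0_octave_hz(contour_hz, -1))
```

If this is roughly what the paper does, I would expect to call it with `octaves=-1` for female-to-male conversion and `octaves=+1` for male-to-female, before passing the f0 to the model. Please correct me if the actual procedure differs.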

Thanks in advance!!