ysig opened this issue 4 years ago
Hi, thanks for your interest in our work and for pointing this out :-), I'll update soon.
Hi,
great project but the caveats make it pretty hard to use. I've documented my install process here (hopefully completely): https://github.com/otezun/WGANSing-Personal-Install-Notes/blob/main/README.md
Checking the training data, it consists of phonemes with their corresponding time stamps, so I assume the .lab file it generates voice from would be formatted as:
<starttime> <endtime> <phoneme>
Is that assumption correct? As example taken from training data:
0.000000 7.089788 sil
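If that assumption holds, parsing such a file is straightforward. A minimal sketch (the field layout is assumed from the training-data example above, not confirmed by the repo):

```python
# Hypothetical sketch: parse .lab-style lines of the form
# "<start> <end> <phoneme>", e.g. "0.000000 7.089788 sil".
def parse_lab(text):
    entries = []
    for line in text.strip().splitlines():
        start, end, phoneme = line.split()
        entries.append((float(start), float(end), phoneme))
    return entries

print(parse_lab("0.000000 7.089788 sil"))
# → [(0.0, 7.089788, 'sil')]
```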
I think this would have great potential with better documentation.
Hi Otezun and ysig,
Thanks for your interest in our work and for the documentation; it looks great. I'll update the main README with your suggestions.
Thanks for the quick answer @pc2752. I have been able to train for the full 950 epochs, which took a bit more than 12 hours on my machine (MSI 1660 Armor OC 6GB, Ryzen 5 3600, 64GB RAM). I used nus_ZHIY_sing_06 and translated it to MPOL. The resulting files can be downloaded from my repo, along with the training figure.

Two further observations:

When synthesizing, the output is vocoded to val_dir_synth, but the filenames do not indicate what was vocoded to what. Instead of a name like nus_ZHIY_06.output, a name like nus_ZHIY_06_MPOL.output would be better: you wouldn't accidentally overwrite previous files made from that singer, only files where you had already combined that singer with that particular target.

Also, the README states that synthesis expects a .lab file. This is not the case; it expects an .hdf5 file from the dataset. I verified this by testing it against the .lab file from the torch_npss repository, which it refused to accept.
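The filename suggestion above could be sketched as follows (the function name and the ".output" extension are hypothetical, chosen to match the examples in this thread):

```python
from pathlib import Path

def synth_output_name(source_file, target_singer):
    # Include the target singer in the output name so repeated syntheses
    # from the same source file don't overwrite each other,
    # e.g. "nus_ZHIY_06.hdf5" + "MPOL" -> "nus_ZHIY_06_MPOL.output".
    stem = Path(source_file).stem
    return f"{stem}_{target_singer}.output"

print(synth_output_name("nus_ZHIY_06.hdf5", "MPOL"))
# → nus_ZHIY_06_MPOL.output
```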
Hi, to generate singing voice it expects an .hdf5 file from the dataset, and generating that .hdf5 in turn requires a wave file. Can it not use wave files directly?
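For context, the .hdf5 preparation has to start from the raw wave samples anyway. A minimal sketch of that first step, using only the standard library (the actual dataset scripts also extract features such as f0 and phoneme alignments before writing the .hdf5, typically via h5py; the 16-bit mono PCM assumption and the helper name here are mine):

```python
import io
import math
import struct
import wave

def read_wav_samples(fileobj):
    # Read raw PCM samples from a wav file object.
    # Assumption: 16-bit mono PCM, as used by the NUS dataset recordings.
    with wave.open(fileobj, "rb") as w:
        n = w.getnframes()
        raw = w.readframes(n)
        return list(struct.unpack("<%dh" % n, raw)), w.getframerate()

# Build a tiny 440 Hz test tone in memory so the sketch is self-contained.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    for i in range(160):
        sample = int(10000 * math.sin(2 * math.pi * 440 * i / 16000))
        w.writeframes(struct.pack("<h", sample))
buf.seek(0)

samples, rate = read_wav_samples(buf)
print(len(samples), rate)
# → 160 16000
```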
Hi,
I would like to make some minor comments on this repo:
`voice_dir = ../ss_synthesis/voice/`
Thanks in advance!!