ghost opened this issue 3 years ago
Hi, to generate a singing voice, the model expects an .hdf5 file from the dataset, and generating the .hdf5 file requires a WAV file. Can it not use WAV files directly?
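The thread does not say how the .hdf5 files are built beyond needing WAV input. As a hedged starting point, loading a WAV file directly needs only the Python standard library. The hypothetical helper below reads 16-bit PCM into floats in [-1, 1). Feature extraction (for example WORLD analysis and the spectral conversions discussed below) would still have to run on these samples, so reading WAV directly does not by itself remove the preprocessing cost.

```python
import array
import io
import wave
from typing import List, Tuple, Union


def read_wav_mono_16bit(source: Union[str, bytes]) -> Tuple[int, List[float]]:
    """Hypothetical helper: load a 16-bit PCM WAV (path or raw bytes) as mono floats.

    Returns (sample_rate, samples) with samples scaled to [-1.0, 1.0).
    Only the first channel is kept if the file is multi-channel.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())  # wave returns native byte order
    samples = array.array("h")
    samples.frombytes(raw)
    first_channel = samples[::n_channels]  # de-interleave: keep channel 0
    return rate, [s / 32768.0 for s in first_channel]
```

In a real pipeline the returned samples would then go to the same analysis step that currently feeds the HDF5 writer.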
I have had the same experience as you: `sp_to_mfsc` in data preparation and `mfsc_to_mgc` in inference are time-consuming, and some hyper-parameters, like the 0.45 in `sp_to_mfsc`, also confuse me. Maybe MelGAN would work better than `mfsc_to_mgc` for converting the spectrum into a signal.
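The thread does not say what 0.45 is. In mel-cepstral analysis (SPTK, pysptk), a constant of this size is typically the all-pass warping factor α, which bends the linear frequency axis toward a mel-like scale. Assuming that is its role here, a normalized frequency ω is mapped to β(ω) = ω + 2·arctan(α·sin ω / (1 − α·cos ω)). The value of α that best approximates the mel scale depends on the sampling rate. A small sketch (helper names are hypothetical):

```python
import math


def warped_frequency(omega: float, alpha: float) -> float:
    """Phase response of the first-order all-pass used for frequency warping.

    omega is a normalized angular frequency in [0, pi]; the result is the
    warped frequency in the same range. Monotonic for |alpha| < 1, and
    alpha = 0 leaves the axis unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy |alpha| < 1")
    # 1 - alpha*cos(omega) > 0 for |alpha| < 1, so plain atan is safe here.
    return omega + 2.0 * math.atan(alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega)))


def warped_hz(freq_hz: float, sample_rate_hz: float, alpha: float) -> float:
    """Same mapping expressed in Hz for a given (assumed) sampling rate."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    return warped_frequency(omega, alpha) * sample_rate_hz / (2.0 * math.pi)


if __name__ == "__main__":
    # Illustration only: the sampling rate is an assumption, not taken from WGANSing.
    for f in (250, 500, 1000, 2000, 4000, 8000):
        print(f, "Hz ->", round(warped_hz(f, 32000, 0.45)), "Hz on the warped axis")
```

With positive α, low frequencies are stretched over a larger share of the warped axis, which is why a low-order mel-cepstrum can describe the perceptually important low band in more detail.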
When I first ran `prep_data_nus.py`, I noticed how long the preprocessing takes to generate the HDF5 files: approximately 3 hours on my computer for the 96 files. I noticed the performance bottleneck in `sp_to_mgc` (the SPTK dependency). To produce a 2 min 54 s song (the Elton John one from the NUS database), my computer needs more than 13 minutes, about 10 minutes longer than the song itself. I thought this was because I run the model on my CPU (not a GPU), but I took some measurements and found that the problem is clearly not the model, i.e. the 'AI' part.
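The figures above can be expressed as a real-time factor (processing time divided by audio duration, where values above 1 mean slower than real time). This is a small worked sketch using only the numbers quoted in this comment. The 13 minutes is treated as a lower bound, as reported.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: float = 0.0) -> float:
    """Convert an h/min/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time / audio duration; > 1 means slower than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


song = to_seconds(minutes=2, seconds=54)   # 2 min 54 s song
synthesis = to_seconds(minutes=13)         # "more than 13 minutes" (lower bound)
rtf = real_time_factor(synthesis, song)
extra = synthesis - song                   # time beyond the song duration
per_file = to_seconds(hours=3) / 96        # preprocessing time per HDF5 file

print(f"RTF >= {rtf:.2f}, extra time >= {extra / 60:.1f} min, ~{per_file:.1f} s per HDF5 file")
# prints: RTF >= 4.48, extra time >= 10.1 min, ~112.5 s per HDF5 file
```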
The inference call:
`test_file_hdf5_no_question` is the same as `test_file_hdf5`, but without the questions, with timing measurements around each function call, and generating only the synthesized audio (not the ground truth).
The timing result (in seconds):
Clearly, the AI part is very fast, even on a CPU. The problem comes from the audio regeneration.
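Per-stage timing like the measurements described above can be collected with a small context manager wrapped around each call. This is a hypothetical helper: `timed`, `report`, and the stage labels are illustrative and not taken from WGANSing. It uses only `time.perf_counter` from the standard library.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Optional

# Accumulated wall-clock seconds per stage label.
TIMINGS: Dict[str, float] = {}


@contextmanager
def timed(label: str, store: Optional[Dict[str, float]] = None) -> Iterator[None]:
    """Add the elapsed wall-clock time of the `with` body to store[label]."""
    store = TIMINGS if store is None else store
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = store.get(label, 0.0) + time.perf_counter() - start


def report(store: Optional[Dict[str, float]] = None) -> None:
    """Print stages from slowest to fastest with their share of the total."""
    store = TIMINGS if store is None else store
    total = sum(store.values())
    for label, secs in sorted(store.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * secs / total if total else 0.0
        print(f"{label:<20s} {secs:8.3f} s  {share:5.1f} %")


if __name__ == "__main__":
    with timed("model inference"):
        sum(i * i for i in range(100_000))   # stand-in for the network forward pass
    with timed("feats_to_audio"):
        sum(i * i for i in range(300_000))   # stand-in for vocoder + SPTK conversion
    report()
```

Accumulating into a dict (rather than printing immediately) makes it easy to sum repeated calls, e.g. one `feats_to_audio` call per segment.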
Details of the `feats_to_audio` calls (again in seconds):
The PyWorld `synthesize` call is acceptable at 25 seconds (14% of the total audio duration), but the SPTK call is not.
Sadly, to my knowledge, this is the only fast implementation (C code) of the mel-generalized cepstrum conversion. And a GPU would not help, because this is pure CPU code. What the hell is going on with this algorithm?!
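SPTK's mel-cepstral tools rely on frequency transformation of cepstra, which SPTK ships as the `freqt` routine. The sketch below is a pure-Python port of that recursion, for illustration only. It shows that each frame costs on the order of (input order × output order) multiply-adds, which adds up over thousands of frames even in C. Whether WGANSing's conversion goes through `freqt` or through SPTK's iterative analysis (e.g. `mgcep`) is an assumption the thread does not settle. The Python function name is hypothetical.

```python
from typing import List, Sequence


def freqt(c: Sequence[float], order: int, alpha: float) -> List[float]:
    """Frequency-warp a cepstrum c[0..m1] into order+1 coefficients.

    Port of the recursion used by SPTK's freqt: with positive alpha it
    substitutes z^-1 -> (z^-1 + alpha) / (1 + alpha z^-1) in the cepstral
    expansion. Cost is O(len(c) * order) per frame.
    """
    b = 1.0 - alpha * alpha
    g = [0.0] * (order + 1)  # running output coefficients
    d = [0.0] * (order + 1)  # values of g from the previous step
    for ci in reversed(c):   # SPTK walks c[m1], c[m1-1], ..., c[0]
        d[0] = g[0]
        g[0] = ci + alpha * d[0]
        if order >= 1:
            d[1] = g[1]
            g[1] = b * d[0] + alpha * d[1]
        for j in range(2, order + 1):
            d[j] = g[j]
            g[j] = d[j - 1] + alpha * (d[j] - g[j - 1])
    return g


if __name__ == "__main__":
    # A single linear-cepstrum term c1 = 1 spread over warped coefficients.
    print(freqt([0.0, 1.0], 2, 0.5))
```

Running the transform again with −α undoes the warping (up to truncation of the series), which is how a mel-cepstrum is usually mapped back to a linear cepstrum.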
I know my computer is an old-school one: a Dell Workstation T7400 with a 4-core Intel Xeon @ 2.33 GHz and 16 GB of RAM. But it works very well for many things, except pure deep learning stuff.
I don't know whether anything can be done about this in WGANSing, because the MGC is at the heart of the project, but I will investigate ways to optimize this process. I'm sure the computation time can be reduced with some tricks.
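One generic trick, assuming the spectral conversion is applied independently to each frame (this is an assumption about the WGANSing code, not something the thread confirms), is to split the frames into chunks and run them on all CPU cores, four on the machine above. This is a hedged sketch with hypothetical helper names. The per-frame function must be defined at module level so that `ProcessPoolExecutor` can pickle it.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Split items into at most n_chunks contiguous, near-equal chunks (order kept)."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, rest = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rest else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def _apply_to_chunk(job: Tuple[Callable[[T], R], List[T]]) -> List[R]:
    """Worker: apply the per-frame function to every frame of one chunk."""
    func, chunk = job
    return [func(x) for x in chunk]


def parallel_frame_map(func: Callable[[T], R], frames: Sequence[T], workers: int = 4) -> List[R]:
    """Apply func to each frame, spreading contiguous chunks over worker processes.

    Falls back to a plain loop for a single worker or tiny inputs. Output
    order matches input order because chunks are contiguous and pool.map
    preserves order.
    """
    if workers <= 1 or len(frames) < 2:
        return [func(f) for f in frames]
    jobs = [(func, chunk) for chunk in split_into_chunks(frames, workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return [y for chunk in pool.map(_apply_to_chunk, jobs) for y in chunk]
```

Chunking (instead of submitting one task per frame) keeps the inter-process overhead small relative to the work. On platforms that spawn processes, the call site should sit under an `if __name__ == "__main__":` guard.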
In any case, well done with WGANSing; I love this kind of project!