TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supporting English, French, Korean, Chinese, and German; easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Increased time to create pre-training data #119

Closed rgzn-aiyun closed 4 years ago

rgzn-aiyun commented 4 years ago

Hi, I hadn't updated in a while; I just pulled the latest code and ran it, and found that generating the data takes longer now. It's been 10 minutes and it still hasn't finished generating. Why?

tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml
[Preprocessing]: 6% 625/10000 [10:48<2:42:07, 1.04s/it]
[Preprocessing]: 6% 625/10000 [11:29<2:52:16, 1.10s/it]
[Preprocessing]: 6% 625/10000 [11:47<2:56:51, 1.13s/it]
[Preprocessing]: 6% 625/10000 [12:29<3:07:17, 1.20s/it]

dathudeptrai commented 4 years ago

@rgzn-aiyun Because the new version calculates f0/energy for FastSpeech2, and that's slow :D

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun Because the new version calculates f0/energy for FastSpeech2, and that's slow :D

Half an hour has passed and preprocessing still hasn't finished. Is there any way to speed it up?

dathudeptrai commented 4 years ago

@rgzn-aiyun The preprocessing already uses multi-processing; for LJSpeech it takes 10-15 minutes. I think that's normal, and you only need to compute it once :D.

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun The preprocessing already uses multi-processing; for LJSpeech it takes 10-15 minutes. I think that's normal, and you only need to compute it once :D.

I estimate that generation will take at least an hour, which is an unacceptable speed!

dathudeptrai commented 4 years ago

@rgzn-aiyun Why is the speed unacceptable if you only need to compute it once? BTW, the slowdown is caused by https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder, which is used to calculate F0. I think it is inherently slow.
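[Editor's note] For context on why F0 extraction dominates preprocessing time: it is inherently loop-heavy, since every analysis frame needs its own pitch search. Below is a toy numpy sketch of per-frame F0 estimation via autocorrelation peak picking; it is not pyworld's DIO algorithm, and the frame and search parameters are arbitrary illustrative choices.

```python
import numpy as np

def frame_f0_autocorr(x, sr, frame_len=1024, hop=256, fmin=70.0, fmax=400.0):
    """Per-frame F0 via autocorrelation peak picking (illustrative only)."""
    lo = int(sr / fmax)               # shortest lag (highest pitch) to search
    hi = int(sr / fmin)               # longest lag (lowest pitch) to search
    f0 = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len]
        frame = frame - frame.mean()
        # autocorrelation at lags 0 .. frame_len-1
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        if ac[0] <= 0:                # silent frame
            f0.append(0.0)
            continue
        lag = lo + int(np.argmax(ac[lo:hi]))
        # crude voicing decision: the peak must carry a large share of the energy
        f0.append(sr / lag if ac[lag] > 0.3 * ac[0] else 0.0)
    return np.array(f0)

# A 200 Hz sine wave: every voiced frame should be estimated near 200 Hz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200.0 * t)
est = frame_f0_autocorr(x, sr)
```

Even this simplified version does a full-frame correlation per hop, which is why a 10k-utterance corpus takes on the order of minutes even with multi-processing.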

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun Why is the speed unacceptable if you only need to compute it once? BTW, the slowdown is caused by https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder, which is used to calculate F0. I think it is inherently slow.

Yes, because it's running on cloud computing, I can't waste too much time.

dathudeptrai commented 4 years ago

@rgzn-aiyun So maybe you need to pre-compute locally and then upload to the cloud? Or you can skip calculating f0/energy (if you're not training FastSpeech2) by using the old preprocessing files.

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun So maybe you need to pre-compute locally and then upload to the cloud? Or you can skip calculating f0/energy (if you're not training FastSpeech2) by using the old preprocessing files.

This is indeed a solution.

rgzn-aiyun commented 4 years ago

Did something go wrong?

Traceback (most recent call last):
  File "train_fastspeech2.py", line 515, in <module>
    main()
  File "train_fastspeech2.py", line 441, in main
    return_utt_id=False,
  File "/ai/TensorflowTTS/examples/fastspeech2/fastspeech2_dataset.py", line 111, in __init__
    duration_files = [duration_files[idx] for idx in idxs]
  File "/ai/TensorflowTTS/examples/fastspeech2/fastspeech2_dataset.py", line 111, in <listcomp>
    duration_files = [duration_files[idx] for idx in idxs]
IndexError: list index out of range
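[Editor's note] An IndexError at this point usually means the per-utterance file lists in the dump folder are out of sync, e.g. duration extraction stopped early and produced fewer files than there are utterances. A minimal sketch for spotting such a mismatch; the directory layout and the filename suffixes here are hypothetical, not the repo's exact naming:

```python
import os
import tempfile
from pathlib import Path

def stems(directory, suffix):
    """Collect utterance ids by stripping `suffix` from each matching filename."""
    return {p.name[: -len(suffix)] for p in Path(directory).glob(f"*{suffix}")}

def find_missing(feat_dir, dur_dir,
                 feat_suffix="-norm-feats.npy", dur_suffix="-durations.npy"):
    """Utterance ids that have a feature file but no duration file.

    Suffixes and layout are assumptions for illustration.
    """
    return sorted(stems(feat_dir, feat_suffix) - stems(dur_dir, dur_suffix))

# Tiny demo with fake empty files: one utterance is missing its durations.
root = tempfile.mkdtemp()
feat_dir = os.path.join(root, "feats")
dur_dir = os.path.join(root, "durations")
os.makedirs(feat_dir)
os.makedirs(dur_dir)
for utt in ("LJ001-0001", "LJ001-0002"):
    open(os.path.join(feat_dir, utt + "-norm-feats.npy"), "w").close()
open(os.path.join(dur_dir, "LJ001-0001-durations.npy"), "w").close()
missing = find_missing(feat_dir, dur_dir)  # → ["LJ001-0002"]
```

Re-running duration extraction for the missing ids (or removing their feature files) restores a consistent dump and avoids the index error.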

rgzn-aiyun commented 4 years ago

@dathudeptrai Doesn't FastSpeech2 need to extract the duration?

dathudeptrai commented 4 years ago

@rgzn-aiyun Yes, FastSpeech2 needs extracted durations. The difference between FastSpeech and FastSpeech2 is that FastSpeech2 also uses f0/energy.

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun Yes, FastSpeech2 needs extracted durations. The difference between FastSpeech and FastSpeech2 is that FastSpeech2 also uses f0/energy.

The authors' paper does not seem to require extracting them.

dathudeptrai commented 4 years ago

@rgzn-aiyun They use MFA to extract durations. FastSpeech needs a duration file; this is a "MUST".

ZDisket commented 4 years ago

@rgzn-aiyun As @dathudeptrai said, they use MFA to get rid of the teacher model and extract durations. You can either extract durations from Tacotron or head over to this version, which supports MFA and phonetic training. To see its performance, you can play with my model in this notebook, although that one's only at 40k steps since I just turned on rounded durations.

manmay-nakhashi commented 4 years ago

@dathudeptrai @ZDisket Can we use a CTC-decoder-like concept to get rid of durations? :thinking:

dathudeptrai commented 4 years ago

@manmay-nakhashi yes, any ASR algorithm can be used to extract durations.

manmay-nakhashi commented 4 years ago

@dathudeptrai So DeepSpeech uses CTC loss for alignment; if we use CTC loss and integrate it with FastSpeech, can we eliminate the duration-calculation step from Tacotron2?
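[Editor's note] The idea being discussed can be sketched in a toy form: take a per-frame best path from a CTC-trained ASR model and run-length-collapse it into (token, duration) pairs. The frame path below is hand-made, and folding blank frames into the preceding token is a simplification; real duration extraction (e.g. via MFA) is considerably more involved.

```python
def durations_from_alignment(path, blank=0):
    """Run-length-collapse a per-frame CTC best path into (tokens, durations).

    Blank frames are folded into the preceding token (so leading blanks are
    dropped); a blank between two identical tokens still separates them, as
    in standard CTC collapsing.
    """
    tokens, durs = [], []
    prev = blank
    for t in path:
        if t == blank:
            if durs:
                durs[-1] += 1      # attribute the blank frame to the last token
        elif t == prev:
            durs[-1] += 1          # same token continuing across frames
        else:
            tokens.append(t)       # a new emission starts here
            durs.append(1)
        prev = t
    return tokens, durs

# Frame path over 10 frames (0 = blank): tokens 1, 2, 1 with durations 3, 5, 1.
tokens, durs = durations_from_alignment([0, 1, 1, 0, 2, 2, 2, 0, 0, 1])
```

The resulting durations, expressed in mel frames, are the same kind of supervision FastSpeech expects from its teacher model or from MFA.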

dathudeptrai commented 4 years ago

@manmay-nakhashi yes, you can also use MFA; I will integrate MFA into the repo soon.

manmay-nakhashi commented 4 years ago

@dathudeptrai sure :smile:

ZDisket commented 4 years ago

@dathudeptrai

> yes, you can also use MFA; I will integrate MFA into the repo soon.

When?

dathudeptrai commented 4 years ago

@ZDisket after merging the multi-GPU branch. BTW, do you want to make a PR?

ZDisket commented 4 years ago

@dathudeptrai I will once I finish training the current model and if I like it. Keep in mind my code is messy and I have no idea what I'm doing 90% of the time.

rgzn-aiyun commented 4 years ago

@dathudeptrai

Resuming training failed: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

dathudeptrai commented 4 years ago

@rgzn-aiyun can you share your command line for resuming training?

dathudeptrai commented 4 years ago

@rgzn-aiyun

--resume ./model/checkpoints/ckpt-15000

I will update the README so it's no longer confusing :D

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun can you share your command line for resuming training?

CUDA_VISIBLE_DEVICES=0 python examples/fastspeech2/train_fastspeech2.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
  --outdir ./model/ \
  --config ./examples/fastspeech2/conf/fastspeech2.v2.yaml \
  --use-norm 1 \
  --f0-stat ./dump/stats_f0.npy \
  --energy-stat ./dump/stats_energy.npy \
  --mixed_precision 1 \
  --resume ./model/checkpoints/model-15000.h5

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun
>
> --resume ./model/checkpoints/ckpt-15000
>
> I will update the README so it's no longer confusing :D

ok, I will try.

rgzn-aiyun commented 4 years ago

@dathudeptrai An error is reported when saving the model some time after resuming training:

/checkpoints/checkpoint.tmp0030f793c7724ccaa4e3bed038d04f81; Permission denied

machineko commented 4 years ago

@rgzn-aiyun chmod -R 777 checkpoints/. Also, the preprocessing time calculation is bugged; I'll make a PR with a fix for the time calculation and a lot more stuff in the next few days.