fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License

Modifications for large number of utterances and reducing data size #205

Closed by patrickltobing 4 years ago

patrickltobing commented 4 years ago

I tried preprocessing and training only the WaveRNN on my own dataset. The following modifications may be necessary to keep the training time manageable and to reduce the data size:

  1. utils/dataset.py, line 53: num_workers needs to be adjusted to roughly 1 per ~840 utterances; e.g., if the dataset has 8400 or 9000 utterances in total, set it to 10.

  2. preprocess.py, line 47: write quant as np.int16 to reduce the file size by a factor of 4 and, to a lesser extent, the I/O time.

  3. dataset.py, line 78: use np.int16 for label as well; this is not as critical as points 1 and 2.
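
To make the suggestions above concrete, here is a minimal sketch, not the repository's actual code. `suggest_num_workers` and `saved_size` are hypothetical helpers, the random array is only a stand-in for one utterance of quantised samples (9-bit labels are assumed), and the comparison assumes quant was previously stored as int64, which is what a factor of 4 would imply.

```python
import io

import numpy as np

UTTERANCES_PER_WORKER = 840  # rule of thumb from point 1


def suggest_num_workers(num_utterances, per_worker=UTTERANCES_PER_WORKER):
    """Hypothetical helper: about one DataLoader worker per ~840 utterances."""
    return max(1, num_utterances // per_worker)


def saved_size(array):
    """Bytes needed to store the array in .npy format (written to memory)."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getbuffer().nbytes


# Stand-in for one utterance of quantised samples (assumed range 0..511).
rng = np.random.default_rng(0)
quant = rng.integers(0, 2**9, size=100_000, dtype=np.int64)

# Points 2 and 3: the values fit comfortably in int16, so nothing is lost.
quant_small = quant.astype(np.int16)

print("workers for 8400 / 9000 utterances:",
      suggest_num_workers(8400), suggest_num_workers(9000))
print("bytes as int64:", saved_size(quant))
print("bytes as int16:", saved_size(quant_small))
```

If the training loss expects 64-bit integer targets, the int16 labels can be cast back after loading; the saving is then on disk and during reading rather than in the batch itself.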