Emotional-Text-to-Speech / dl-for-emo-tts

:computer: :robot: A summary on our attempts at using Deep Learning approaches for Emotional Text to Speech :speaker:
MIT License

Update to tensorflow 2 & numpy & others #8

Open YaraAlkaka opened 4 months ago

YaraAlkaka commented 4 months ago

To run the Colab code successfully:

  1. run the first cell only
  2. go to /content/tacotron_pytorch/hparams.py and change it to this:
    
    import tensorflow as tf
    import types

    # Default hyperparameters:
    hparams_dict = {
        # Comma-separated list of cleaners to run on text prior to training and eval. For non-English
        # text, you may want to use "basic_cleaners" or "transliteration_cleaners" See TRAINING_DATA.md.
        'cleaners': 'english_cleaners',
        'use_cmudict': False,  # Use CMUDict during training to learn pronunciation of ARPAbet phonemes

        # Audio:
        'num_mels': 80,
        'num_freq': 1025,
        'sample_rate': 20000,
        'frame_length_ms': 50,
        'frame_shift_ms': 12.5,
        'preemphasis': 0.97,
        'min_level_db': -100,
        'ref_level_db': 20,

        # Model:
        # TODO: add more configurable hparams
        'outputs_per_step': 5,
        'padding_idx': None,
        'use_memory_mask': False,

        # Data loader
        'pin_memory': True,
        'num_workers': 2,

        # Training:
        'batch_size': 32,
        'adam_beta1': 0.9,
        'adam_beta2': 0.999,
        'initial_learning_rate': 0.002,
        'decay_learning_rate': True,
        'nepochs': 1000,
        'weight_decay': 0.0,
        'clip_thresh': 1.0,

        # Save
        'checkpoint_interval': 5000,

        # Eval:
        'max_iters': 200,
        'griffin_lim_iters': 60,
        'power': 1.5,              # Power to raise magnitudes to prior to Griffin-Lim
    }

    # Convert the dictionary to a namespace
    hparams = types.SimpleNamespace(**hparams_dict)


    def hparams_debug_string():
        # A SimpleNamespace cannot be iterated or indexed directly, so go through vars()
        values = vars(hparams)
        hp = [' %s: %s' % (name, values[name]) for name in sorted(values)]
        return 'Hyperparameters:\n' + '\n'.join(hp)


3. go to /content/tacotron_pytorch/lib/tacotron/util/audio.py and change ```np.complex``` in line 70 to ```complex```
4. go to /content/pytorch-dc-tts/datasets/emovdb.py line 45 and change ```np.long``` to ```np.int64```
5. go to /content/pytorch-dc-tts/audio.py line 61 and change it to

        return librosa.istft(spectrogram, hop_length=hp.hop_length, win_length=hp.win_length, window="hann")

   and change line 47 to

        est = librosa.stft(X_t, n_fft=hp.n_fft, hop_length=hp.hop_length, win_length=hp.win_length)


6. remove ```%tensorflow_version 1.x``` from the second cell in the colab
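
For step 2, here is a minimal sketch of how a `types.SimpleNamespace` behaves once the dictionary has been converted. It uses only a reduced, illustrative subset of the values listed above. The point it tries to show is that attribute access such as `hparams.sample_rate` works directly, while listing or indexing the values by name has to go through `vars()`.

```python
import types

# Reduced, illustrative subset of the dictionary from step 2 (not the full file).
hparams_dict = {
    'num_mels': 80,
    'sample_rate': 20000,
    'batch_size': 32,
}

# Attribute-style access (hparams.sample_rate) is what the rest of the code expects.
hparams = types.SimpleNamespace(**hparams_dict)


def hparams_debug_string():
    # vars() exposes the namespace attributes as a plain dict,
    # which can be sorted and indexed by name.
    values = vars(hparams)
    hp = [' %s: %s' % (name, values[name]) for name in sorted(values)]
    return 'Hyperparameters:\n' + '\n'.join(hp)


if __name__ == '__main__':
    print(hparams.sample_rate)
    print(hparams_debug_string())
```

Writing `hparams['sample_rate']` or `sorted(hparams)` on the namespace itself would raise a `TypeError`, which is why the helper reads from `vars(hparams)` instead.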

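Steps 3 and 4 are needed because recent NumPy releases no longer provide the `np.complex` and `np.long` aliases. As a rough sketch, the same two replacements could be applied from a Colab cell instead of editing the files by hand. The helper names `patch_numpy_aliases` and `patch_file` are hypothetical and not part of either repository, and the helper rewrites every occurrence in a file rather than only the line numbers cited above.

```python
import re
from pathlib import Path

# Removed NumPy aliases and the replacements named in steps 3 and 4.
# The trailing \b leaves names such as np.complex64 or np.longlong untouched.
REPLACEMENTS = [
    (re.compile(r'\bnp\.complex\b'), 'complex'),
    (re.compile(r'\bnp\.long\b'), 'np.int64'),
]


def patch_numpy_aliases(source: str) -> str:
    """Return the source text with the removed aliases replaced."""
    for pattern, replacement in REPLACEMENTS:
        source = pattern.sub(replacement, source)
    return source


def patch_file(path: str) -> bool:
    """Rewrite one file in place; return True if anything changed."""
    file_path = Path(path)
    original = file_path.read_text()
    patched = patch_numpy_aliases(original)
    if patched != original:
        file_path.write_text(patched)
    return patched != original
```

Assuming the first cell has already cloned both repositories, calling `patch_file('/content/tacotron_pytorch/lib/tacotron/util/audio.py')` and `patch_file('/content/pytorch-dc-tts/datasets/emovdb.py')` should make the same changes as steps 3 and 4.
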
Now it works, although the Amused emotion does not sound correct. I'll update this when I fix it.

ruobingli1103 commented 3 months ago

Just wanted to say a huge thanks for sharing this!