NVIDIA / waveglow

A Flow-based Generative Network for Speech Synthesis
BSD 3-Clause "New" or "Revised" License

Training different 'n_mel_channels' models #264

Open anitaweng opened 2 years ago

anitaweng commented 2 years ago

Hi, I tried to train my own model with a different 'n_mel_channels' value, 160 instead of 80, because my autoencoder's output dimension is 160. All other settings are the same as in the original config. However, the generated voice contains some noise. My config.json:

{
    "train_config": {
        "fp16_run": true,
        "output_directory": "checkpoints_160",
        "epochs": 100000,
        "learning_rate": 1e-4,
        "sigma": 1.0,
        "iters_per_checkpoint": 10000,
        "batch_size": 12,
        "seed": 1234,
        "checkpoint_path": "",
        "with_tensorboard": true
    },
    "data_config": {
        "training_files": "train_files.txt",
        "segment_length": 16000,
        "sampling_rate": 22050,
        "filter_length": 1024,
        "hop_length": 256,
        "win_length": 1024,
        "mel_fmin": 0.0,
        "mel_fmax": 8000.0, 
        "n_mel_channels": 160
    },
    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321"
    },

    "waveglow_config": {
        "n_mel_channels": 160,
        "n_flows": 12,
        "n_group": 8,
        "n_early_every": 4,
        "n_early_size": 2,
        "WN_config": {
            "n_layers": 8,
            "n_channels": 256,
            "kernel_size": 3
        }
    }
}

(Attached: the outputs with n_mel_channels=80 and the outputs with n_mel_channels=160.)

Does anyone have an idea how to improve the quality?
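Both `data_config` and `waveglow_config` carry an `n_mel_channels` entry, and the two must agree: the data loader produces the mel features and the model's conditioning layers are sized for them. The Python sketch below is a hypothetical sanity check (not part of the repository) run against a trimmed copy of the config posted above; it also checks a few other constraints that are easy to break when changing sizes. Assumptions to verify against your copy of the code: the loader falls back to 80 channels when `data_config` has no `n_mel_channels` key, and, as I read the WaveGlow model code, each WN block is conditioned on `n_mel_channels * n_group` channels, so going from 80 to 160 mel channels doubles that input width (640 to 1280 with `n_group` 8).

```python
import json

# Trimmed copy of the config.json from the issue: only the keys checked below.
CONFIG_JSON = """
{
  "data_config": {
    "sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
    "win_length": 1024, "mel_fmin": 0.0, "mel_fmax": 8000.0,
    "n_mel_channels": 160
  },
  "waveglow_config": {
    "n_mel_channels": 160, "n_flows": 12, "n_group": 8,
    "n_early_every": 4, "n_early_size": 2
  }
}
"""


def check_config(cfg):
    """Hypothetical sanity check: return a list of problems (empty if none found)."""
    data, wg = cfg["data_config"], cfg["waveglow_config"]
    problems = []

    # Mel features from the data loader must have the width the model was built for.
    # Assumption: the loader falls back to 80 channels when the key is missing.
    if data.get("n_mel_channels", 80) != wg["n_mel_channels"]:
        problems.append("n_mel_channels differs between data_config and waveglow_config")

    # Mel filters cannot extend past the Nyquist frequency.
    if data["mel_fmax"] > data["sampling_rate"] / 2:
        problems.append("mel_fmax exceeds sampling_rate / 2")

    # The analysis window cannot be longer than the FFT size.
    if data["win_length"] > data["filter_length"]:
        problems.append("win_length is larger than filter_length")

    # Channels seen by each flow (after early outputs) must stay positive and even,
    # because every affine coupling layer splits them in half.
    remaining = wg["n_group"]
    for k in range(wg["n_flows"]):
        if k > 0 and k % wg["n_early_every"] == 0:
            remaining -= wg["n_early_size"]
        if remaining <= 0 or remaining % 2 != 0:
            problems.append(f"flow {k} would see {remaining} channels; need a positive even number")
            break

    return problems


def conditioning_width(cfg):
    """Assumed conditioning input width of each WN block: n_mel_channels * n_group."""
    wg = cfg["waveglow_config"]
    return wg["n_mel_channels"] * wg["n_group"]


cfg = json.loads(CONFIG_JSON)
print(check_config(cfg))        # []
print(conditioning_width(cfg))  # 1280
```

The posted config passes these checks, which suggests the noise is not caused by a plain size mismatch; the doubled conditioning width is simply a larger input the model has to learn from.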

msalhab96 commented 2 years ago

Do you use the denoiser?

If not, try using the denoiser during inference.
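For context, as far as I can tell from the repository's denoiser, it removes a constant "model bias" from the output: it runs the network on an all-zero mel input with sigma=0, takes the STFT magnitude of that audio as a bias spectrum, subtracts a scaled copy from the magnitude of the real output, clamps at zero, and resynthesizes with the original phase. The NumPy sketch below illustrates that spectral-subtraction idea under stated assumptions; it is not the repository's `denoiser.py`. `vocoder` is a hypothetical callable standing in for WaveGlow inference, the 1024/256 STFT sizes and `bias_strength=0.1` are illustrative, and averaging the bias spectrum over frames is my own simplification. The detail that matters for this issue: the all-zero input must have the channel count the model was trained with (160 here), so a denoiser set up for 80-channel features would need that shape changed.

```python
import numpy as np


def stft_mag_phase(audio, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns magnitude and phase, each (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    return np.abs(spec), np.angle(spec)


def istft(mag, phase, n_fft=1024, hop=256):
    """Inverse of stft_mag_phase via weighted overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(mag * np.exp(1j * phase), n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Guard against division by zero where the window sum vanishes (the very ends).
    return out / np.maximum(norm, 1e-8)


def denoise(audio, vocoder, n_mel_channels=160, bias_strength=0.1, n_fft=1024, hop=256):
    """Spectral bias subtraction in the spirit of WaveGlow's denoiser (simplified sketch).

    `vocoder` is a hypothetical callable mapping a (n_mel_channels, frames) mel array
    to a 1-D audio array, e.g. a wrapper around WaveGlow inference with sigma=0.
    """
    # All-zero "silent" mel input; its channel count must match the trained model.
    # 88 frames is an arbitrary short length.
    silent_mel = np.zeros((n_mel_channels, 88))
    bias_mag, _ = stft_mag_phase(vocoder(silent_mel), n_fft, hop)
    bias_profile = bias_mag.mean(axis=0)  # simplification: average over frames

    # Subtract the scaled bias from every frame, clamp at zero, keep the original phase.
    mag, phase = stft_mag_phase(audio, n_fft, hop)
    mag = np.clip(mag - bias_strength * bias_profile, 0.0, None)
    return istft(mag, phase, n_fft, hop)
```

With `bias_strength=0` the function simply reconstructs its input away from the edges, which is a quick way to check the STFT round trip before tuning the strength; larger values remove more of the constant hiss but can also eat into quiet speech.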

anitaweng commented 2 years ago

Thanks, I will try it.