NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

[Help Requested] Getting static with a loss of 3 #1046

Open brentcty-2020 opened 2 years ago

brentcty-2020 commented 2 years ago

Describe the bug: I trained for 100 epochs, took a snapshot, and ran inference with the pretrained WaveGlow. The validation loss was 3.09584379196167 at the end, but all I get is noise with nothing recognizable.

To Reproduce: I just followed the default setup instructions with the LJSpeech-1.1 voice set.

Expected behavior: I was hoping to get something at least intelligible with a loss of 3. I must be doing something fundamentally wrong. I have a much smaller GPU, so I adjusted the settings to keep the training batch size at 256. Does anyone have any ideas? Thanks for any help.

Environment:
Ubuntu 20.04
Container version: python3, pytorch 1.9.0a0+2ecb2c7
GPU: NVIDIA GeForce RTX 2070 Super
NVIDIA driver: 470.86
CUDA: 11.4

First section of my train.sh:

export OMP_NUM_THREADS=1

: ${NUM_GPUS:=1}
: ${BATCH_SIZE:=8}
: ${GRAD_ACCUMULATION:=32}
: ${OUTPUT_DIR:="./output"}
: ${LOG_FILE:=$OUTPUT_DIR/nvlog.json}
: ${DATASET_PATH:=LJSpeech-1.1}
: ${TRAIN_FILELIST:=filelists/ljs_audio_pitch_text_train_v3.txt}
: ${VAL_FILELIST:=filelists/ljs_audio_pitch_text_val.txt}
: ${AMP:=true}
: ${SEED:=""}

: ${LEARNING_RATE:=0.1}

# Adjust these when the amount of data changes
: ${EPOCHS:=1000}
: ${EPOCHS_PER_CHECKPOINT:=50}
: ${WARMUP_STEPS:=1000}
: ${KL_LOSS_WARMUP:=100}
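For reference, the batch size of 256 mentioned above can be checked against these settings. The following is a minimal sketch, assuming that BATCH_SIZE is a per-GPU value and that the effective batch size is the product of the number of GPUs, the per-GPU batch size, and the gradient accumulation steps. EFFECTIVE_BATCH is a name introduced here for illustration only; it is not a variable in train.sh.

```shell
#!/usr/bin/env bash
# Values copied from the train.sh excerpt quoted above.
: ${NUM_GPUS:=1}
: ${BATCH_SIZE:=8}
: ${GRAD_ACCUMULATION:=32}

# Assumption: BATCH_SIZE is per GPU, so the number of samples contributing
# to one optimizer step is the product of the three settings.
# EFFECTIVE_BATCH is a hypothetical helper variable, not part of train.sh.
EFFECTIVE_BATCH=$(( NUM_GPUS * BATCH_SIZE * GRAD_ACCUMULATION ))

echo "Effective batch size: ${EFFECTIVE_BATCH}"
```

With the values above (1 GPU, batch size 8, 32 accumulation steps) this works out to 1 * 8 * 32 = 256, so the configuration is consistent with the stated target under that assumption.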

Outputs: outputs.zip

alancucki commented 2 years ago

Hi @brentcty-2020 ,

Sorry for the late reply. Have you managed to resolve your issue?

After 100 epochs you should get something very intelligible, just a little bit noisy (sample). I'd look for the problem elsewhere, or try to replicate the quick start guide step by step once again.

brentcty-2020 commented 2 years ago

Thanks, @alancucki, for your response. Unfortunately, I have not resolved my issue. Knowing that it should sound intelligible after 100 epochs is helpful, though.

I've tried the step-by-step instructions twice with the same result. Where would you suggest I look? Do the WaveGlow and FastPitch models need to match somehow (hyperparameters, etc.)? I am using the previously built WaveGlow model suggested in your instructions. Thanks again.