coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] VITS gpu utilization #3710

Closed · maryawwm closed this issue 2 days ago

maryawwm commented 2 months ago

Describe the bug

I'm training a VITS model (Persian and English). My dataset consists of audio clips ranging from 1 to 25 seconds. I'm training on an A100 GPU, but most of the time GPU memory usage is below half and utilization is lower than I expect.

(Screenshot: 2024-04-28 091235)
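
For reference, a minimal way to quantify the "memory is not even half" observation from inside the training process (a generic PyTorch sketch, not part of the Coqui trainer; the function name and where you call it are up to you):

    import torch

    def log_gpu_memory(tag: str = "", device: int = 0) -> None:
        """Print allocated/reserved GPU memory versus total capacity."""
        total = torch.cuda.get_device_properties(device).total_memory
        allocated = torch.cuda.memory_allocated(device)  # memory held by live tensors
        reserved = torch.cuda.memory_reserved(device)    # memory cached by the allocator
        print(
            f"[{tag}] allocated {allocated / 1e9:.1f} GB / "
            f"reserved {reserved / 1e9:.1f} GB / total {total / 1e9:.1f} GB"
        )

    # e.g. call log_gpu_memory("after step") periodically inside the training loop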

To Reproduce

I modified my code based on this recipe script from the Coqui TTS library:

https://github.com/coqui-ai/TTS/blob/dev/recipes/multilingual/vits_tts/train_vits_tts_phonemes.py

and these are the parameters that I set:

    audio_config = VitsAudioConfig(
        sample_rate=16000, win_length=1024, hop_length=256,
        num_mels=80, mel_fmin=0, mel_fmax=None,
    )

    vitsArgs = VitsArgs(
        use_language_embedding=True, embedded_language_dim=2,
        use_speaker_embedding=True, use_sdp=False,
    )

    config = VitsConfig(
        model_args=vitsArgs,
        audio=audio_config,
        run_name="A6_vits_multi_language_10_spk_5_ordibehesht",
        use_speaker_embedding=True,
        batch_size=48, eval_batch_size=32, batch_group_size=128,
        num_loader_workers=12, num_eval_loader_workers=8, precompute_num_workers=12,
        run_eval=True, test_delay_epochs=-1, epochs=1000,
        text_cleaner="multilingual_cleaners",
        use_phonemes=True, phoneme_language=None, phonemizer="multi_phonemizer",
        phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
        compute_input_seq_cache=True,
        print_step=25, use_language_weighted_sampler=True, print_eval=False,
        mixed_precision=True,
        output_path=output_path,
        datasets=dataset_config,
        cudnn_enable=True, cudnn_benchmark=True, cudnn_deterministic=True,
    )
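
For context, the linked recipe wires a config like this into the model and the Trainer roughly as follows (a condensed sketch based on the recipe's structure in TTS 0.17.x; `dataset_config` and `output_path` are assumed to be defined as in the recipe, and exact signatures may vary between versions):

    from trainer import Trainer, TrainerArgs

    from TTS.tts.datasets import load_tts_samples
    from TTS.tts.models.vits import Vits
    from TTS.tts.utils.languages import LanguageManager
    from TTS.tts.utils.speakers import SpeakerManager
    from TTS.tts.utils.text.tokenizer import TTSTokenizer
    from TTS.utils.audio import AudioProcessor

    # load the train/eval splits and initialize audio processing + tokenization from the config
    train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
    ap = AudioProcessor.init_from_config(config)
    tokenizer, config = TTSTokenizer.init_from_config(config)

    # speaker and language managers for the multi-speaker / multilingual setup
    speaker_manager = SpeakerManager()
    speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
    config.model_args.num_speakers = speaker_manager.num_speakers

    language_manager = LanguageManager(config=config)
    config.model_args.num_languages = language_manager.num_languages

    model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager, language_manager=language_manager)

    # the Trainer owns the DataLoaders, the AMP scaler, and the train/eval loop
    trainer = Trainer(
        TrainerArgs(), config, output_path,
        model=model, train_samples=train_samples, eval_samples=eval_samples,
    )
    trainer.fit()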

Expected behavior

Higher GPU utilization and faster training time.

Logs

Here is the log from one of my steps:

   --> TIME: 2024-04-27 09:15:52 -- STEP: 124/3006 -- GLOBAL_STEP: 1750125
     | > loss_disc: 2.7141058444976807  (2.7415779617524914)
     | > loss_disc_real_0: 0.2915174067020416  (0.22191733380238854)
     | > loss_disc_real_1: 0.2596714198589325  (0.2545961029827594)
     | > loss_disc_real_2: 0.25090914964675903  (0.2519173812601836)
     | > loss_disc_real_3: 0.2509034276008606  (0.2488831561659612)
     | > loss_disc_real_4: 0.2618330121040344  (0.24871416005396074)
     | > loss_disc_real_5: 0.23049794137477875  (0.2413994044726414)
     | > loss_0: 2.7141058444976807  (2.7415779617524914)
     | > grad_norm_0: tensor(2.3359, device='cuda:0')  (tensor(4.0910, device='cuda:0'))
     | > loss_gen: 1.8159717321395874  (1.9762149626208896)
     | > loss_kl: 5.008370399475098  (42.11719334894611)
     | > loss_feat: 1.7703579664230347  (2.0269679972721693)
     | > loss_mel: 30.50223731994629  (41.7430907526324)
     | > loss_duration: 9.647953033447266  (2.5745641668477357)
     | > amp_scaler: 256.0  (509.9354838709682)
     | > loss_1: 48.74489212036133  (90.438032304087)
     | > grad_norm_1: tensor(73.7072, device='cuda:0')  (tensor(215.5241, device='cuda:0'))
     | > current_lr_0: 0.0002 
     | > current_lr_1: 0.0002 
     | > step_time: 5.8922  (3.467874986510123)
     | > loader_time: 0.006  (0.005929248948251048)
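
As a rough read of the timings above (a back-of-the-envelope check, not a profiler trace), the averaged `loader_time` is a tiny fraction of the averaged `step_time`, which suggests the DataLoader is not starving the GPU here and most of the wall time is spent in the model step itself:

    # plug in the running averages printed in parentheses above
    step_time = 3.4679      # seconds per optimization step (average)
    loader_time = 0.00593   # seconds spent waiting on the DataLoader (average)

    loader_share = loader_time / (step_time + loader_time)
    print(f"data loading accounts for ~{loader_share:.2%} of each step")  # ~0.17%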

Environment

- TTS version : 0.17.8
- python : 3.9.18
- pytorch : 2.1.1
- os : Linux
- gpu : A100

Additional context

No response

stale[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.