152334H / DL-Art-School

TorToiSe fine-tuning with DLAS
GNU Affero General Public License v3.0

Divide by Zero error #59

Closed · demonauthor closed this 1 year ago

demonauthor commented 1 year ago

Any idea why this is happening?

```
===============================================
CUDA SETUP: Loading binary C:\Users\oldgu\miniconda3\envs\DLAS\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
23-03-22 20:25:03.393 - INFO:   name: Test
  model: extensibletrainer
  scale: 1
  gpu_ids: [0]
  start_step: -1
  checkpointing_enabled: True
  fp16: False
  use_8bit: True
  wandb: False
  use_tb_logger: True
  datasets:[
    train:[
      name: Test
      n_workers: 8
      batch_size: 1
      mode: paired_voice_audio
      path: C:/Users/oldgu/ozen-toolkit/output/Vincent_AGraveyardofGhostTales.wav_2023_03_22-20_05\train.txt
      fetcher_mode: ['lj']
      phase: train
      max_wav_length: 255995
      max_text_length: 200
      sample_rate: 22050
      load_conditioning: True
      num_conditioning_candidates: 2
      conditioning_length: 44000
      use_bpe_tokenizer: True
      load_aligned_codes: False
      data_type: img
    ]
    val:[
      name: Test
      n_workers: 1
      batch_size: 1
      mode: paired_voice_audio
      path: C:/Users/oldgu/ozen-toolkit/output/Vincent_AGraveyardofGhostTales.wav_2023_03_22-20_05\valid.txt
      fetcher_mode: ['lj']
      phase: val
      max_wav_length: 255995
      max_text_length: 200
      sample_rate: 22050
      load_conditioning: True
      num_conditioning_candidates: 2
      conditioning_length: 44000
      use_bpe_tokenizer: True
      load_aligned_codes: False
      data_type: img
    ]
  ]
  steps:[
    gpt_train:[
      training: gpt
      loss_log_buffer: 500
      optimizer: adamw
      optimizer_params:[
        lr: 1e-05
        triton: False
        weight_decay: 0.01
        beta1: 0.9
        beta2: 0.96
      ]
      clip_grad_eps: 4
      injectors:[
        paired_to_mel:[
          type: torch_mel_spectrogram
          mel_norm_file: ../experiments/clips_mel_norms.pth
          in: wav
          out: paired_mel
        ]
        paired_cond_to_mel:[
          type: for_each
          subtype: torch_mel_spectrogram
          mel_norm_file: ../experiments/clips_mel_norms.pth
          in: conditioning
          out: paired_conditioning_mel
        ]
        to_codes:[
          type: discrete_token
          in: paired_mel
          out: paired_mel_codes
          dvae_config: ../experiments/train_diffusion_vocoder_22k_level.yml
        ]
        paired_fwd_text:[
          type: generator
          generator: gpt
          in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
          out: ['loss_text_ce', 'loss_mel_ce', 'logits']
        ]
      ]
      losses:[
        text_ce:[
          type: direct
          weight: 0.01
          key: loss_text_ce
        ]
        mel_ce:[
          type: direct
          weight: 1
          key: loss_mel_ce
        ]
      ]
    ]
  ]
  networks:[
    gpt:[
      type: generator
      which_model_G: unified_voice2
      kwargs:[
        layers: 30
        model_dim: 1024
        heads: 16
        max_text_tokens: 402
        max_mel_tokens: 604
        max_conditioning_inputs: 2
        mel_length_compression: 1024
        number_text_tokens: 256
        number_mel_codes: 8194
        start_mel_token: 8192
        stop_mel_token: 8193
        start_text_token: 255
        train_solo_embeddings: False
        use_mel_codes_as_input: True
        checkpointing: True
      ]
    ]
  ]
  path:[
    pretrain_model_gpt: ../experiments/autoregressive.pth
    strict_load: True
    root: C:\Users\oldgu\DL-Art-School
    experiments_root: C:\Users\oldgu\DL-Art-School\experiments\Test
    models: C:\Users\oldgu\DL-Art-School\experiments\Test\models
    training_state: C:\Users\oldgu\DL-Art-School\experiments\Test\training_state
    log: C:\Users\oldgu\DL-Art-School\experiments\Test
    val_images: C:\Users\oldgu\DL-Art-School\experiments\Test\val_images
  ]
  train:[
    niter: 200
    warmup_iter: -1
    mega_batch_factor: 1
    val_freq: 500
    default_lr_scheme: MultiStepLR
    gen_lr_steps: [100, 200, 280, 360]
    lr_gamma: 0.5
    ema_enabled: False
    manual_seed: 1337
  ]
  eval:[
    output_state: gen
    injectors:[
      gen_inj_eval:[
        type: generator
        generator: generator
        in: hq
        out: ['gen', 'codebook_commitment_loss']
      ]
    ]
  ]
  logger:[
    print_freq: 10
    save_checkpoint_freq: 10
    visuals: ['gen', 'mel']
    visual_debug_rate: 500
    is_mel_spectrogram: True
    disable_state_saving: False
  ]
  upgrades:[
    number_of_checkpoints_to_save: 0
    number_of_states_to_save: 0
  ]
  is_train: True
  dist: False

23-03-22 20:25:03.538 - INFO: Random seed: 1337
Traceback (most recent call last):
  File "C:\Users\oldgu\DL-Art-School\codes\train.py", line 398, in <module>
    trainer.init(args.opt, opt, args.launcher)
  File "C:\Users\oldgu\DL-Art-School\codes\train.py", line 121, in init
    self.total_epochs = int(math.ceil(total_iters / train_size))
ZeroDivisionError: division by zero
Press any key to continue . . .
```

152334H commented 1 year ago

That means your dataset has 0 entries for some reason. Check the train.txt/valid.txt files.
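
For anyone else hitting this: the `train_size` in the traceback is the number of training batches per epoch, derived from the dataset length and batch size, so an empty train.txt makes `total_iters / train_size` divide by zero. Here's a minimal sketch for sanity-checking the dataset files before launching training, assuming the LJSpeech-style `path|transcript` line format that `fetcher_mode: ['lj']` expects (the directory path is just this issue's example; substitute your own):

```python
# Sanity-check train.txt/valid.txt before launching DLAS training.
# Assumes the LJSpeech-style "path|transcript" line format ('lj' fetcher).
from pathlib import Path

DATASET_DIR = Path(r"C:/Users/oldgu/ozen-toolkit/output/Vincent_AGraveyardofGhostTales.wav_2023_03_22-20_05")

for name in ("train.txt", "valid.txt"):
    # Count non-blank lines; each one should be a dataset entry.
    lines = [l for l in (DATASET_DIR / name).read_text(encoding="utf-8").splitlines() if l.strip()]
    print(f"{name}: {len(lines)} entries")
    if not lines:
        print(f"  -> {name} is empty; train.py will divide by zero computing total_epochs")
    else:
        # Flag lines missing the 'path|transcript' separator.
        bad = [l for l in lines if "|" not in l]
        if bad:
            print(f"  -> {len(bad)} malformed lines (no '|' separator)")
```

If either file reports 0 entries, regenerate the dataset (here, re-run the ozen-toolkit step) before starting training.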

demonauthor commented 1 year ago

They are blank. Something is up with my Ozen build. Thanks!