ahmedHanzala / urdu-tts

An open source urdu/hindi text-to-speech system with voice cloning capabilities
MIT License

getting this error: self.tokenizer = Tokenizer.from_file(vocab_file) Exception: No such file or directory (os error 2) #1

Open mlkasim791 opened 1 year ago

mlkasim791 commented 1 year ago

I think vocab_file is missing and cannot be found. Here is the complete output of the training cell of the training notebook:

[Errno 2] No such file or directory: 'codes'
/content/gdrive/MyDrive/trainer/codes
Disabled distributed training.
Path already exists. Rename it to [/content/gdrive/MyDrive/trainer/experiments/gyapan_archived_230806-053843]
23-08-06 05:38:43.913 - INFO: name: gyapan
  model: extensibletrainer
  scale: 1
  gpu_ids: [0]
  start_step: 0
  checkpointing_enabled: True
  fp16: False
  use_8bit: True
  wandb: False
  use_tb_logger: True
  datasets:[
    train:[
      name: gyapan-clone
      n_workers: 8
      batch_size: 66
      mode: paired_voice_audio
      path: /content/gdrive/MyDrive/gyapan/train.txt
      fetcher_mode: ['lj']
      phase: train
      max_wav_length: 255995
      max_text_length: 200
      sample_rate: 22050
      load_conditioning: True
      num_conditioning_candidates: 2
      conditioning_length: 44000
      use_bpe_tokenizer: True
      load_aligned_codes: False
      data_type: img
    ]
    val:[
      name: TestValidation
      n_workers: 1
      batch_size: 33
      mode: paired_voice_audio
      path: /content/gdrive/MyDrive/gyapan/val.txt
      fetcher_mode: ['lj']
      phase: val
      max_wav_length: 255995
      max_text_length: 200
      sample_rate: 22050
      load_conditioning: True
      num_conditioning_candidates: 2
      conditioning_length: 44000
      use_bpe_tokenizer: True
      load_aligned_codes: False
      data_type: img
    ]
  ]
  steps:[
    gpt_train:[
      training: gpt
      loss_log_buffer: 500
      optimizer: adamw
      optimizer_params:[
        lr: 1e-05
        triton: False
        weight_decay: 0.01
        beta1: 0.9
        beta2: 0.96
      ]
      clip_grad_eps: 4
      injectors:[
        paired_to_mel:[
          type: torch_mel_spectrogram
          mel_norm_file: ../experiments/clips_mel_norms.pth
          in: wav
          out: paired_mel
        ]
        paired_cond_to_mel:[
          type: for_each
          subtype: torch_mel_spectrogram
          mel_norm_file: ../experiments/clips_mel_norms.pth
          in: conditioning
          out: paired_conditioning_mel
        ]
        to_codes:[
          type: discrete_token
          in: paired_mel
          out: paired_mel_codes
          dvae_config: ../experiments/train_diffusion_vocoder_22k_level.yml
        ]
        paired_fwd_text:[
          type: generator
          generator: gpt
          in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
          out: ['loss_text_ce', 'loss_mel_ce', 'logits']
        ]
      ]
      losses:[
        text_ce:[
          type: direct
          weight: 0.01
          key: loss_text_ce
        ]
        mel_ce:[
          type: direct
          weight: 1
          key: loss_mel_ce
        ]
      ]
    ]
  ]
  networks:[
    gpt:[
      type: generator
      which_model_G: unified_voice2
      kwargs:[
        layers: 30
        model_dim: 1024
        heads: 16
        max_text_tokens: 402
        max_mel_tokens: 604
        max_conditioning_inputs: 2
        mel_length_compression: 1024
        number_text_tokens: 256
        number_mel_codes: 8194
        start_mel_token: 8192
        stop_mel_token: 8193
        start_text_token: 255
        train_solo_embeddings: False
        use_mel_codes_as_input: True
        checkpointing: True
        tortoise_compat: True
      ]
    ]
  ]
  path:[
    pretrain_model_gpt: ../experiments/autoregressive.pth
    strict_load: True
    root: /content/gdrive/MyDrive/trainer
    experiments_root: /content/gdrive/MyDrive/trainer/experiments/gyapan
    models: /content/gdrive/MyDrive/trainer/experiments/gyapan/models
    training_state: /content/gdrive/MyDrive/trainer/experiments/gyapan/training_state
    log: /content/gdrive/MyDrive/trainer/experiments/gyapan
    val_images: /content/gdrive/MyDrive/trainer/experiments/gyapan/val_images
  ]
  train:[
    niter: 50000
    warmup_iter: -1
    mega_batch_factor: 4
    val_freq: 60
    default_lr_scheme: MultiStepLR
    gen_lr_steps: [200, 400, 560, 720]
    lr_gamma: 0.5
    ema_enabled: False
  ]
  eval:[
    pure: True
  ]
  logger:[
    print_freq: 20
    save_checkpoint_freq: 60
    visuals: ['gen', 'mel']
    visual_debug_rate: 500
    is_mel_spectrogram: True
    disable_state_saving: False
  ]
  upgrades:[
    number_of_checkpoints_to_save: 1
    number_of_states_to_save: 0
  ]
  is_train: True
  dist: False

2023-08-06 05:38:45.229937: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
23-08-06 05:38:46.228 - INFO: Random seed: 9361
Traceback (most recent call last):
  File "/content/gdrive/MyDrive/trainer/codes/train.py", line 398, in <module>
    trainer.init(args.opt, opt, args.launcher)
  File "/content/gdrive/MyDrive/trainer/codes/train.py", line 115, in init
    self.train_set, collate_fn = create_dataset(dataset_opt, return_collate=True)
  File "/content/gdrive/MyDrive/trainer/codes/data/__init__.py", line 107, in create_dataset
    dataset = D(dataset_opt)
  File "/content/gdrive/MyDrive/trainer/codes/data/audio/paired_voice_audio_dataset.py", line 169, in __init__
    self.tokenizer = VoiceBpeTokenizer(opt_get(hparams, ['tokenizer_vocab'], '../experiments/bpe_lowercase_asr_256.json'))
  File "/content/gdrive/MyDrive/trainer/codes/data/audio/voice_tokenizer.py", line 34, in __init__
    self.tokenizer = Tokenizer.from_file(vocab_file)
Exception: No such file or directory (os error 2)

The question is: how can I get this file?
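
To narrow this down, here is a minimal diagnostic sketch (not part of the repository) that resolves the relative paths quoted in the config dump and traceback against a working directory and lists the ones that do not exist. The helper name `find_missing` is hypothetical, and the working directory is an assumption based on the `/content/gdrive/MyDrive/trainer/codes` line printed at the top of the log.

```python
from pathlib import Path

# Relative paths copied from the config dump and the traceback in this issue.
RELATIVE_PATHS = [
    "../experiments/bpe_lowercase_asr_256.json",             # tokenizer vocab (default in the traceback)
    "../experiments/clips_mel_norms.pth",                    # mel_norm_file
    "../experiments/train_diffusion_vocoder_22k_level.yml",  # dvae_config
    "../experiments/autoregressive.pth",                     # pretrain_model_gpt
]


def find_missing(base_dir, relative_paths=RELATIVE_PATHS):
    """Return the resolved paths that are not existing files under base_dir.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir)
    missing = []
    for rel in relative_paths:
        resolved = (base / rel).resolve()
        if not resolved.is_file():
            missing.append(resolved)
    return missing


if __name__ == "__main__":
    # Assumption: train.py runs with this directory as the working directory.
    for path in find_missing("/content/gdrive/MyDrive/trainer/codes"):
        print("missing:", path)
```

If only `bpe_lowercase_asr_256.json` is reported, the other assets are in place and just the tokenizer vocabulary needs to be put in the `experiments` folder. The traceback also shows the path is read from a `tokenizer_vocab` option, so it may be possible to point that option at a different location instead.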