enhuiz / vall-e

An unofficial PyTorch implementation of the audio LM VALL-E
MIT License
2.93k stars 417 forks

ValueError: Failed to find another different utterance for test. #69

Open NaeemKhanNiazi opened 1 year ago

NaeemKhanNiazi commented 1 year ago

Hi guys, I am facing the following error. I am training the model on custom-prepared data.

  "git_commit": "3476d393d2133fa9b50d5ad999ca13b95fc22060",
  "git_status": "On branch main\nYour branch is up to date with 'origin/main'.\n\nChanges not staged for commit:\n  (use \"git add/rm <file>...\" to update what will be committed)\n  (use \"git restore <file>...\" to discard changes in working directory)\n\tmodified:   config/test/ar.yml\n\tmodified:   config/test/nar.yml\n\tmodified:   data/test/test.normalized.txt\n\tmodified:   data/test/test.phn.txt\n\tmodified:   data/test/test.qnt.pt\n\tmodified:   data/test/test.wav\n\tdeleted:    data/test/test2.phn.txt\n\tdeleted:    data/test/test2.qnt.pt\n\tmodified:   scripts/plot.py\n\tmodified:   scripts/run.sh\n\tmodified:   vall_e/config.py\n\tmodified:   vall_e/data.py\n\tmodified:   vall_e/train.py\n\nUntracked files:\n  (use \"git add <file>...\" to include in what will be committed)\n\ttoy.wav\n\tzoo/\n\nno changes added to commit (use \"git add\" and/or \"git commit -a\")",
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 100.0,
  "log_dir": "logs/your_data/ar/1678880924",
  "log_root": "logs",
  "max_grad_norm": null,
  "max_iter": 1000,
  "max_num_val": 20,
  "max_phones": 50000,
  "max_prompts": 3,
  "max_val_ar_steps": 300,
  "min_phones": 10,
  "model": "ar-quarter",
  "nj": 8,
  "num_tokens": 1024,
  "p_additional_prompt": 0.8,
  "relpath": "your_data/ar",
  "sample_rate": 24000,
  "sampling_temperature": 1.0,
  "save_artifacts_every": 100,
  "save_ckpt_every": 500,
  "save_on_oom": true,
  "save_on_quit": true,
  "spkr_name_getter": "lambda p: p.parts[-2]",
  "start_time": 1678880924,
  "token_dim": 256,
  "use_fp16": true,
  "warmup_max_lr": 0.0002,
  "warmup_min_lr": 1e-06,
  "warmup_num_steps": 1000
}
2023-03-15 11:48:46 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/train.py", line 129, in <module>
    main()
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/train.py", line 120, in main
    trainer.train(
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/utils/trainer.py", line 150, in train
    for batch in _make_infinite_epochs(train_dl):
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/utils/trainer.py", line 103, in _make_infinite_epochs
    yield from dl
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/data.py", line 179, in __getitem__
    proms = self.sample_prompts(spkr_name, ignore=path)
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/data.py", line 156, in sample_prompts
    raise ValueError(
ValueError: Failed to find another different utterance for test.
cantabile-kwok commented 1 year ago

It seems that you have a speaker with only one utterance. This is not allowed in this code.
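
For context, the failing check lives in vall_e/data.py's sample_prompts (see the traceback above). A simplified Python sketch of the idea, not the literal source: every training item needs at least one other utterance from the same speaker to serve as the acoustic prompt.

import random
from pathlib import Path

def sample_prompt_path(paths_by_spkr: dict, spkr_name: str, ignore: Path) -> Path:
    # Pick a *different* utterance of the same speaker to use as the prompt.
    candidates = [p for p in paths_by_spkr[spkr_name] if p != ignore]
    if not candidates:
        # Speaker has only the current utterance -> the error you are seeing.
        raise ValueError(f"Failed to find another different utterance for {spkr_name}.")
    return random.choice(candidates)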

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok Thanks for the reply. Can you explain what is meant by an utterance? As per my understanding:

"One utterance means one sentence, doesn't it?"

But the given Colab notebook Colab-NoteBook only has a hello-world example. How should I prepare a custom dataset? Can you give me some help? Thanks

cantabile-kwok commented 1 year ago

You should make a directory to store the wavs and specify it as data_dirs in the config yaml. You should also put the corresponding transcription file (.normalized.txt) for each wav in it, with the same basename as the .wav file. Then call python -m vall_e.emb.qnt and python -m vall_e.emb.g2p to preprocess the data, as described in that Colab notebook.

You can try putting another wav and txt into data/test and running the notebook to see the difference.
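
For illustration, the layout and preprocessing steps look roughly like this (file names below are placeholders; the two commands are the preprocessing steps from the README/notebook). Note that spkr_name_getter is lambda p: p.parts[-2], so the speaker name is taken from the parent directory: everything directly under data/test counts as one speaker named "test".

data/test/
  utt1.wav              # audio clip
  utt1.normalized.txt   # its transcription, same basename
  utt2.wav
  utt2.normalized.txt

python -m vall_e.emb.qnt data/test   # quantize audio -> utt1.qnt.pt, utt2.qnt.pt
python -m vall_e.emb.g2p data/test   # transcriptions -> utt1.phn.txt, utt2.phn.txt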

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok

I added two .wav files and their corresponding .normalized.txt files. The error "ValueError: Failed to find another different utterance for test." went away. But when I run the command

!python -m vall_e.train yaml=config/test/ar.yml
/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e
logs/test/nar/1679003729
CFG [PosixPath('data/test')]
DRRRRRRRRRRRRR data/test
/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e
2it [00:00, 1069.29it/s]
Pathssssssssss [PosixPath('data/test/test.qnt.pt'), PosixPath('data/test/imran.qnt.pt')]
Pairssssssssssssss [('test', PosixPath('data/test/imran.qnt.pt')), ('test', PosixPath('data/test/test.qnt.pt'))]
train_paths [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
val_paths [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
5
50000
****************************
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
##############################
True
##############################
True
ddddddddddddddddddddddddddddd
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
True
PATTTTTTTTTTTTTTTTTTTTt
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
{'test': [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]}
data/test/imran.qnt.pt
test
2
data/test/test.qnt.pt
test
2
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
5
50000
****************************
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
##############################
True
##############################
True
ddddddddddddddddddddddddddddd
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
False
PATTTTTTTTTTTTTTTTTTTTt
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
{'test': [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt'), PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]}
data/test/imran.qnt.pt
test
4
data/test/test.qnt.pt
test
4
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 - 
{'</s>': 1, '<s>': 2, 'AA1': 3, 'AE1': 4, 'AE2': 5, 'AH0': 6, 'AY1': 7, 'EY1': 8, 'IH0': 9, 'IH1': 10, 'IY1': 11, 'K': 12, 'L': 13, 'M': 14, 'N': 15, 'P': 16, 'R': 17, 'S': 18, 'T': 19, 'V': 20, 'Z': 21, '_': 22}
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 - 
{'test': 0}
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 - 
#samples (train): 2.
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 - 
#samples (val): 2.
[2023-03-16 21:55:30,061] [INFO] [comm.py:661:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-03-16 21:55:30 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Added key: store_based_barrier_key:1 to store for rank: 0
2023-03-16 21:55:30 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
2023-03-16 21:55:32 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Added key: store_based_barrier_key:2 to store for rank: 0
2023-03-16 21:55:32 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
[2023-03-16 21:55:32,718] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.11565709114074707 seconds
[2023-03-16 21:55:33,713] [INFO] [logging.py:77:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adam as basic optimizer
[2023-03-16 21:55:33,720] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-03-16 21:55:33,721] [INFO] [logging.py:77:log_dist] [Rank 0] Creating fp16 optimizer with dynamic loss scale
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Final Optimizer = adam
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupDecayLR object at 0x7fe7d1d56620>
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] step=0, skipped=0, lr=[0.001], mom=[(0.9, 0.999)]
[2023-03-16 21:55:33,732] [INFO] [config.py:1010:print] DeepSpeedEngine configuration:
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print]   amp_enabled .................. False
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print]   amp_params ................... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   bfloat16_enabled ............. False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   checkpoint_parallel_write_pipeline  False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   checkpoint_tag_validation_enabled  True
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   checkpoint_tag_validation_fail  False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fe7d1d55fc0>
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   communication_data_type ...... None
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   curriculum_enabled_legacy .... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   curriculum_params_legacy ..... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   data_efficiency_enabled ...... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print]   dataloader_drop_last ......... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   disable_allgather ............ False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   dump_state ................... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   dynamic_loss_scale_args ...... None
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_enabled ........... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_gas_boundary_resolution  1
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_layer_num ......... 0
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_max_iter .......... 100
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_stability ......... 1e-06
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_tol ............... 0.01
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   eigenvalue_verbose ........... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   elasticity_enabled ........... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   flops_profiler_config ........ {
    "enabled": false, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   fp16_auto_cast ............... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   fp16_enabled ................. True
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   fp16_master_weights_and_gradients  False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   global_rank .................. 0
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   grad_accum_dtype ............. None
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   gradient_accumulation_steps .. 1
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print]   gradient_clipping ............ 100.0
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   gradient_predivide_factor .... 1.0
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   initial_dynamic_scale ........ 65536
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   load_universal_checkpoint .... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   loss_scale ................... 0
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   memory_breakdown ............. False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   optimizer_legacy_fusion ...... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   optimizer_name ............... adam
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   optimizer_params ............. None
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   pld_enabled .................. False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   pld_params ................... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   prescale_gradients ........... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   scheduler_name ............... WarmupDecayLR
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   scheduler_params ............. {'warmup_min_lr': 1e-06, 'warmup_max_lr': 0.0002, 'warmup_num_steps': 1000, 'total_num_steps': 1000, 'warmup_type': 'linear'}
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   sparse_attention ............. None
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   sparse_gradients_enabled ..... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print]   steps_per_print .............. 10
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   train_batch_size ............. 1
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   train_micro_batch_size_per_gpu  1
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   use_node_local_storage ....... False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   wall_clock_breakdown ......... False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   world_size ................... 1
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   zero_allow_untested_optimizer  False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   zero_enabled ................. False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print]   zero_optimization_stage ...... 0
[2023-03-16 21:55:33,736] [INFO] [config.py:999:print_user_config]   json = {
    "train_micro_batch_size_per_gpu": 1, 
    "gradient_accumulation_steps": 1, 
    "optimizer": {
        "type": "Adam", 
        "lr": 1e-06
    }, 
    "scheduler": {
        "type": "WarmupDecayLR", 
        "params": {
            "warmup_min_lr": 1e-06, 
            "warmup_max_lr": 0.0002, 
            "warmup_num_steps": 1000, 
            "total_num_steps": 1000, 
            "warmup_type": "linear"
        }
    }, 
    "gradient_clipping": 100.0, 
    "fp16": {
        "enabled": true
    }
}
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.11644482612609863 seconds
[2023-03-16 21:55:34,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loading checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt...
[2023-03-16 21:55:35,888] [INFO] [torch_checkpoint_engine.py:25:load] [Torch] Loaded checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt.
[2023-03-16 21:55:35,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loading checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt...
[2023-03-16 21:55:36,345] [INFO] [torch_checkpoint_engine.py:25:load] [Torch] Loaded checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt.
2023-03-16 21:55:36 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
{
  "batch_size": 1,
  "cache_dataloader": false,
  "cache_dir": ".cache/test/nar",
  "cfg_name": "test/nar",
  "cfg_relpath": null,
  "ckpt_dir": "ckpts/test/nar",
  "ckpt_root": "ckpts",
  "data_dirs": "[PosixPath('data/test')]",
  "data_root": "data",
  "device": "cuda",
  "dis_warmup_max_lr": 0.0004,
  "ds_cfg": {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "optimizer": {
      "type": "Adam",
      "lr": 1e-06
    },
    "scheduler": {
      "type": "WarmupDecayLR",
      "params": {
        "warmup_min_lr": 1e-06,
        "warmup_max_lr": 0.0002,
        "warmup_num_steps": 1000,
        "total_num_steps": 1000,
        "warmup_type": "linear"
      }
    },
    "gradient_clipping": 100.0,
    "fp16": {
      "enabled": true
    }
  },
  "eval_batch_size": 1,
  "eval_every": 500,
  "fp16_cfg": {
    "enabled": true
  },
  "git_commit": "3476d393d2133fa9b50d5ad999ca13b95fc22060",
  "git_status": "On branch main\nYour branch is up to date with 'origin/main'.\n\nChanges not staged for commit:\n  (use \"git add/rm <file>...\" to update what will be committed)\n  (use \"git restore <file>...\" to discard changes in working directory)\n\tmodified:   data/test/test.normalized.txt\n\tmodified:   data/test/test.phn.txt\n\tmodified:   data/test/test.qnt.pt\n\tmodified:   data/test/test.wav\n\tdeleted:    data/test/test2.phn.txt\n\tdeleted:    data/test/test2.qnt.pt\n\tmodified:   scripts/plot.py\n\tmodified:   scripts/run.sh\n\tmodified:   vall_e/config.py\n\tmodified:   vall_e/data.py\n\tmodified:   vall_e/train.py\n\nUntracked files:\n  (use \"git add <file>...\" to include in what will be committed)\n\ttest/\n\ttoy.wav\n\tzoo/\n\nno changes added to commit (use \"git add\" and/or \"git commit -a\")",
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 100.0,
  "log_dir": "logs/test/nar/1679003729",
  "log_root": "logs",
  "max_grad_norm": null,
  "max_iter": 1000,
  "max_num_val": 20,
  "max_phones": 50000,
  "max_prompts": 3,
  "max_val_ar_steps": 300,
  "min_phones": 5,
  "model": "nar-quarter",
  "nj": 8,
  "num_tokens": 1024,
  "p_additional_prompt": 0.8,
  "relpath": "test/nar",
  "sample_rate": 24000,
  "sampling_temperature": 1.0,
  "save_artifacts_every": 100,
  "save_ckpt_every": 500,
  "save_on_oom": true,
  "save_on_quit": true,
  "spkr_name_getter": "lambda p: p.parts[-2]",
  "start_time": 1679003729,
  "token_dim": 256,
  "use_fp16": true,
  "warmup_max_lr": 0.0002,
  "warmup_min_lr": 1e-06,
  "warmup_num_steps": 1000
}
2023-03-16 21:55:36 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(

Training cannot start. Can you give me any help? Thanks

cantabile-kwok commented 1 year ago

Based on the output I think the training has already started. I have encountered the same situation in #58 and it was because the saved model at ckpts/test/nar/model/default/mp_rank_00_model_states.pt is already at the maximum step. You can delete the checkpoints and give it another try.
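
For example, assuming the ckpt_dir from your config dump:

rm -rf ckpts/test/nar                             # drop the stale checkpoint that is already at max_iter
python -m vall_e.train yaml=config/test/nar.yml   # then restart training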

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok I started the training after adding multiple samples, but training gets stuck:

New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
NaeemKhanNiazi commented 1 year ago

Hi @cantabile-kwok, I am facing the following error:

!python -m vall_e 'He is a good person.' data/test/sampleTwo.wav sec.wav
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/__main__.py", line 43, in <module>
    main()
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/__main__.py", line 29, in main
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
  File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/__main__.py", line 29, in <listcomp>
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
KeyError: 'HH'
cantabile-kwok commented 1 year ago

Probably your training data does not contain the phoneme "HH". The model cannot speak a phoneme it has not seen; that phoneme is not even registered in the symbol map.

By the way, if you are training on just a few samples, the model is unlikely to generate a customized sentence, as it overfits to those few training samples.
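
A quick way to check phoneme coverage before synthesis (a sketch: symmap stands for the phoneme map printed in your training log, and g2p.encode is the same call vall_e/__main__.py uses):

from vall_e.emb import g2p

# Paste the full phoneme->id dict from the training log here (truncated example).
symmap = {'</s>': 1, '<s>': 2, 'AA1': 3, 'AE1': 4, 'AH0': 6, 'IH1': 10, 'K': 12}

text = 'He is a good person.'
missing = sorted({p for p in g2p.encode(text) if p not in symmap})
print(missing)  # e.g. ['HH', ...] -- phonemes with no entry, hence the KeyError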

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok Thank you for the reply. I am confused by the following command. Can you elaborate on it for me?

python -m vall_e <text> <ref_path> <out_path> --ar-ckpt zoo/ar.pt --nar-ckpt zoo/nar.pt

Why a reference audio path?

!python -m vall_e 'hello world' data/test/test.wav toy.wav

It is confusing me.

Whenever I run this command

!python -m vall_e 'hello science' data/test/test.wav toy.wav

it saves the audio of test.wav into toy.wav. Why does it not generate the audio of 'hello science' and save it into toy.wav? Can you explain it for me? Thanks

cantabile-kwok commented 1 year ago

The reference audio provides the speaker reference that the model is supposed to copy from. This is the target speaker in the zero-shot TTS scenario.

I guess you only used a few samples to train the model. If that is the case, then the model is not actually well trained, as I mentioned. To achieve reasonably good performance, you have to use a considerable amount of training data.
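
To spell out the inference command (argument roles annotated; the example values are the ones already used in this thread):

python -m vall_e <text> <ref_path> <out_path> --ar-ckpt zoo/ar.pt --nar-ckpt zoo/nar.pt
# <text>     : the sentence you want the model to speak, e.g. 'hello science'
# <ref_path> : an utterance from the target speaker, i.e. the voice to copy, e.g. data/test/test.wav
# <out_path> : where the synthesized wav is written, e.g. toy.wav

The text decides what is said; the reference wav only decides whose voice it is said in.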

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok What should the length of each audio clip in the training samples be?

cantabile-kwok commented 1 year ago

It depends, but usually around 8 s and usually no longer than 20 s. Short audio is also OK.

NaeemKhanNiazi commented 1 year ago

What is meant by taking a 3 s audio prompt and generating the cloned voice?

cantabile-kwok commented 1 year ago

The paper seems to cut the audio prompt (the target speaker's reference speech) to 3 s before feeding it to the model, but this implementation does not seem to do that.

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok Do we need to keep audio of the same person in the training data whose voice is going to be cloned at test time?

cantabile-kwok commented 1 year ago

During training we of course have to use the same person's speech as the acoustic prompt, so that the model learns speaker timbre from the prompt. At test time we typically should not use a speaker that was seen in training, because that would be a kind of cheating. So some speakers have to be left out for testing.

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok I used 10 samples of my voice for training. But when I try to generate different text (not the same as in the training samples), it does not clone the voice at test time and only generates noise. What should I do?

cantabile-kwok commented 1 year ago

On which data have you been training?

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok I created my own audio (recorded with a microphone) of some English sentences used in daily life.

cantabile-kwok commented 1 year ago

I have to say the training data size is way too small to train such a large model. Usually these neural networks need dozens of hours of speech to work well, so making up your own small dataset is not likely to work.

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok So how do I clone my own voice? If I train the model on the LibriSST dataset, will it work for cloning my voice?

cantabile-kwok commented 1 year ago

In this repo no pretrained model is released, nor has anyone trained a successful model, so if you just want to clone your own voice rather than train a model yourself, the best option may be to seek another method...

NaeemKhanNiazi commented 1 year ago

By another method, do you mean another algorithm or a GitHub repository?

cantabile-kwok commented 1 year ago

I don't see much difference. You could look for another algorithm and find the GitHub repo of the corresponding implementation, or find an off-the-shelf tool online. I'm not sure.

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok This blog says you can clone your own voice.

Link : https://blog.paperspace.com/training-vall-e-from-scratch-on-your-own-voice-samples/

cantabile-kwok commented 1 year ago

That still requires training the model. LibriLight is a very large dataset and you cannot train on it effectively unless you have a massive amount of computational resources. If you use a smaller (yet not too small) dataset, you still need to train the model. Either way, you will need to train the model from scratch, as there are no publicly available pretrained models.

NaeemKhanNiazi commented 1 year ago

@cantabile-kwok So does it mean I cannot generate audio with my own voice? But as per the paper:

VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.
cantabile-kwok commented 1 year ago

What the paper says is true, but they would have to release their trained model; otherwise you have nothing with which to clone your own voice, and you still need to spend time and effort training your own model.