NaeemKhanNiazi opened this issue 1 year ago
It seems that you have a speaker with only one utterance. This is not allowed in this code.
@cantabile-kwok Thanks for the reply. Can you explain what is meant by "utterance"? As per my understanding, one utterance means one sentence, doesn't it?
But the given Colab notebook (Colab-NoteBook) only has a hello-world example. How should I prepare a custom dataset? Can you give some help? Thanks.
You should make a directory to store the wavs and specify it as data_dirs in the config yaml. You should also put the corresponding transcription (.normalized.txt) file for each wav in that directory, with the same basename as the .wav file. Then call python -m vall_e.emb.qnt and python -m vall_e.emb.g2p to preprocess the data, as described in that Colab notebook. You can try putting another wav and txt into data/test and running the notebook to see the difference (see the sketch below).
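To make the pairing concrete, here is a minimal sketch of the expected layout and a check that every .wav has a matching transcription. The directory name data/custom and the check itself are my own illustration; the two preprocessing commands are the ones from the Colab notebook, which passes the data folder as the argument.

# Expected layout (hypothetical data/custom directory, listed as data_dirs in the config yaml):
#   data/custom/foo.wav
#   data/custom/foo.normalized.txt   <- transcription with the same basename as foo.wav
#   data/custom/bar.wav
#   data/custom/bar.normalized.txt
# Preprocessing, as in the Colab notebook:
#   python -m vall_e.emb.qnt data/custom
#   python -m vall_e.emb.g2p data/custom
from pathlib import Path

data_dir = Path("data/custom")
for wav in sorted(data_dir.glob("*.wav")):
    txt = wav.parent / (wav.stem + ".normalized.txt")
    if not txt.exists():
        print(f"missing transcription for {wav.name}")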
@cantabile-kwok
I added two .wav files and the corresponding .normalized.txt files. The error "ValueError: Failed to find another different utterance for test." is gone. But when I run the command
!python -m vall_e.train yaml=config/test/ar.yml
/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e
logs/test/nar/1679003729
CFG [PosixPath('data/test')]
DRRRRRRRRRRRRR data/test
/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e
2it [00:00, 1069.29it/s]
Pathssssssssss [PosixPath('data/test/test.qnt.pt'), PosixPath('data/test/imran.qnt.pt')]
Pairssssssssssssss [('test', PosixPath('data/test/imran.qnt.pt')), ('test', PosixPath('data/test/test.qnt.pt'))]
train_paths [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
val_paths [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
5
50000
****************************
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
##############################
True
##############################
True
ddddddddddddddddddddddddddddd
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
True
PATTTTTTTTTTTTTTTTTTTTt
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
{'test': [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]}
data/test/imran.qnt.pt
test
2
data/test/test.qnt.pt
test
2
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
5
50000
****************************
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
##############################
True
##############################
True
ddddddddddddddddddddddddddddd
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
False
PATTTTTTTTTTTTTTTTTTTTt
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
{'test': [PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt'), PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]}
data/test/imran.qnt.pt
test
4
data/test/test.qnt.pt
test
4
[PosixPath('data/test/imran.qnt.pt'), PosixPath('data/test/test.qnt.pt')]
/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 -
{'</s>': 1, '<s>': 2, 'AA1': 3, 'AE1': 4, 'AE2': 5, 'AH0': 6, 'AY1': 7, 'EY1': 8, 'IH0': 9, 'IH1': 10, 'IY1': 11, 'K': 12, 'L': 13, 'M': 14, 'N': 15, 'P': 16, 'R': 17, 'S': 18, 'T': 19, 'V': 20, 'Z': 21, '_': 22}
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 -
{'test': 0}
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 -
#samples (train): 2.
2023-03-16 21:55:29 - vall_e.data - INFO - GR=0;LR=0 -
#samples (val): 2.
[2023-03-16 21:55:30,061] [INFO] [comm.py:661:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-03-16 21:55:30 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Added key: store_based_barrier_key:1 to store for rank: 0
2023-03-16 21:55:30 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
2023-03-16 21:55:32 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Added key: store_based_barrier_key:2 to store for rank: 0
2023-03-16 21:55:32 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
[2023-03-16 21:55:32,718] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.11565709114074707 seconds
[2023-03-16 21:55:33,713] [INFO] [logging.py:77:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adam as basic optimizer
[2023-03-16 21:55:33,720] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-03-16 21:55:33,721] [INFO] [logging.py:77:log_dist] [Rank 0] Creating fp16 optimizer with dynamic loss scale
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Final Optimizer = adam
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupDecayLR object at 0x7fe7d1d56620>
[2023-03-16 21:55:33,731] [INFO] [logging.py:77:log_dist] [Rank 0] step=0, skipped=0, lr=[0.001], mom=[(0.9, 0.999)]
[2023-03-16 21:55:33,732] [INFO] [config.py:1010:print] DeepSpeedEngine configuration:
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print] amp_enabled .................. False
[2023-03-16 21:55:33,732] [INFO] [config.py:1014:print] amp_params ................... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] bfloat16_enabled ............. False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] checkpoint_parallel_write_pipeline False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] checkpoint_tag_validation_enabled True
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] checkpoint_tag_validation_fail False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fe7d1d55fc0>
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] communication_data_type ...... None
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] curriculum_enabled_legacy .... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] curriculum_params_legacy ..... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] data_efficiency_enabled ...... False
[2023-03-16 21:55:33,733] [INFO] [config.py:1014:print] dataloader_drop_last ......... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] disable_allgather ............ False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] dump_state ................... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] dynamic_loss_scale_args ...... None
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_enabled ........... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_gas_boundary_resolution 1
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_layer_num ......... 0
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_max_iter .......... 100
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_stability ......... 1e-06
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_tol ............... 0.01
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] eigenvalue_verbose ........... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] elasticity_enabled ........... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] flops_profiler_config ........ {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] fp16_auto_cast ............... False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] fp16_enabled ................. True
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] fp16_master_weights_and_gradients False
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] global_rank .................. 0
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] grad_accum_dtype ............. None
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] gradient_accumulation_steps .. 1
[2023-03-16 21:55:33,734] [INFO] [config.py:1014:print] gradient_clipping ............ 100.0
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] gradient_predivide_factor .... 1.0
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] initial_dynamic_scale ........ 65536
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] load_universal_checkpoint .... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] loss_scale ................... 0
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] memory_breakdown ............. False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] optimizer_legacy_fusion ...... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] optimizer_name ............... adam
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] optimizer_params ............. None
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] pld_enabled .................. False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] pld_params ................... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] prescale_gradients ........... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] scheduler_name ............... WarmupDecayLR
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] scheduler_params ............. {'warmup_min_lr': 1e-06, 'warmup_max_lr': 0.0002, 'warmup_num_steps': 1000, 'total_num_steps': 1000, 'warmup_type': 'linear'}
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] sparse_attention ............. None
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] sparse_gradients_enabled ..... False
[2023-03-16 21:55:33,735] [INFO] [config.py:1014:print] steps_per_print .............. 10
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] train_batch_size ............. 1
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] train_micro_batch_size_per_gpu 1
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] use_node_local_storage ....... False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] wall_clock_breakdown ......... False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] world_size ................... 1
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] zero_allow_untested_optimizer False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] zero_enabled ................. False
[2023-03-16 21:55:33,736] [INFO] [config.py:1014:print] zero_optimization_stage ...... 0
[2023-03-16 21:55:33,736] [INFO] [config.py:999:print_user_config] json = {
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 1,
"optimizer": {
"type": "Adam",
"lr": 1e-06
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"warmup_min_lr": 1e-06,
"warmup_max_lr": 0.0002,
"warmup_num_steps": 1000,
"total_num_steps": 1000,
"warmup_type": "linear"
}
},
"gradient_clipping": 100.0,
"fp16": {
"enabled": true
}
}
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.11644482612609863 seconds
[2023-03-16 21:55:34,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loading checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt...
[2023-03-16 21:55:35,888] [INFO] [torch_checkpoint_engine.py:25:load] [Torch] Loaded checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt.
[2023-03-16 21:55:35,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loading checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt...
[2023-03-16 21:55:36,345] [INFO] [torch_checkpoint_engine.py:25:load] [Torch] Loaded checkpoint from ckpts/test/nar/model/default/mp_rank_00_model_states.pt.
2023-03-16 21:55:36 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
{
"batch_size": 1,
"cache_dataloader": false,
"cache_dir": ".cache/test/nar",
"cfg_name": "test/nar",
"cfg_relpath": null,
"ckpt_dir": "ckpts/test/nar",
"ckpt_root": "ckpts",
"data_dirs": "[PosixPath('data/test')]",
"data_root": "data",
"device": "cuda",
"dis_warmup_max_lr": 0.0004,
"ds_cfg": {
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 1,
"optimizer": {
"type": "Adam",
"lr": 1e-06
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"warmup_min_lr": 1e-06,
"warmup_max_lr": 0.0002,
"warmup_num_steps": 1000,
"total_num_steps": 1000,
"warmup_type": "linear"
}
},
"gradient_clipping": 100.0,
"fp16": {
"enabled": true
}
},
"eval_batch_size": 1,
"eval_every": 500,
"fp16_cfg": {
"enabled": true
},
"git_commit": "3476d393d2133fa9b50d5ad999ca13b95fc22060",
"git_status": "On branch main\nYour branch is up to date with 'origin/main'.\n\nChanges not staged for commit:\n (use \"git add/rm <file>...\" to update what will be committed)\n (use \"git restore <file>...\" to discard changes in working directory)\n\tmodified: data/test/test.normalized.txt\n\tmodified: data/test/test.phn.txt\n\tmodified: data/test/test.qnt.pt\n\tmodified: data/test/test.wav\n\tdeleted: data/test/test2.phn.txt\n\tdeleted: data/test/test2.qnt.pt\n\tmodified: scripts/plot.py\n\tmodified: scripts/run.sh\n\tmodified: vall_e/config.py\n\tmodified: vall_e/data.py\n\tmodified: vall_e/train.py\n\nUntracked files:\n (use \"git add <file>...\" to include in what will be committed)\n\ttest/\n\ttoy.wav\n\tzoo/\n\nno changes added to commit (use \"git add\" and/or \"git commit -a\")",
"gradient_accumulation_steps": 1,
"gradient_clipping": 100.0,
"log_dir": "logs/test/nar/1679003729",
"log_root": "logs",
"max_grad_norm": null,
"max_iter": 1000,
"max_num_val": 20,
"max_phones": 50000,
"max_prompts": 3,
"max_val_ar_steps": 300,
"min_phones": 5,
"model": "nar-quarter",
"nj": 8,
"num_tokens": 1024,
"p_additional_prompt": 0.8,
"relpath": "test/nar",
"sample_rate": 24000,
"sampling_temperature": 1.0,
"save_artifacts_every": 100,
"save_ckpt_every": 500,
"save_on_oom": true,
"save_on_quit": true,
"spkr_name_getter": "lambda p: p.parts[-2]",
"start_time": 1679003729,
"token_dim": 256,
"use_fp16": true,
"warmup_max_lr": 0.0002,
"warmup_min_lr": 1e-06,
"warmup_num_steps": 1000
}
2023-03-16 21:55:36 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Training does not start. Can you give me any help? Thanks.
Based on the output I think the training has already started. I have encountered the same situation in #58 and it was because the saved model at ckpts/test/nar/model/default/mp_rank_00_model_states.pt
is already at the maximum step. You can delete the checkpoints and give it another try.
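If you do want to retrain from scratch, one option is to remove the stale checkpoint directory before rerunning; a minimal sketch, assuming the ckpts/test/nar path from the log above:

import shutil

# Delete the saved NAR model states so training restarts from step 0.
shutil.rmtree("ckpts/test/nar", ignore_errors=True)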
@cantabile-kwok I started the training after adding multiple samples, but training gets stuck:
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
New epoch starts.
2023-03-18 02:04:18 - vall_e.utils.trainer - INFO - GR=0;LR=0 -
Hi @cantabile-kwok, I am facing the following error:
!python -m vall_e 'He is a good person.' data/test/sampleTwo.wav sec.wav
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/__main__.py", line 43, in <module>
main()
File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/__main__.py", line 29, in main
phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
File "/content/drive/MyDrive/Colab Notebooks/VALL_E/vall-e/vall_e/__main__.py", line 29, in <listcomp>
phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
KeyError: 'HH'
Probably your training data does not contain the phoneme "HH". The model cannot speak a phoneme that it has not seen; it is not even registered in the symbol map.
By the way, if you are just training on a few samples, the model is not likely to generate a customized sentence, as it overfits to those few training samples.
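As an illustration of why the lookup fails, here is a sketch that reuses the symbol map printed in the training log above; the phoneme list is a hypothetical g2p output for "He is ...", not taken from the code:

import torch

# Symbol map copied from the training log above; note there is no 'HH' entry.
symmap = {'</s>': 1, '<s>': 2, 'AA1': 3, 'AE1': 4, 'AE2': 5, 'AH0': 6, 'AY1': 7,
          'EY1': 8, 'IH0': 9, 'IH1': 10, 'IY1': 11, 'K': 12, 'L': 13, 'M': 14,
          'N': 15, 'P': 16, 'R': 17, 'S': 18, 'T': 19, 'V': 20, 'Z': 21, '_': 22}

phones = ['<s>', 'HH', 'IY1', '</s>']  # hypothetical phonemization of "He ..."
try:
    ids = torch.tensor([symmap[p] for p in phones])  # the same lookup as in __main__.py
except KeyError as e:
    print(f"phoneme {e} was never seen during training")  # -> phoneme 'HH' was never seen ...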
@cantabile-kwok Thank you for the reply. I am confused by the following command. Can you elaborate on it for me?
python -m vall_e <text> <ref_path> <out_path> --ar-ckpt zoo/ar.pt --nar-ckpt zoo/nar.pt
Why is a reference audio path needed?
!python -m vall_e 'hello world' data/test/test.wav toy.wav
It is creating confusion for me. Whenever I run this command
!python -m vall_e 'hello science' data/test/test.wav toy.wav
it saves the audio of test.wav into toy.wav. Why does it not generate the audio for 'hello science' and save that into toy.wav? Can you explain it for me? Thanks.
The reference audio provides the speaker reference that the model is meant to copy from; this is the target speaker in the zero-shot TTS scenario (see the annotated command below).
I guess you only used a few samples to train the model. If that is the case, then the model is not actually well trained, as I have mentioned. To achieve reasonably good performance, you have to use a considerable amount of training data.
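For clarity, here is the inference command from earlier in this thread with each argument annotated:

# python -m vall_e <text> <ref_path> <out_path>
#   <text>     : the sentence to synthesize, e.g. 'hello science'
#   <ref_path> : reference audio of the target voice to copy, e.g. data/test/test.wav
#   <out_path> : where the generated waveform is written, e.g. toy.wav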
@cantabile-kwok What should the length of each audio in the training samples be?
It depends, but usually around 8 s and usually no longer than 20 s. Short audio is also OK.
What is meant by taking 3 s of audio and generating the cloned voice?
The paper seems to cut the audio prompt (the target speaker's reference speech) to 3 s before feeding it to the model, but this implementation does not seem to do that.
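As a rough sketch of what a 3-second prompt means (this trimming is not part of this repo; the torchaudio usage and file names are my own illustration):

import torchaudio

# Load a reference recording and keep only its first 3 seconds as the acoustic prompt.
wav, sr = torchaudio.load("data/test/test.wav")
prompt = wav[:, : 3 * sr]
torchaudio.save("prompt_3s.wav", prompt, sr)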
@cantabile-kwok Do we need to include, in the training data, audio from the same person whose voice is going to be cloned at test time?
In the training process we of course have to use the same person's speech as the acoustic prompt, to let the model learn the speaker's timbre from that prompt. At test time we typically shouldn't use a speaker that was seen in training, because that is a kind of cheating; some speakers have to be left out for testing.
@cantabile-kwok I used 10 samples of my voice for training. But when I try to generate speech for text that is different from the training samples, it does not clone my voice at test time and just generates noise. What should I do?
On which data have you been training?
@cantabile-kwok I recorded my own audio (with a microphone) of some English sentences used in daily life.
I have to say the training data size is way too small for such a large model. Usually these neural networks need dozens of hours of speech to work well, so making up your own small dataset is not likely to work.
@cantabile-kwok So how do I clone my own voice? If I train the model on the LibriSST dataset, will it work for cloning my voice?
In this repo no pretrained model is released, nor has anyone trained a successful model, so if you just want to clone your own voice rather than train a model, the best option may be to look for another method...
By "another method", do you mean any algorithm or GitHub repository?
I don't see the difference. You may look for another algorithm and find the GitHub repo of the corresponding implementation, or find an off-the-shelf tool online. I'm not sure.
@cantabile-kwok This blog says you can clone your own voice:
Link : https://blog.paperspace.com/training-vall-e-from-scratch-on-your-own-voice-samples/
That still requires training the model. LibriLight is a very large dataset, and you cannot effectively train on it unless you have massive computational resources. Even with a smaller (yet not too small) dataset, you still need to train the model. Either way, you will have to train from scratch, as there is no publicly available pretrained model.
@cantabile-kwok So does that mean I cannot generate audio with my own voice? But as per the paper:
VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.
The paper is true, but they would have to release their trained model; otherwise you have nothing with which to clone your own voice and still need to spend time and effort training your own model.
Hi guys, I am facing the following error. I am training the model on custom-prepared data.