RVC-Boss / GPT-SoVITS

1 min of voice data can also be used to train a good TTS model! (few-shot voice cloning)
MIT License
34.02k stars, 3.9k forks

Training errors out #21

Closed. FlashlightET closed this issue 8 months ago

FlashlightET commented 9 months ago

Trying to train few-shot, but I get errors because it's not creating these files:

self.path2: logs/xxx/2-name2text.txt
self.path4: logs/xxx/4-cnhubert
self.path5: logs/xxx/5-wav32k
Traceback (most recent call last):
  File "x:\sovits\GPT_SoVITS\s2_train.py", line 402, in <module>
    main()
  File "x:\sovits\GPT_SoVITS\s2_train.py", line 53, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "x:\sovits\GPT_SoVITS\s2_train.py", line 69, in run
    train_dataset = TextAudioSpeakerLoader(hps.data)########
  File "x:\sovits\GPT_SoVITS\module\data_utils.py", line 37, in __init__
    assert os.path.exists(self.path2)
AssertionError

In addition, where exactly do I place the xxx.list file?

RVC-Boss commented 9 months ago

Does logs/xxx/2-name2text.txt exist?

luguoyixiazi commented 9 months ago

Bro, the way this file is defined, txt_path="%s/2-name2text-%s.txt"%(opt_dir,i_part), is wrong... please fix that code. The split you do earlier, gpu_names=gpu_numbers1c.split("-"), tags the output files with -0, -1, -2, because the webui passes 0-0 even for a single GPU... but training still looks for the plain 2-name2text.txt and the like...

Does logs/xxx/2-name2text.txt exist?
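
To restate the report in runnable form (the directory name and loop here are assumptions, not the actual webui code): a single GPU arrives as the string 0-0, the split yields two parts, and each part writes a suffixed file, while training later asserts on the unsuffixed name.

opt_dir = "logs/xxx"
gpu_numbers1c = "0-0"                 # what the webui passes for a single GPU
gpu_names = gpu_numbers1c.split("-")  # -> ["0", "0"]: two worker parts
for i_part in range(len(gpu_names)):
    txt_path = "%s/2-name2text-%s.txt" % (opt_dir, i_part)
    print(txt_path)  # logs/xxx/2-name2text-0.txt, then logs/xxx/2-name2text-1.txt
# ...but data_utils.py only checks the unsuffixed path:
# assert os.path.exists("logs/xxx/2-name2text.txt")  -> AssertionError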

RVC-Boss commented 9 months ago

Bro, the way this file is defined, txt_path="%s/2-name2text-%s.txt"%(opt_dir,i_part), is wrong... please fix that code. The split you do earlier, gpu_names=gpu_numbers1c.split("-"), tags the output files with -0, -1, -2, because the webui passes 0-0 even for a single GPU... but training still looks for the plain 2-name2text.txt and the like...

Does logs/xxx/2-name2text.txt exist?

I've already found this problem and it will be fixed in the next version (the one-click pipeline now merges the parts automatically), but I don't know whether the OP is hitting the same issue.

win10ogod commented 9 months ago

Bro, the way this file is defined, txt_path="%s/2-name2text-%s.txt"%(opt_dir,i_part), is wrong... please fix that code. The split you do earlier, gpu_names=gpu_numbers1c.split("-"), tags the output files with -0, -1, -2, because the webui passes 0-0 even for a single GPU... but training still looks for the plain 2-name2text.txt and the like...

Does logs/xxx/2-name2text.txt exist?

I've already found this problem and it will be fixed in the next version (the one-click pipeline now merges the parts automatically), but I don't know whether the OP is hitting the same issue.

@RVC-Boss How do I fix an error like this:

"runtime\python" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
  File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 138, in <module>
    main(args)
  File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 115, in main
    trainer.fit(model, data_module, ckpt_path=ckpt_path)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
    self.strategy.setup_environment()
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 148, in setup_environment
    self.setup_distributed()
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 199, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\distributed_c10d.py", line 888, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 245, in _env_rendezvous_handler
    store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 176, in _create_c10d_store
    return TCPStore(
RuntimeError: unmatched '}' in format string

win10ogod commented 9 months ago

@RVC-Boss Boss, I'm getting an error training GPT with your integrated package.

FlashlightET commented 9 months ago

Coming back to this, I realized I forgot to run step 1A (the training-set formatting tool) before trying to run 1B 🤦. I hadn't even looked into that tab at the time.

FlashlightET commented 9 months ago

After running the dataset formatting tool I still get the error:

INFO:xxx:{'train': {'log_interval': 100, 'eval_interval': 500, 'seed': 1234, 'epochs': 8, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 20480, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'text_low_lr_rate': 0.4, 'pretrained_s2G': 'GPT_SoVITS/pretrained_models/s2G488k.pth', 'pretrained_s2D': 'GPT_SoVITS/pretrained_models/s2D488k.pth', 'if_save_latest': True, 'if_save_every_weights': True, 'save_every_epoch': 4, 'gpu_numbers': '0'}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 32000, 'filter_length': 2048, 'hop_length': 640, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 300, 'cleaned_text': True, 'exp_dir': 'logs/xxx'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 512, 'semantic_frame_rate': '25hz', 'freeze_quantizer': True}, 's2_ckpt_dir': 'logs/xxx', 'content_module': 'cnhubert', 'save_weight_dir': 'SoVITS_weights', 'name': 'xxx', 'pretrain': None, 'resume_step': None}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
  File "X:\sovits\GPT_SoVITS\s2_train.py", line 566, in <module>
    main()
  File "X:\sovits\GPT_SoVITS\s2_train.py", line 53, in main
    mp.spawn(
  File "X:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "X:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
  File "X:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "X:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "X:\sovits\GPT_SoVITS\s2_train.py", line 81, in run
    train_dataset = TextAudioSpeakerLoader(hps.data)  ########
  File "X:\sovits\GPT_SoVITS\module\data_utils.py", line 36, in __init__
    assert os.path.exists(self.path2)
AssertionError

Edit: I'm noticing the text files are suffixed -0 and -1, as was pointed out earlier. I'm aware this will be fixed in the next update and will patiently wait for it. No rush.

Manually merging the two tsv and txt files in a text editor made training work fine. (I got through both SoVITS and GPT training.)
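
For anyone who wants to script that workaround until the fix lands, a minimal sketch: the 2-name2text.txt name comes from this thread, while 6-name2semantic.tsv is my assumption for the tsv being merged.

import glob
import os

exp_dir = "logs/xxx"  # the experiment's log directory
for merged_name in ("2-name2text.txt", "6-name2semantic.tsv"):
    stem, ext = os.path.splitext(merged_name)
    # find the per-part files, e.g. 2-name2text-0.txt, 2-name2text-1.txt, ...
    parts = sorted(glob.glob(os.path.join(exp_dir, "%s-*%s" % (stem, ext))))
    with open(os.path.join(exp_dir, merged_name), "w", encoding="utf-8") as out:
        for part in parts:  # concatenate the parts in order
            with open(part, encoding="utf-8") as f:
                out.write(f.read())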

luguoyixiazi commented 8 months ago

Bro, the way this file is defined, txt_path="%s/2-name2text-%s.txt"%(opt_dir,i_part), is wrong... please fix that code. The split you do earlier, gpu_names=gpu_numbers1c.split("-"), tags the output files with -0, -1, -2, because the webui passes 0-0 even for a single GPU... but training still looks for the plain 2-name2text.txt and the like...

Does logs/xxx/2-name2text.txt exist?

I've already found this problem and it will be fixed in the next version (the one-click pipeline now merges the parts automatically), but I don't know whether the OP is hitting the same issue.

One more thing: for the 1Bb GPT training set, self.semantic_data = pd.read_csv(semantic_path, delimiter='\t', encoding="utf-8") is a bit off. With the latest pandas on my side, the first data row gets used as the column names... but the training code looks up file names via 'item_name', which can never match those column names. Presumably the header was never written when the file was generated? Could that be why the other guy hit an error training GPT with the integrated package?
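
The mismatch is easy to reproduce: without names= (or header=None), pandas promotes the first data row to the column names, so a literal 'item_name' column never exists. A small sketch with made-up rows:

import io

import pandas as pd

tsv = "clip_001\t12 34 56\nclip_002\t78 90 11\n"  # headerless, like the semantic tsv appears to be
df = pd.read_csv(io.StringIO(tsv), delimiter="\t")
print(df.columns.tolist())  # ['clip_001', '12 34 56']: the first row became the header
df = pd.read_csv(io.StringIO(tsv), delimiter="\t", names=["item_name", "semantic_audio"])
print(df.columns.tolist())  # ['item_name', 'semantic_audio']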

luguoyixiazi commented 8 months ago

@RVC-Boss 大佬,我用您的整合包訓練gpt出現錯誤

Buddy, if your error is also KeyError: 'item_name', try changing GPT_SoVITS\AR\data\dataset.py. First, the read_csv in the dataset class's __init__:

class Text2SemanticDataset(Dataset):
    """dataset class for text tokens to semantic model training."""

    def __init__(self,
                 phoneme_path: str,
                 semantic_path: str,
                 max_sample: int = None,
                 max_sec: int = 100,
                 pad_val: int = 1024,
                 # min value of phoneme/sec
                 min_ps_ratio: int = 3,
                 # max value of phoneme/sec
                 max_ps_ratio: int = 25) -> None:
        super().__init__()
        # self.semantic_data = pd.read_csv(semantic_path, delimiter='\t', encoding="utf-8")
        self.semantic_data = pd.read_csv(semantic_path, delimiter='\t', encoding="utf-8", names=['item_name','semantic_audio'])
        # get dict

Then try changing init_batch below it:

    def init_batch(self):
        semantic_data_len = len(self.semantic_data)
        phoneme_data_len = len(self.phoneme_data.keys())
        print("semantic_data_len:", semantic_data_len)
        print("phoneme_data_len:", phoneme_data_len)
        idx = 0
        num_not_in = 0
        num_deleted_bigger = 0
        num_deleted_ps = 0
        # for i in range(semantic_data_len):
        for index,row in self.semantic_data.iterrows():
            # 先依次遍历
            # get str
            # item_name = self.semantic_data['item_name'][i]
            item_name = row['item_name']
            # print(self.phoneme_data)
            try:
                phoneme, word2ph, text = self.phoneme_data[item_name]
            except Exception:
                traceback.print_exc()
                # print(f"{item_name} not in self.phoneme_data !")
                num_not_in += 1
                continue

Don't replace the whole function, just the span shown, down to each snippet's last line: the first snippet ends at # get dict, the second at continue.

HowcanoeWang commented 8 months ago

Bro, the way this file is defined, txt_path="%s/2-name2text-%s.txt"%(opt_dir,i_part), is wrong... please fix that code. The split you do earlier, gpu_names=gpu_numbers1c.split("-"), tags the output files with -0, -1, -2, because the webui passes 0-0 even for a single GPU... but training still looks for the plain 2-name2text.txt and the like...

Does logs/xxx/2-name2text.txt exist?

I've already found this problem and it will be fixed in the next version (the one-click pipeline now merges the parts automatically), but I don't know whether the OP is hitting the same issue.

@RVC-Boss The latest version throws this error:

Running on local URL:  http://0.0.0.0:9871
"/home/hwang/Applications/miniconda3/envs/sovits/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
"/home/hwang/Applications/miniconda3/envs/sovits/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
Traceback (most recent call last):
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hwang/Applications/GPTSoVITS/GPT_SoVITS/prepare_datasets/1-get-text.py", line 50, in <module>
    tokenizer = AutoTokenizer.from_pretrained(bert_pretrained_dir)
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 737, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 569, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Traceback (most recent call last):
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hwang/Applications/GPTSoVITS/GPT_SoVITS/prepare_datasets/1-get-text.py", line 50, in <module>
    tokenizer = AutoTokenizer.from_pretrained(bert_pretrained_dir)
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 737, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 569, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Traceback (most recent call last):
  File "/home/hwang/Applications/GPTSoVITS/webui.py", line 499, in open1abc
    with open(txt_path, "r",encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'logs/xxx/2-name2text-0.txt'

Not sure whether this is what's causing it?

Here are the page parameters for reference:

[screenshot]

There's only a single GPU, so is there even a need to split the text files?
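
As an aside on the HFValidationError above: as far as I can tell, transformers only treats the string as a local model when the directory actually exists relative to the current working directory; otherwise it falls through to the Hub, whose repo-id validator rejects the three-segment path. A quick check (path copied from the log):

import os

bert_dir = "GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large"
# If this prints False, from_pretrained() will try to interpret the string as a
# Hub repo id instead of a local folder, which produces exactly this error.
print(os.path.isdir(bert_dir))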

company8 commented 8 months ago

Running on public URL: https://9afae4f5-3868-4a65.gradio.live
"/usr/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
"/usr/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/GPT_SoVITS/GPT_SoVITS/prepare_datasets/1-get-text.py", line 50, in <module>
    tokenizer = AutoTokenizer.from_pretrained(bert_pretrained_dir)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 737, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 569, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/GPT_SoVITS/GPT_SoVITS/prepare_datasets/1-get-text.py", line 50, in <module>
    tokenizer = AutoTokenizer.from_pretrained(bert_pretrained_dir)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 737, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 569, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 321, in run_predict
    output = await app.blocks.process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1006, in process_api
    result = await self.call_function(fn_index, inputs, iterator, request)
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 859, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 833, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 408, in async_iteration
    return next(iterator)
  File "/GPT_SoVITS/webui.py", line 337, in open1a
    with open(txt_path, "r", encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'logs/xxx/2-name2text-0.txt'
"/usr/bin/python" GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py
"/usr/bin/python" GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-hubert-base'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/GPT_SoVITS/GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py", line 51, in <module>
    model=cnhubert.get_model()
  File "/GPT_SoVITS/GPT_SoVITS/feature_extractor/cnhubert.py", line 70, in get_model
    model = CNHubert()
  File "/GPT_SoVITS/GPT_SoVITS/feature_extractor/cnhubert.py", line 25, in __init__
    self.model = HubertModel.from_pretrained(cnhubert_base_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2789, in from_pretrained
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'GPT_SoVITS/pretrained_models/chinese-hubert-base'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-hubert-base'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/GPT_SoVITS/GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py", line 51, in <module>
    model=cnhubert.get_model()
  File "/GPT_SoVITS/GPT_SoVITS/feature_extractor/cnhubert.py", line 70, in get_model
    model = CNHubert()
  File "/GPT_SoVITS/GPT_SoVITS/feature_extractor/cnhubert.py", line 25, in __init__
    self.model = HubertModel.from_pretrained(cnhubert_base_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2789, in from_pretrained
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'GPT_SoVITS/pretrained_models/chinese-hubert-base'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
"/usr/bin/python" GPT_SoVITS/s2_train.py --config "TEMP/tmp_s2.json"
Traceback (most recent call last):
  File "/GPT_SoVITS/GPT_SoVITS/s2_train.py", line 566, in <module>
    main()
  File "/GPT_SoVITS/GPT_SoVITS/s2_train.py", line 53, in main
    mp.spawn(
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/GPT_SoVITS/GPT_SoVITS/s2_train.py", line 81, in run
    train_dataset = TextAudioSpeakerLoader(hps.data)  ########
  File "/GPT_SoVITS/GPT_SoVITS/module/data_utils.py", line 36, in __init__
    assert os.path.exists(self.path2)
AssertionError

"/usr/bin/python" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml" 
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
  File "/GPT_SoVITS/GPT_SoVITS/s1_train.py", line 171, in <module>
    main(args)
  File "/GPT_SoVITS/GPT_SoVITS/s1_train.py", line 128, in main
    model: Text2SemanticLightningModule = Text2SemanticLightningModule(
  File "/GPT_SoVITS/GPT_SoVITS/AR/models/t2s_lightning_module.py", line 26, in __init__
    torch.load(pretrained_s1, map_location="cpu")["weight"]
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt'

Same as @HowcanoeWang. I tried every tab in the training section to see whether it was just the training-set formatting tab, but no luck.

EDIT: I forgot to mention that in the webui the path for the models is GPT_SoVITS while the repo is GPT-SoVITS; I thought renaming the directory would fix the issue, but it didn't.

company8 commented 8 months ago

Hey @HowcanoeWang, how did you make the .list file? How did you create the text annotation file? Did I miss a step? I don't know how to create it, and there's no user guide since it's new :D

HowcanoeWang commented 8 months ago

Hey @HowcanoeWang, how did you make the .list file? How did you create the text annotation file? Did I miss a step? I don't know how to create it, and there's no user guide since it's new :D

Use the '中文离线识别ASR' (Chinese offline ASR) tool to create the list file.

But if you are not using that tool, simply create an empty txt file, rename it to xxxx.list, and type entries in the following format; that should also work (I guess...):

/absolute/file/path/to/each/vocal1.wav||ZH|这边输入对应的文字1。
/absolute/file/path/to/each/vocal2.wav||ZH|这边输入对应的文字2。

Very labor-intensive; I strongly recommend the ASR tool.

[screenshot]
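
If you do write it by hand, a small script keeps the field order straight. A sketch with placeholder file names and transcripts; the empty second field matches the example above (I take it to be a speaker name):

entries = {
    "/absolute/file/path/to/each/vocal1.wav": "这边输入对应的文字1。",
    "/absolute/file/path/to/each/vocal2.wav": "这边输入对应的文字2。",
}
with open("xxxx.list", "w", encoding="utf-8") as f:
    for wav_path, text in entries.items():
        # audio path | (speaker, left empty) | language code | transcript
        f.write("%s||ZH|%s\n" % (wav_path, text))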

company8 commented 8 months ago

Hey @HowcanoeWang, how did you make the .list file? How did you create the text annotation file? Did I miss a step? I don't know how to create it, and there's no user guide since it's new :D

Use the '中文离线识别ASR' (Chinese offline ASR) tool to create the list file.

But if you are not using that tool, simply create an empty txt file and rename to xxxx.list, and type the following format should also works: (I guess...)

/absolute/file/path/to/each/vocal1.wav||ZH|这边输入对应的文字1。
/absolute/file/path/to/each/vocal2.wav||ZH|这边输入对应的文字2。

Very labor-intensive, strongly recommend the ASR tool

图片

I didn't download the ASR models... So do I only need the ASR model? I don't need the VAD and Punc models? Thank you so much :)

HowcanoeWang commented 8 months ago


I didn't download the ASR models... So do I only need the ASR model? I don't need the VAD and Punc models? Thank you so much :)

Please read the readme and follow its instructions to download all of them:

For Chinese ASR (additionally), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model and place them in tools/damo_asr/models.

HowcanoeWang commented 8 months ago

@RVC-Boss I've tracked down the problem:

Traceback (most recent call last):
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/hwang/Applications/miniconda3/envs/sovits/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Use `repo_type` argument if needed.

The cause: 'bert_pretrained_dir': 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large' supplies a three-segment path, but the actual path should just be the two-segment pretrained_models/chinese-roberta-wwm-ext-large. So the leading GPT_SoVITS needs to be removed from the value= path defaults in the following webui.py code:

https://github.com/RVC-Boss/GPT-SoVITS/blob/d2c2d4eb34a6dcbd8f0127b212ad4cedd434a2a0/webui.py#L648-L650

https://github.com/RVC-Boss/GPT-SoVITS/blob/d2c2d4eb34a6dcbd8f0127b212ad4cedd434a2a0/webui.py#L664

https://github.com/RVC-Boss/GPT-SoVITS/blob/d2c2d4eb34a6dcbd8f0127b212ad4cedd434a2a0/webui.py#L671
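
In plain terms the proposed edit is the following (a sketch; the actual webui.py lines are UI component defaults whose full arguments aren't reproduced here):

# current default in webui.py (three segments, resolved from the repo root):
bert_pretrained_dir = "GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large"
# proposed two-segment default, for installs where pretrained_models sits next to webui.py:
bert_pretrained_dir = "pretrained_models/chinese-roberta-wwm-ext-large"

(Note RVC-Boss's reply further down: the readme expects the models under GPT_SoVITS/pretrained_models, so the three-segment default is intentional when the models are downloaded to the documented location.)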

whitescent commented 8 months ago

[quoting @HowcanoeWang's full report above, including the same tracebacks and screenshot]

I've hit the same error; is there a workaround?

company8 commented 8 months ago

@whitescent

I've hit the same error; is there a workaround?

The "huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large'. Use repo_type argument if needed."

That error I fixed it by placing the huggingface models inside "GPT_SoVITS" folder. So repo name is "GPT-SoVITS" and inside that directory there is a another folder named "GPT_SoVITS" place the huggingface models with configs there.

As for the "FileNotFoundError: [Errno 2] No such file or directory: 'logs/xxx/2-name2text-0.txt'":

I think that error gets fixed after you run the ASR tool. I haven't tried that fix, because I was training in the cloud, the models were big, and I couldn't install git-lfs, so I gave up and left it for another day.
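
Before launching the webui, a quick sanity check of the layout described above can save a run; a minimal sketch, run from the GPT-SoVITS repo root (the two paths mirror the ones in the tracebacks above):

import os

expected = [
    "GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large",
    "GPT_SoVITS/pretrained_models/chinese-hubert-base",
]
for path in expected:
    # each entry should be a directory holding the model's config and weights
    print("OK  " if os.path.isdir(path) else "MISS", path)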

RVC-Boss commented 8 months ago

Bro, the way this file is defined, txt_path="%s/2-name2text-%s.txt"%(opt_dir,i_part), is wrong... please fix that code. The split you do earlier, gpu_names=gpu_numbers1c.split("-"), tags the output files with -0, -1, -2, because the webui passes 0-0 even for a single GPU... but training still looks for the plain 2-name2text.txt and the like...

Does logs/xxx/2-name2text.txt exist?

This is solved now; the parts are concatenated automatically.

RVC-Boss commented 8 months ago

Bro, the way this file is defined, txt_path="%s/2-name2text-%s.txt"%(opt_dir,i_part), is wrong... please fix that code. The split you do earlier, gpu_names=gpu_numbers1c.split("-"), tags the output files with -0, -1, -2, because the webui passes 0-0 even for a single GPU... but training still looks for the plain 2-name2text.txt and the like...

Does logs/xxx/2-name2text.txt exist?

I've already found this problem and it will be fixed in the next version (the one-click pipeline now merges the parts automatically), but I don't know whether the OP is hitting the same issue.

One more thing: for the 1Bb GPT training set, self.semantic_data = pd.read_csv(semantic_path, delimiter='\t', encoding="utf-8") is a bit off. With the latest pandas on my side, the first data row gets used as the column names... but the training code looks up file names via 'item_name', which can never match those column names. Presumably the header was never written when the file was generated? Could that be why the other guy hit an error training GPT with the integrated package?

Fixed; please check whether it works now.

RVC-Boss commented 8 months ago

@RVC-Boss Boss, I'm getting an error training GPT with your integrated package.

Buddy, if your error is also KeyError: 'item_name', try changing GPT_SoVITS\AR\data\dataset.py. First, the read_csv in the dataset class's __init__:

[the dataset.py patch, quoted in full above]

Fixed; please check whether it works now.

RVC-Boss commented 8 months ago

[quoting @HowcanoeWang's full report above]

Fixed. The parts are now concatenated automatically.

RVC-Boss commented 8 months ago


I didn't download the ASR models... So do I only need the ASR model? I don't need the VAD and Punc models? Thank you so much :)

Fixed now. If you run Chinese ASR without having downloaded the model first, it will now be downloaded automatically.

RVC-Boss commented 8 months ago

@RVC-Boss I've tracked down the problem:

[the traceback and webui.py links, quoted in full above]

The readme tells users to download the models into GPT_SoVITS/pretrained_models, not GPT-SoVITS/pretrained_models, so you put them in the wrong place.

RVC-Boss commented 8 months ago

[company8's workaround reply to @whitescent, quoted in full above]

Fixed now. It will automatically concatenate the parts into a single file.