YYuX-1145 / Bert-VITS2-Integration-package

vits2 backbone with bert
https://www.bilibili.com/video/BV13p4y1d7v9
GNU Affero General Public License v3.0

Chinese-specialized version hangs while generating BERT files #73

Closed DoshideDK closed 8 months ago

DoshideDK commented 8 months ago

I got the full pipeline working once, including training and inference. When I set up a second experiment, memory usage maxes out at the BERT file generation step and then everything hangs. Usually it just freezes outright; occasionally I get to see an error message:

  0%|          | 0/35 [00:34<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "D:\Program Files\Bert-VITS2-Extra\bert_gen.py", line 41, in process_line
    bert = torch.load(bert_path)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './Data/rin/wavs/rin/processed_rin_17.bert.pt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "D:\Program Files\Bert-VITS2-Extra\bert_gen.py", line 44, in process_line
    bert = get_bert(text, word2ph, language_str, device)
  File "D:\Program Files\Bert-VITS2-Extra\text\__init__.py", line 23, in get_bert
    bert = zh_bert(
  File "D:\Program Files\Bert-VITS2-Extra\text\chinese_bert.py", line 27, in get_bert_feature
    models[device] = MegatronBertModel.from_pretrained(LOCAL_PATH).to(device)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\transformers\modeling_utils.py", line 2014, in to
    return super().to(*args, **kwargs)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\nn\modules\module.py", line 1160, in to
    return self._apply(convert)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\nn\modules\module.py", line 833, in _apply
    param_applied = fn(param)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\nn\modules\module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bert_gen.py", line 73, in <module>
    for _ in tqdm(
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\multiprocessing\pool.py", line 868, in next
    raise value
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
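
Editor's note: the FileNotFoundError at the top of the log is probably not the real failure. The traceback suggests that process_line first tries to load a cached .bert.pt file and only computes the feature when that load fails, so a missing file is expected for a new experiment. The error that matters is the one raised during the fallback. Below is a minimal sketch of that load-or-compute pattern. It uses pickle instead of torch.load so that it runs on its own, and load_or_compute and compute are hypothetical names, not functions from the repository.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the cached feature if it exists, otherwise compute and save it."""
    cache_path = Path(cache_path)
    try:
        # First attempt: reuse a feature file written by an earlier run.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # Expected on the first run of a new experiment: nothing is cached yet.
        # If compute() raises here, Python chains the new exception onto this
        # one and prints "During handling of the above exception, another
        # exception occurred", which matches the shape of the log above.
        feature = compute()
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with cache_path.open("wb") as f:
            pickle.dump(feature, f)
        return feature
```

Read this way, the first log fails in the fallback step (moving the model to the GPU), not in the file lookup.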

DoshideDK commented 8 months ago

Restarting the computer and reinstalling CUDA didn't help.

YYuX-1145 commented 8 months ago

Your machine probably ran out of resources. If VRAM or RAM is insufficient, try lowering the number of bert_gen processes to 1.
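
Editor's note: the traceback shows the model being loaded inside a multiprocessing pool worker, so each worker process presumably holds its own copy of the BERT model, and memory use grows with the process count. The sketch below illustrates what limiting the work to one process means. process_line here is a placeholder and run_jobs is a hypothetical helper; neither is the repository's code.

```python
from multiprocessing import Pool


def process_line(line):
    # Placeholder for the real per-line work (hypothetical). In the real
    # script, each worker process would also load its own model copy.
    return len(line)


def run_jobs(lines, num_processes=1):
    """Process lines with at most num_processes worker processes."""
    if num_processes <= 1:
        # Single process: only one model copy is ever resident in memory.
        return [process_line(line) for line in lines]
    with Pool(processes=num_processes) as pool:
        # imap preserves input order; every worker is a separate process.
        return list(pool.imap(process_line, lines))


if __name__ == "__main__":
    print(run_jobs(["a", "bb", "ccc"], num_processes=1))
```

With one process the job is slower but needs memory for a single model only, which is the trade-off the suggestion above relies on.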

DoshideDK commented 8 months ago

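> Your machine probably ran out of resources. If VRAM or RAM is insufficient, try lowering the number of bert_gen processes to 1.

I waited a while, and as long as it doesn't crash, the files do get generated. It looks like it has switched to running on the CPU. Normally it should run on the GPU, right?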

YYuX-1145 commented 8 months ago

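The progress bar advances while the files are being generated. Running on the CPU has to be configured manually; otherwise it still runs on the GPU.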

DoshideDK commented 8 months ago

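> The progress bar advances while the files are being generated. Running on the CPU has to be configured manually; otherwise it still runs on the GPU.

In my case the progress bar sits at 0 for about a minute and then finishes within a second. Where is the CPU setting? I'd like to check whether I changed it by accident.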

YYuX-1145 commented 8 months ago

It's in config.yml. Check whether the BERT files were generated; if they were, you're fine.
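
Editor's note: the thread does not show the relevant part of config.yml, so the key names are left out here. As a rough illustration only, a script could turn a configured device string into the device it actually uses as sketched below. resolve_device is a hypothetical helper, and the fallback to "cpu" is an assumption about reasonable behaviour, not a description of what bert_gen.py does.

```python
def resolve_device(requested="cuda"):
    """Return the device string to use, falling back to "cpu" if CUDA is unusable."""
    if requested == "cpu":
        # An explicit CPU request is always honoured.
        return "cpu"
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() reports whether PyTorch can see a CUDA device.
    return requested if torch.cuda.is_available() else "cpu"
```

Printing the resolved device once at startup is a cheap way to confirm whether a run is really on the GPU.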

DoshideDK commented 8 months ago

> It's in config.yml. Check whether the BERT files were generated; if they were, you're fine.

Yes, they were. Thanks a lot!

DoshideDK commented 8 months ago

 17%|██████████████▍ | 4/23 [01:48<08:34, 27.08s/it]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "D:\Program Files\Bert-VITS2-Extra\bert_gen.py", line 41, in process_line
    bert = torch.load(bert_path)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './Data/test/wavs/test/processed_test_9.bert.pt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "D:\Program Files\Bert-VITS2-Extra\bert_gen.py", line 44, in process_line
    bert = get_bert(text, word2ph, language_str, device)
  File "D:\Program Files\Bert-VITS2-Extra\text\__init__.py", line 23, in get_bert
    bert = zh_bert(
  File "D:\Program Files\Bert-VITS2-Extra\text\chinese_bert.py", line 41, in get_bert_feature
    assert len(word2ph) == len(text) + 2
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bert_gen.py", line 73, in <module>
    for _ in tqdm(
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "D:\Program Files\Bert-VITS2-Extra\venv\lib\multiprocessing\pool.py", line 868, in next
    raise value
AssertionError

D:\Program Files\Bert-VITS2-Extra>

DoshideDK commented 8 months ago

It looks like there's a new error now. Could you take a look?

YYuX-1145 commented 8 months ago

Extra v2? I haven't used it. My guess is that some components, such as g2p, were not replaced, and that is what causes this problem.
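
Editor's note: the failing assertion, len(word2ph) == len(text) + 2, requires word2ph to have one entry per character of the text plus two extra entries. A mismatch is consistent with the guess above that the text and word2ph came from different g2p versions. The sketch below checks that invariant over a batch of samples so the offending lines can be located. The interpretation of the two extra entries as start and end markers is an assumption, and check_alignment, find_misaligned, and the sample values are hypothetical.

```python
def check_alignment(text, word2ph):
    """Return True when word2ph has one entry per character plus two extras."""
    # The two extra entries are assumed to be start and end markers, which
    # is what the "+ 2" in the failing assertion appears to account for.
    return len(word2ph) == len(text) + 2


def find_misaligned(samples):
    """Return the indices of (text, word2ph) pairs that would fail the assertion."""
    return [
        i
        for i, (text, word2ph) in enumerate(samples)
        if not check_alignment(text, word2ph)
    ]


samples = [
    ("你好", [1, 2, 2, 1]),    # 2 characters + 2 extras: aligned
    ("你好吗", [1, 2, 2, 1]),  # 3 characters but only 4 entries: misaligned
]
print(find_misaligned(samples))  # [1]
```

Running a check like this over the preprocessed list before bert_gen would point at the specific lines to regenerate.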