enhuiz / vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

Cannot run python3 -m vall_e.train yaml=config/test/nar.yml #81

Open samual30000 opened 1 year ago

samual30000 commented 1 year ago

python3 -m vall_e.train yaml=config/test/nar.yml --debug

I get an error when running this. ChatGPT-4 says it might be a problem with the original files, but it can't give any concrete advice, so I can only ask the author.

trainer.train(
  File "/sam/vall-e/vall_e/utils/trainer.py", line 150, in train
    for batch in _make_infinite_epochs(train_dl):
  File "/sam/vall-e/vall_e/utils/trainer.py", line 103, in _make_infinite_epochs
    yield from dl
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
Converted path: /sam/vall-e/data/train/one.qnt.pt -> /sam/vall-e/data/train/one.qnt.pt
  File "/usr/local/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/sam/vall-e/vall_e/data.py", line 185, in __getitem__
    proms = self.sample_prompts(spkr_name, ignore=path)
  File "/sam/vall-e/vall_e/data.py", line 172, in sample_prompts
    raise RuntimeError("All tensors in prom_list are zero-dimensional.")
RuntimeError: All tensors in prom_list are zero-dimensional.

Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
Converted path: /sam/vall-e/data/train/one.qnt.pt -> /sam/vall-e/data/train/one.qnt.pt
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])

The script ChatGPT-4 wrote for me:
root@CH-202203180108:/sam/vall-e/data# cat 1.py

import torch

train_qnt = torch.load('/sam/vall-e/data/train/one.qnt.pt')
print("Train qnt shape:", train_qnt.shape)

val_qnt = torch.load('/sam/vall-e/data/val/test.qnt.pt')
print("Val qnt shape:", val_qnt.shape)

root@CH-202203180108:/sam/vall-e/data# python3 1.py
Train qnt shape: torch.Size([3])
Val qnt shape: torch.Size([1, 8, 149])

Directory structure of data:

root@CH-202203180108:/sam/vall-e/data# ll
total 24
drwxr-xr-x 5 root root 4096 Mar 30 21:07 ./
drwxr-xr-x 8 root root 4096 Mar 30 23:45 ../
-rw-r--r-- 1 root root 216 Mar 30 21:07 1.py
drwxr-xr-x 2 root root 4096 Mar 28 14:27 test/
drwxr-xr-x 2 root root 4096 Mar 30 23:34 train/
drwxr-xr-x 2 root root 4096 Mar 28 14:55 val/

Files in the train directory:

root@CH-202203180108:/sam/vall-e/data# ll train/
total 408
drwxr-xr-x 2 root root 4096 Mar 30 23:34 ./
drwxr-xr-x 5 root root 4096 Mar 30 21:07 ../
-rw-r--r-- 1 root root 159 Mar 28 14:53 1.py
-rw-r--r-- 1 root root 37 Mar 28 14:49 one.phn.txt
-rw-r--r-- 1 root root 747 Mar 28 14:54 one.qnt.pt
-rw-r--r-- 1 root root 26 Mar 28 14:38 test.phn.txt
-rw-r--r-- 1 root root 10286 Mar 28 14:38 test.qnt.pt
-rw-r--r-- 1 root root 380750 Mar 30 23:34 test.wav
root@CH-202203180108:/sam/vall-e/data#

It errors out and I don't know how to fix it.

Xiangbj17 commented 1 year ago

GPT's advice is right. The .pt files produced by EnCodec encoding all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which doesn't look right. Check whether something went wrong in your qnt encoding step.
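
Not the repo's own tooling, just a minimal shape-check sketch; the glob pattern assumes the data layout shown above:

```python
import glob
import torch

# Every EnCodec-quantized file is expected to be 3-D: [1, 8, time_steps].
for path in sorted(glob.glob("/sam/vall-e/data/*/*.qnt.pt")):
    codes = torch.load(path, map_location="cpu")
    ok = codes.dim() == 3 and tuple(codes.shape[:2]) == (1, 8)
    print(f"{path}: shape={tuple(codes.shape)} -> {'OK' if ok else 'BAD'}")
```

Any file flagged BAD would need to be re-quantized.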

samual30000 commented 1 year ago

GPT's advice is right. The .pt files produced by EnCodec encoding all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which doesn't look right. Check whether something went wrong in your qnt encoding step.

Did you manage to get it running? Even after all the back-and-forth and debugging with GPT-4 I still can't. Is the project missing some training data, or is something else missing? It just won't run no matter what once I get to the python3 -m vall_e.train yaml=config/test/nar.yml --debug step.

samual30000 commented 1 year ago

GPT's advice is right. The .pt files produced by EnCodec encoding all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which doesn't look right. Check whether something went wrong in your qnt encoding step.

Is something missing?

samual30000 commented 1 year ago

'NoneType' object has no attribute 'optimizer_name'; self._config is a NoneType.

ilanshib commented 1 year ago

I encountered the same problem: vall_e.train stopped working. At first look it seems a change was made to Microsoft's DeepSpeed code; when Microsoft's module is initialized, it looks for a config object that contains the attribute optimizer_name.

vall_e uses DeepSpeed and initializes it as part of the 'Engine' class in utils/engines.py, but it does not pass the required config parameter. I am not familiar with this code, but I can see that other classes in utils/engines.py (e.g. the 'Engines' class) do use a config object that probably has the necessary information.

Can anyone help?
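
For reference, a minimal sketch of the kind of config newer DeepSpeed versions expect to receive at initialization; the model and hyperparameters below are placeholders, not the repo's actual settings:

```python
import deepspeed
import torch

model = torch.nn.Linear(10, 10)  # placeholder model

# Recent DeepSpeed releases read optimizer settings from the config passed to
# deepspeed.initialize(); if no config reaches the engine, attribute lookups
# such as optimizer_name fail on a None object.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```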

Xiangbj17 commented 1 year ago

GPT's advice is right. The .pt files produced by EnCodec encoding all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which doesn't look right. Check whether something went wrong in your qnt encoding step.

Did you manage to get it running? Even after all the back-and-forth and debugging with GPT-4 I still can't. Is the project missing some training data, or is something else missing? It just won't run no matter what once I get to the python3 -m vall_e.train yaml=config/test/nar.yml --debug step.

It runs fine for me. I suspect the shape of one.qnt.pt is the problem. Try deleting all the one-related .pt and .txt files and running with only the bundled test .pt and .txt to see whether it still errors. If it runs fine, that proves EnCodec went wrong when encoding one.wav; re-encode it and check whether you get a .pt with shape [1, 8, x].
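
If you want to re-encode by hand, the sketch below is one way to confirm that EnCodec produces a [1, 8, T] tensor. It is not the repo's own preprocessing script (the paths come from this thread and the 6 kbps bandwidth is an assumption), so rerunning the project's own quantization step is the safer route.

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # 6 kbps -> 8 codebooks per frame

# Assumed source wav for the broken one.qnt.pt
wav, sr = torchaudio.load("/sam/vall-e/data/train/one.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))  # list of (codes, scale) per frame
codes = torch.cat([codebook for codebook, _ in frames], dim=-1)  # [1, 8, T]

print(codes.shape)
torch.save(codes, "/sam/vall-e/data/train/one.qnt.pt")
```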

ilanshib commented 1 year ago

'NoneType' object has no attribute 'optimizer_name'; self._config is a NoneType.

See the discussion here: https://github.com/enhuiz/vall-e/issues/87

samual30000 commented 1 year ago

'NoneType' object has no attribute 'optimizer_name'; self._config is a NoneType.

See the discussion here: #87

thanks

samual30000 commented 1 year ago

GPT's advice is right. The .pt files produced by EnCodec encoding all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which doesn't look right. Check whether something went wrong in your qnt encoding step.

Did you manage to get it running? Even after all the back-and-forth and debugging with GPT-4 I still can't. Is the project missing some training data, or is something else missing? It just won't run no matter what once I get to the python3 -m vall_e.train yaml=config/test/nar.yml --debug step.

It runs fine for me. I suspect the shape of one.qnt.pt is the problem. Try deleting all the one-related .pt and .txt files and running with only the bundled test .pt and .txt to see whether it still errors. If it runs fine, that proves EnCodec went wrong when encoding one.wav; re-encode it and check whether you get a .pt with shape [1, 8, x].

thx

kgasenzer commented 1 year ago

I encountered the same problem: vall_e.train stopped working. At first look it seems a change was made to Microsoft's DeepSpeed code; when Microsoft's module is initialized, it looks for a config object that contains the attribute optimizer_name.

vall_e uses DeepSpeed and initializes it as part of the 'Engine' class in utils/engines.py, but it does not pass the required config parameter. I am not familiar with this code, but I can see that other classes in utils/engines.py (e.g. the 'Engines' class) do use a config object that probably has the necessary information.

Can anyone help?

I opened a pull request that deals with this issue. Make sure mpi4py is installed correctly, as I use the default initialization of distributed training, which may look for an MPI installation.
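
A quick optional sanity check that mpi4py can actually initialize MPI; the mpirun invocation is just one common setup, not something the repo mandates:

```python
# check_mpi.py -- run e.g. `mpirun -n 2 python check_mpi.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")
```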

samual30000 commented 1 year ago

!pip install deepspeed==0.8.3 made it work.

tangzhimiao commented 1 year ago

Awesome, thx, that solved the train problem.

!pip install deepspeed==0.8.3 made it work.