Plachtaa / VITS-fast-fine-tuning

This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
Apache License 2.0

Error: "This model doesn't have language tokens so it can't perform lang id" #474

Closed heiheiheibj closed 11 months ago

heiheiheibj commented 11 months ago

python scripts/long_audio_transcribe.py --languages "CJ" --whisper_size large
<generator object _walk at 0x7f3b9dec0820>
['gdg_4.wav', 'gdg_1.wav', 'gdg_5.wav', 'gdg_2.wav', 'gdg_8.wav', 'gdg_9.wav', 'gdg_7.wav', 'gdg_6.wav', 'gdg_3.wav']
filelist= ['gdg_4.wav', 'gdg_1.wav', 'gdg_5.wav', 'gdg_2.wav', 'gdg_8.wav', 'gdg_9.wav', 'gdg_7.wav', 'gdg_6.wav', 'gdg_3.wav']
transcribing ./denoised_audio/gdg_4.wav...

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/home/ubuntu/VITS-fast-fine-tuning/scripts/long_audio_transcribe.py", line 45, in <module>
    result = model.transcribe(parent_dir + file, word_timestamps=True, **transcribe_options)
  File "/root/anaconda3/envs/barkvoice/lib/python3.10/site-packages/whisper/transcribe.py", line 130, in transcribe
    _, probs = model.detect_language(mel_segment)
  File "/root/anaconda3/envs/barkvoice/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/barkvoice/lib/python3.10/site-packages/whisper/decoding.py", line 40, in detect_language
    raise ValueError(
ValueError: This model doesn't have language tokens so it can't perform lang id
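For anyone debugging the same traceback: as far as the whisper source shows, `detect_language()` raises this ValueError when the tokenizer built for the loaded model has no language token, which is the case when the model is treated as English-only. The sketch below is a minimal diagnostic under that assumption. It relies on the model object exposing `is_multilingual`, as the `Whisper` class in openai-whisper does; `check_language_support` and `explain` are hypothetical helper names, not part of whisper or of this repo.

```python
def check_language_support(model):
    """Return True if the model reports itself as multilingual.

    Assumption: `model` exposes `is_multilingual`, as openai-whisper's
    Whisper class does. Objects without that attribute count as unsupported.
    """
    return bool(getattr(model, "is_multilingual", False))


def explain(model):
    """Build a one-line diagnostic message (hypothetical helper)."""
    if check_language_support(model):
        return "multilingual model: language detection should be available"
    return (
        "model is treated as English-only: detect_language() will raise; "
        "check the installed whisper version and the checkpoint"
    )
```

A possible use is `print(explain(whisper.load_model("large")))` right before the call to `model.transcribe(...)`. If it reports English-only for a checkpoint that should be multilingual, the installed whisper code and the checkpoint probably do not match.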

I have already downloaded the pretrained models:
wget https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/D_0-p.pth -O ./pretrained_models/D_0.pth
wget https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/G_0-p.pth -O ./pretrained_models/G_0.pth
wget https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/config.json -O ./configs/finetune_speaker.json

This used to run fine. The problem appeared after I reinstalled Ubuntu 22.04. Thanks.

Json0926 commented 11 months ago

Has this been solved?

heiheiheibj commented 11 months ago

> Has this been solved?

No... I re-downloaded the models as well, and the result is the same.

mikeyang01 commented 11 months ago

I ran into a similar problem. It turned out that the whisper version on PyPI is too old; you need to install the latest whisper from GitHub.
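To see which whisper release is actually installed before and after reinstalling, a small check along these lines may help. It is only a sketch: it assumes the package is installed under the distribution name `openai-whisper` (the name used by the GitHub repository's setup), and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name="openai-whisper"):
    """Return the installed version string of a distribution, or None.

    Assumption: OpenAI's whisper is installed as "openai-whisper".
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
```

Calling `print(installed_version())` inside the same conda environment that runs the transcription script shows the version string, or `None` if the package is missing there.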

heiheiheibj commented 11 months ago

> I ran into a similar problem. It turned out that the whisper version on PyPI is too old; you need to install the latest whisper from GitHub.

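Thanks! After installing whisper from Git, it works. There was one small hiccup after installing whisper: it conflicted with the newer versions of torchaudio and torchvision. Downgrading to the versions below fixed it. Thanks again.
torch 2.0.1
torchaudio 2.0.1
torchvision 0.15.2
triton 2.0.0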
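The version set reported above can be compared against the current environment with a short script. This is a sketch rather than a recommended pin list: the numbers are copied from this comment only, and `find_mismatches` is a hypothetical helper. Local build tags such as `+cu118` are ignored in the comparison, which is an assumption about how the wheels are labeled.

```python
from importlib import metadata

# Versions reported as working in this thread (not an official requirement).
PINS = {
    "torch": "2.0.1",
    "torchaudio": "2.0.1",
    "torchvision": "0.15.2",
    "triton": "2.0.0",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (wanted, found)} for packages that differ from the pins.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Drop a local build tag such as "+cu118" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Running `print(find_mismatches(PINS))` prints an empty dict when the environment matches the versions listed in this comment.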

Json0926 commented 11 months ago

Nice.
pip install git+https://github.com/openai/whisper.git