Open Tony-Starkus opened 4 months ago
Hi, make sure torchaudio is installed properly, along with the dependencies it needs to work, or use a vocoder like HiFi-GAN or an iSTFT-based vocoder like Vocos. Honestly, those vocoders are better than Griffin-Lim.
Hi @rmcpantoja, thanks for the reply.
About torchaudio: requirements.txt has torch>=1.2.0 and torchaudio==2.0.2. torchaudio 2 is compatible with PyTorch 2, which is why I installed torch==2.0.1.
My objective is to convert text to an audio file, and looking at gen_forward.py, griffinlim is the option that creates a wav file.
Do you know another way to do it? I tried several scripts to convert .mel and .npy to wav, but with no success.
Reference: https://github.com/pytorch/audio/releases/tag/v2.0.2
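For anyone double-checking the same pairing, here is a minimal sketch of a version sanity check. It assumes that, for the 2.x releases, a torchaudio build is meant to be used with the torch release sharing its major.minor number (as with torchaudio 2.0.2 and torch 2.0.1 in this thread). The helper names are made up for illustration and are not part of torch or torchaudio.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def major_minor(v: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.0.1+cu117'."""
    core = v.split("+", 1)[0]          # drop a local tag like '+cu117'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def same_release_line(torch_v: str, torchaudio_v: str) -> bool:
    """Heuristic (an assumption, valid for 2.x): matching major.minor."""
    return major_minor(torch_v) == major_minor(torchaudio_v)


def installed(pkg: str) -> Optional[str]:
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    t, ta = installed("torch"), installed("torchaudio")
    if t is None or ta is None:
        print(f"torch={t}, torchaudio={ta}: at least one is not installed")
    else:
        verdict = "look paired" if same_release_line(t, ta) else "may be mismatched"
        print(f"torch {t} and torchaudio {ta} {verdict}")
```

This only compares version numbers; it does not prove that the binary wheels (CPU vs. CUDA builds) match.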
Hi, if you add hifigan to gen_forward's command line, the script will write the .npy automatically, and you then need to pass that .npy to a vocoder. However, I have a script that runs ForwardTacotron and HiFi-GAN together, directly, without passing files between them. We also have a GUI app supporting this TTS, see here.
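As a rough sketch of the command-line suggestion above, the snippet below builds the gen_forward.py invocation with the vocoder name as the last argument. The flags (`--alpha`, `--checkpoint`, `--input_text`) and the vocoder names are the ones quoted in this thread; the helper `gen_forward_cmd` is hypothetical, and where the script writes its output is not covered here.

```python
import shlex
import sys
from typing import List


def gen_forward_cmd(checkpoint: str, text: str,
                    vocoder: str = "hifigan", alpha: float = 1) -> List[str]:
    """Build the argv list for gen_forward.py (hypothetical helper)."""
    return [
        sys.executable, "gen_forward.py",
        "--alpha", str(alpha),
        "--checkpoint", checkpoint,
        "--input_text", text,
        vocoder,  # positional vocoder choice, e.g. 'griffinlim' or 'hifigan'
    ]


if __name__ == "__main__":
    cmd = gen_forward_cmd("pretrained-forward_step90k.pt",
                          "this is whatever you want it to be")
    # Print a shell-safe version of the command; to actually run it from the
    # repository root, use: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems with the input text.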
I checked the code of tts-remix. Can you give a little explanation about how to use it?!
Hey, I had the same issue. Fixed it with two lines on gen_forward.py. I created a PR about it.
> I checked the code of tts-remix. Can you give a little explanation about how to use it?!
Hi, just launch the GUI with:
python tts_remix.py
The interface will open. You just need to put the ForwardTacotron and HiFi-GAN checkpoints in place, something like:
models
models/forward
models/forward/voicename
models/forward/voicename/voicename.pt
models/forward/voicename/vocoder-voicename.pt
models/forward/voicename/vocoder-voicename.json
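To check that a voice folder follows the layout described above before launching the GUI, something like the following could be used. It is only a sketch: the file names are derived from the `voicename` pattern in the message above, and the helper names are invented here rather than taken from tts-remix.

```python
from pathlib import Path
from typing import List


def expected_files(root: str, voice: str) -> List[Path]:
    """Files the layout above implies for one voice (assumed naming pattern)."""
    voice_dir = Path(root) / "forward" / voice
    return [
        voice_dir / f"{voice}.pt",            # ForwardTacotron checkpoint
        voice_dir / f"vocoder-{voice}.pt",    # HiFi-GAN checkpoint
        voice_dir / f"vocoder-{voice}.json",  # HiFi-GAN config
    ]


def missing_files(root: str, voice: str) -> List[Path]:
    """Return the expected files that do not exist yet."""
    return [p for p in expected_files(root, voice) if not p.is_file()]


if __name__ == "__main__":
    for path in missing_files("models", "voicename"):
        print(f"missing: {path}")
```

An empty result means all three files are where the layout expects them.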
> Hey, I had the same issue. Fixed it with two lines on gen_forward.py. I created a PR about it.
Looks good, I'm going to try it later, thanks!
Which Python version are you using? Also, can you share your pip freeze, please?
Got it, I will try this. Thanks!
Python 3.10, same as you. Our pip freeze outputs differ slightly, but that shouldn't matter.
absl-py==2.1.0
attrs==23.2.0
audioread==3.0.1
Babel==2.15.0
bibtexparser==2.0.0b7
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
clldutils==3.22.2
cmake==3.30.1
colorama==0.4.6
colorlog==6.8.2
contourpy==1.2.1
csvw==3.3.0
cycler==0.12.1
Cython==3.0.10
dataclasses==0.6
decorator==5.1.1
dlinfo==1.2.1
filelock==3.15.4
fonttools==4.53.1
grpcio==1.65.1
idna==3.7
inflect==7.3.1
isodate==0.6.1
Jinja2==3.1.4
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
language-tags==1.2.0
lazy_loader==0.4
librosa==0.10.0
lit==18.1.8
llvmlite==0.39.1
lxml==5.2.2
Markdown==3.6
MarkupSafe==2.1.5
matplotlib==3.9.1
more-itertools==10.3.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.3
numba==0.56.4
numpy==1.23.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==24.1
pandas==2.2.2
phonemizer==3.2.1
pillow==10.4.0
platformdirs==4.2.2
pooch==1.8.2
protobuf==4.25.4
pycparser==2.22
pylatexenc==2.10
pyparsing==3.1.2
python-dateutil==2.9.0.post0
pytz==2024.1
pyworld==0.3.4
PyYAML==6.0.1
rdflib==7.0.0
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
Resemblyzer==0.1.3
rfc3986==1.5.0
rpds-py==0.19.1
scikit-learn==1.5.1
scipy==1.14.0
segments==2.2.1
six==1.16.0
soundfile==0.12.1
soxr==0.4.0
sympy==1.13.1
tabulate==0.9.0
tensorboard==2.17.0
tensorboard-data-server==0.7.2
threadpoolctl==3.5.0
torch==2.0.1
torchaudio==2.0.2
tqdm==4.66.4
triton==2.0.0
typeguard==4.3.0
typing==3.7.4.3
typing_extensions==4.12.2
tzdata==2024.1
Unidecode==1.3.8
uritemplate==4.1.1
urllib3==2.2.2
webrtcvad==2.0.10
Werkzeug==3.0.3
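To see exactly where two environments differ, the two `pip freeze` outputs can be compared pin by pin. The sketch below assumes plain `name==version` lines like the ones above (editable installs and URL requirements are skipped); the function names are illustrative only.

```python
from typing import Dict, Optional, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Map normalized package names to versions from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and non-pinned entries
        name, ver = line.split("==", 1)
        pins[name.lower().replace("_", "-")] = ver
    return pins


def diff_freeze(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (version_in_a, version_in_b)} for every mismatch."""
    mine, theirs = parse_freeze(a), parse_freeze(b)
    return {
        name: (mine.get(name), theirs.get(name))
        for name in sorted(mine.keys() | theirs.keys())
        if mine.get(name) != theirs.get(name)
    }


if __name__ == "__main__":
    a = "torch==2.0.1\ntorchaudio==2.0.2\nnumpy==1.23.5\n"
    b = "torch==2.0.1\ntorchaudio==2.0.2\nnumpy==1.24.0\n"
    for name, (va, vb) in diff_freeze(a, b).items():
        print(f"{name}: {va} vs {vb}")
```

A value of `None` on one side means the package is installed in only one of the two environments.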
Hello. I downloaded the pretrained model ljspeech v3.1, and when I try to run
python gen_forward.py --alpha 1 --checkpoint pretrained-forward_step90k.pt --input_text 'this is whatever you want it to be' griffinlim
I get the following error:
Can someone help me?
I am running Python 3.10 with the following package versions: