Closed Bubarinokk closed 2 days ago
Will this block the inference? If it's just a warning and the inference goes well, that's the normal case.
yes, it is blocking
Could you provide more info? e.g. a full screenshot of the command line output. The warning here ("You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.") will not cause an error.
I've encountered the same issue and have attempted every suggested solution, but none have been effective.
```
To create a public link, set share=True in launch().
C:\Users\+++-\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\whisper\generation_whisper.py:509: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.
```
I have the same problem. I'm not sure if the warning itself is blocking inference, but inference is indeed blocked. Running on Windows 11.

```
D:\github\F5-TTS\venv\lib\site-packages\transformers\models\whisper\generation_whisper.py:509: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.
```

EDIT: I tried the CLI. It works. Looks like the problem lies in the Gradio interface.
any solution?
TL;DR: If you're running it on Windows in the UI, always provide a reference text in the Advanced Settings, to make it run.
OK. I found the issue. It lies in the transcription.
In this line: https://github.com/SWivid/F5-TTS/blob/b0f482421b03e187ee7ca1893458f383e2c289d3/src/f5_tts/infer/utils_infer.py#L126
I changed whisper-large-v3-turbo to whisper-base, and I was able to get the result after a while. I suspect that with whisper-large-v3-turbo it just took far longer because it's a larger model. So when we see it "stuck", it's actually running the transcription; it was just taking too much time.

Somehow this isn't an issue when running in the CLI, but in Gradio it's extremely slow. Even with whisper-base it's still very slow compared to what it should be (on a 4090 GPU).

Potentially some conflict exists between Gradio and the huggingface pipeline libraries on Windows? Windows is a weird system.
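For reference, a minimal sketch of that swap, assuming the `pipeline(...)` call quoted later in this thread from utils_infer.py (the device setup is added here so the snippet is self-contained):

```python
# Sketch: swap the ASR model used to transcribe the reference audio.
# In the repo this pipeline is built inside utils_infer.py; `device` there
# comes from the surrounding module.
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

asr_pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",  # was "openai/whisper-large-v3-turbo"
    device=device,
)
```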
Based on the current info, it seems more like a network issue. The inference is blocked because the fetching process for openai/whisper-large-v3-turbo is stuck; if you ctrl-c while it's stuck you will probably see something like "connect() xxxxx", which means the process is stuck there.

some possible solutions:

- leverage a VPN
- set in the command line environment: `export HF_ENDPOINT=https://hf-mirror.com`
- manually download the whisper model and place it under `C:\Users\YOURUSERNAME\.cache\huggingface\hub\models--openai--whisper-large-v3-turbo`; check some online tutorials on how to use a local checkpoint for huggingface models (see the sketch below)
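A hedged sketch of that last option (the snapshots/<hash> sub-directory name is hypothetical; the huggingface cache stores downloaded weights under such a folder):

```python
# Sketch: point the transformers pipeline at an already-downloaded local
# snapshot so nothing needs to be fetched from the network.
from transformers import pipeline

local_ckpt = r"C:\Users\YOURUSERNAME\.cache\huggingface\hub\models--openai--whisper-large-v3-turbo\snapshots\<hash>"  # <hash> is hypothetical
asr_pipe = pipeline("automatic-speech-recognition", model=local_ckpt)
```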
Nope. I saw the large model fully downloaded. And when it was running, I could hear my GPU humming (the same way it hums when running whisper-base)
@omnific9 would commenting out this line help? https://github.com/SWivid/F5-TTS/blob/f7e248e2ced0f1bc6885093d29893a1e4463bc71/src/f5_tts/infer/utils_infer.py#L127 Or it's probably still some problem with the pipeline; no idea then 😔
Huh... that worked. So whisper-large doesn't work with float16? Or is this only a problem on Windows?
maybe gpu? what gpu device are you using?
RTX 4090
The RTX 4090 definitely supports fp16, so it's probably a problem with the platform (Windows/Linux, torch/CUDA versions, transformers pipeline). Dunno; that isn't clear based on the current info.
I'm also having the same issue. I'm sure the model whisper-large-v3-turbo has been fully downloaded locally.
So @SWivid, what's the fix for my issue? Thanks. My issue is that the output audio has no content; it's silent. I tried using a VPN and it still didn't work.
@armthug213 have you tried this? And what torch and CUDA versions are you using? (This might help figure out a global solution addressing the issue.) Do `pip show torch` and `nvcc -V`.
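The same information is also available from inside Python, e.g.:

```python
# Quick environment report: torch version, the CUDA toolkit torch was built
# against, and whether a CUDA GPU is actually visible to torch.
import torch

print("torch:", torch.__version__)            # e.g. 2.3.0+cu118
print("built for CUDA:", torch.version.cuda)  # e.g. 11.8
print("CUDA available:", torch.cuda.is_available())
```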
@SWivid not yet
here
@armthug213 the way of commenting out fp16 for the pipeline will probably work. And thanks for providing the torch/CUDA version info; it all seems right.
One last thing I can think of is the transformers package version: for me, 4.39.3 works well on Linux and 4.45.2 is also fine on Windows.
If that doesn't change anything, I'm at a loss as to what's going wrong, because I cannot reproduce this failure on my side.
@SWivid @omnific9 just did that and there's still no change, same empty output!!
@SWivid so how would I tackle this transformers package? Can you walk me through the process? Thanks.
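A minimal way to check and pin the package, assuming pip (4.45.2 is the version reported working on Windows above):

```python
# Print the installed transformers version; if it differs from a known-good
# one, reinstall from the shell with:  pip install transformers==4.45.2
import transformers

print(transformers.__version__)
```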
@armthug213 so what is the command line output after you comment out `torch_dtype=torch.float16`?
If you end up with nothing but failures from the ASR pipeline, I would suggest you pass in a ref_text rather than using ASR transcription (because it is not clear how your case can be reproduced).
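For example, with the CLI (flag names as given in the repo README, treated here as assumptions): `f5-tts_infer-cli --model "F5-TTS" --ref_audio "ref.wav" --ref_text "exact transcript of ref.wav" --gen_text "text to synthesize"`. With --ref_text provided, the Whisper transcription step is skipped entirely, which sidesteps any ASR pipeline failure.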
@omnific9 did it work for you? If yes, what was the fix, and could you walk me through the fix process? I'm not that techie.. I appreciate it, thanks.
Update: I downloaded whisper-large-v3-turbo and the program worked for me, but it is very slow. It took more than 10 minutes for approximately 8 words. When I checked the task, I found it was using the CPU, not the GPU! Can the GPU be specified?
I have an Intel Xe GPU and an Intel Arc A350M. The processor is an i9, on Win11.
@medoderi `gpustat` (needs `pip install gpustat`) or `nvidia-smi`; see the rank, then `CUDA_VISIBLE_DEVICES=0 f5-tts_infer-gradio` if rank 0.
A silent output audio here too. It shows "Using custom reference text", and CUDA is running. I guess it's not about Whisper or the GPU.
A silent output audio. The methods above don't seem to work for me, whether in Gradio or the CLI. Can anyone offer some help? Thanks.. @SWivid
```
(.venv) D:\TTS\F5-TTS>f5-tts_infer-cli
Download Vocos from huggingface charactr/vocos-mel-24khz
Using F5-TTS...
vocab : D:\TTS\F5-TTS\.venv\lib\site-packages\f5_tts\infer\examples\vocab.txt
tokenizer : custom
model : C:\Users\Dell\.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\995ff41929c08ff968786b448a384330438b5cb6\F5TTS_Base\model_1200000.safetensors
Converting audio...
Using custom reference text...
Voice: main
Ref_audio: C:\Users\Dell\AppData\Local\Temp\tmp0xdgir1v.wav
Ref_text: Some call me nature, others call me mother nature.
No voice tag found, using main.
Voice: main
gen_text 0 I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring.
Generating audio in 1 batches...
  0%|          | 0/1 [00:00<?, ?it/s]
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\Dell\AppData\Local\Temp\jieba.cache
Loading model cost 1.277 seconds.
Prefix dict has been built successfully.
D:\TTS\F5-TTS\.venv\lib\site-packages\f5_tts\model\modules.py:436: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  x = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, dropout_p=0.0, is_causal=False)
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [02:10<00:00, 130.72s/it]
tests\infer_cli_out.wav
```

```
(.venv) D:\TTS\F5-TTS>pip show torch
Name: torch
Version: 2.3.0+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: d:\tts\f5-tts\.venv\lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: accelerate, bitsandbytes, ema-pytorch, encodec, f5-tts, torchaudio, torchdiffeq, vocos, x-transformers
```

```
(.venv) D:\TTS\F5-TTS>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
```
tests\infer_cli_out.wav is a silent file....
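A quick way to confirm a file is truly silent rather than just quiet, sketched with torchaudio (already a dependency here):

```python
# Load the generated wav and report its peak amplitude; a genuinely silent
# file shows a peak at or near 0.0.
import torchaudio

audio, sr = torchaudio.load(r"tests\infer_cli_out.wav")
print("peak amplitude:", audio.abs().max().item())
```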
@hongyu2024 https://github.com/SWivid/F5-TTS/blob/61ff2a62d9487e3362ffa5680007e788ad764065/src/f5_tts/infer/utils_infer.py#L324 change to `audio, sr = torchaudio.load(ref_audio, backend="soundfile")` and check if it works.
@281807424 @hongyu2024 @Bubarinokk @omnific9 has any one of you installed ComfyUI locally? Kindly let me know; I'm doing some investigation.
no install
I did a search on my laptop for the term "whisper" and found these two folders. Should there be 2 folders? BTW I installed ComfyUI locally 2 months ago.
@SWivid @omnific9
@armthug213 does ComfyUI do anything related to this repo or this issue? and you are searching for the term "whisper" for what purpose?
Just trying to find the root of this issue of audio output with no content, in case there is a conflict causing it.
BTW, during my F5-TTS install I faced an issue with PyTorch (I couldn't launch the F5-TTS interface) and I solved it with this video fix: https://www.youtube.com/watch?v=ca34C8ZUI0A
yes, it may be the cause. Try using a separate env:

```
# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts
```
Thanks... I modified L324 and changed venv to conda. It still won't work. I'll try a different OS...
@hongyu2024 so you have already tried `dtype=torch.float32`?
Thank you so much! This is effective; the sound is output (CLI & Gradio).
I tried to change it, but I encounter an error :( `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU`
You haven't provided the GPU info in that issue. If you are using a GPU with relatively limited memory, provide a ref_text rather than using the ASR model to transcribe.
I have an NVIDIA GeForce GTX 1650 (8 Gbps).
Thank you a lot :) It works and generates audio, but it shows me some errors in the command prompt, as mentioned below. Is there anything else I need to change?

Error message:
```
Starting app...
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
gen_text 0 Welcome to your home
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\maena\AppData\Local\Temp\jieba.cache
Loading model cost 0.682 seconds.
Prefix dict has been built successfully.
C:\Users\maena\Desktop\F5-TTS\src\f5_tts\model\modules.py:436: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  x = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, dropout_p=0.0, is_causal=False)
C:\ProgramData\miniconda3\envs\f5\lib\site-packages\gradio\processing_utils.py:574: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
```
If it's working fine, just ignore the warnings.
Good news :) It finally worked for me...

I used the `dtype=torch.float32` change ☝ and the other edit suggestions mentioned here. I will provide a screenshot of everything I did in the utils_infer.py file.
I'm really not sure which change exactly made it work, but it's worth noting.
Thanks! It works for me too, though I have no idea why.
This is specific to Nvidia; however, I have an Intel Xe GPU and an Intel Arc A350M. Is there a way to switch inference from the CPU to the GPU?
@medoderi not sure about Intel GPUs; maybe try torch 2.4 and replace `.to(device)` with `.to("xpu")`.
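A sketch of that idea, assuming torch >= 2.4 (which exposes `torch.xpu`); whether it actually runs on an Arc A350M is untested here:

```python
# Pick Intel's XPU device if this torch build supports it, else fall back to CPU.
import torch

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
print("using device:", device)
# model = model.to(device)  # `model` is a placeholder for the loaded F5-TTS model
```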
Thanks! It works for me too. I needed to use all three changes, modifying F5-TTS/src/f5_tts/infer/utils_infer.py:

1.
```
asr_pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    # torch_dtype=dtype,
    device=device,
)
```

2.
```
dtype = torch.float32  # if mel_spec_type == "bigvgan" else None
```

3.
```
audio, sr = torchaudio.load(ref_audio, backend="soundfile")
```
This doesn't require changing the code; just changing the CUDA version solves the problem.
How to modify the CUDA version?
No, it doesn't matter.