SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Running test_infer_single on Windows hangs for a long time without progress #1

Closed by yuyun2000 1 day ago

yuyun2000 commented 1 day ago

I have 8 GB of VRAM. The log is as follows:

Traceback (most recent call last):
  File "D:\github\F5-TTS-main\test_infer_single.py", line 142, in <module>
    generated, trajectory = model.sample(
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\github\F5-TTS-main\model\cfm.py", line 187, in sample
    trajectory = odeint(fn, y0, t, **self.odeint_kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torchdiffeq\_impl\odeint.py", line 79, in odeint
    solution = solver.integrate(t)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torchdiffeq\_impl\solvers.py", line 114, in integrate
    dy, f0 = self._step_func(self.func, t0, dt, t1, y0)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torchdiffeq\_impl\fixed_grid.py", line 10, in _step_func
    f0 = func(t0, y0, perturb=Perturb.NEXT if self.perturb else Perturb.NONE)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torchdiffeq\_impl\misc.py", line 197, in forward
    return self.base_func(t, y)
  File "D:\github\F5-TTS-main\model\cfm.py", line 158, in fn
    pred = self.transformer(x = x, cond = step_cond, text = text, time = t, mask = mask, drop_audio_cond = False, drop_text = False)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\github\F5-TTS-main\model\backbones\dit.py", line 150, in forward
    x = block(x, t, mask = mask, rope = rope)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\github\F5-TTS-main\model\modules.py", line 479, in forward
    attn_output = self.attn(x=norm, mask=mask, rope=rope)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\github\F5-TTS-main\model\modules.py", line 306, in forward
    return self.processor(self, x, mask = mask, rope = rope)
  File "D:\github\F5-TTS-main\model\modules.py", line 327, in __call__
    key = attn.to_k(x)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\funasr\lib\site-packages\torch\nn\modules\linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
KeyboardInterrupt

Why does it hang here? Is it because there isn't enough VRAM? But VRAM usage did not increase, and no OOM error was raised.

yuyun2000 commented 1 day ago

Whoa, how did the torch on my PC end up being the CPU-only build...
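For anyone hitting the same symptom (inference appears to hang, VRAM usage stays flat, no OOM error), a quick check of the installed PyTorch build can rule this out. The snippet below is a minimal sketch; the helper name `describe_torch_build` is made up for illustration and is not part of F5-TTS.

```python
import importlib.util


def describe_torch_build():
    """Return a small report on the installed PyTorch build, if any.

    Hypothetical helper for illustration; not part of F5-TTS.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "cuda_build": None, "cuda_available": False}

    import torch

    return {
        "installed": True,
        # torch.version.cuda is None on builds without CUDA support,
        # such as the CPU-only wheels.
        "cuda_build": torch.version.cuda,
        # True only if a CUDA build is installed and a usable GPU is visible.
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

If `cuda_available` is `False`, the model is most likely running on the CPU, which would explain a very slow run with no growth in VRAM usage.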

yuyun2000 commented 1 day ago

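It finished running and used 7.5 GB of VRAM. The English example is perfect, but the pace of the Chinese speech is very strange, very slow. Thanks to the author for open-sourcing such an excellent project.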

yuyun2000 commented 1 day ago

It's not that the speaking rate is slow... the pauses are too long.

SWivid commented 1 day ago

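> It finished running and used 7.5 GB of VRAM. The English example is perfect, but the pace of the Chinese speech is very strange, very slow. Thanks to the author for open-sourcing such an excellent project.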

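Glad it works, haha. You can adjust fix_duration (the total duration, covering both the prompt and the speech to be generated). If it is set to None, the duration is estimated linearly from the number of characters. The paper will be posted in a few days and should cover most of the details.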
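To illustrate what a linear estimate from character count could look like, here is a minimal sketch. It assumes the prompt's seconds-per-character rate is simply applied to the text to be generated. The function name, its parameters, and the exact formula are assumptions for illustration, not the repository's actual code.

```python
def estimate_total_duration(prompt_seconds, prompt_text, gen_text, fix_duration=None):
    """Estimate total duration in seconds (prompt plus generated speech).

    Hypothetical sketch: if fix_duration is given it is used as-is,
    otherwise the prompt's seconds-per-character rate is scaled linearly
    to the length of the text to generate.
    """
    if fix_duration is not None:
        # A fixed total duration overrides the estimate.
        return float(fix_duration)
    if len(prompt_text) == 0:
        raise ValueError("prompt_text must not be empty")
    seconds_per_char = prompt_seconds / len(prompt_text)
    return prompt_seconds + seconds_per_char * len(gen_text)


# A 4-second prompt with 4 characters, followed by 8 characters to generate.
print(estimate_total_duration(4.0, "abcd", "abcdefgh"))
# The same call with an explicit total duration.
print(estimate_total_duration(4.0, "abcd", "abcdefgh", fix_duration=10.0))
```

Under this kind of estimate, a prompt spoken slowly relative to its character count would stretch the generated portion, and the extra time could show up as long pauses. Setting fix_duration explicitly is the adjustment suggested above.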

yuyun2000 commented 1 day ago

Amazing, amazing~