FunAudioLLM / CosyVoice

A multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
https://funaudiollm.github.io/
Apache License 2.0
6.43k stars · 694 forks

Does the webui support concurrent requests? Running streaming synthesis from two open pages at the same time raises an error #658

Open yangpeng-space opened 4 days ago

yangpeng-space commented 4 days ago

Traceback (most recent call last):
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/queueing.py", line 560, in process_events
    response = await route_utils.call_process_api(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/blocks.py", line 1945, in process_api
    result = await self.call_function(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/blocks.py", line 1525, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 655, in async_iteration
    return await iterator.__anext__()
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 648, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2364, in run_sync_in_worker_thread
    return await future
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 864, in run
    result = context.run(func, *args)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 631, in run_sync_iterator_async
    return next(iterator)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 814, in gen_wrapper
    response = next(iterator)
  File "webui.py", line 179, in generate_audio
    for i in cosyvoice.inference_sft(tts_text, sft_dropdown, stream=stream, speed=speed, new_dropdown=new_dropdown):
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/cli/cosyvoice.py", line 132, in inference_sft
    for model_output in self.model.tts(**model_input, stream=stream, speed=speed):
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/cli/model.py", line 172, in tts
    this_tts_speech = self.token2wav(token=this_tts_speech_token,
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/cli/model.py", line 112, in token2wav
    tts_mel, flow_cache = self.flow.inference(token=token.to(self.device),
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow.py", line 137, in inference
    feat, flow_cache = self.decoder(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow_matching.py", line 67, in forward
    return self.solve_euler(z, t_span=t_span, mu=mu, mask=mask, spks=spks, cond=cond), flow_cache
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow_matching.py", line 95, in solve_euler
    cfg_dphi_dt = self.forward_estimator(
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow_matching.py", line 113, in forward_estimator
    return self.estimator.forward(x, mask, mu, t, spks, cond)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/decoder.py", line 180, in forward
    x = transformer_block(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dl/data/FunAudioLLM/CosyVoice/third_party/Matcha-TTS/matcha/models/components/transformer.py", line 266, in forward
    attn_output = self.attn1(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 1251, in __call__
    attention_mask = attn.prepare_attention_mask(attention_mask, sequence_length, batch_size)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 661, in prepare_attention_mask
    attention_mask = attention_mask.repeat_interleave(head_size, dim=0)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception in thread Thread-15:
Traceback (most recent call last):
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/cli/model.py", line 101, in llm_job
    for i in self.llm.inference(text=text.to(self.device),
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/llm/llm.py", line 200, in inference
    y_pred, att_cache, cnn_cache = self.llm.forward_chunk(lm_input, offset=offset, required_cache_size=-1,
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_25.py", line 1468, in forward_chunk
    _564 = torch.contiguous(torch.transpose(x111, 1, 2))
    x112 = torch.view(_564, [n_batch35, -1, 1024])
    _565 = torch.linear(x112, CONSTANTS.c242, CONSTANTS.c73)
    x113 = torch.add(x109, _565)
    x114 = torch.layer_norm(x113, [1024], CONSTANTS.c74, CONSTANTS.c75)

Traceback of TorchScript, original code (most recent call last):
  File "/mnt/lyuxiang.lx/anaconda3/envs/cosyvoice_refactor/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 116, in forward_chunk
    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)
               ~~~~~~~~ <--- HERE
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 0 m 1024 n 1 k 1024 mat1_ld 1024 mat2_ld 1024 result_ld 1024 abcType 2 computeType 68 scaleType 0

 71%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                                      | 5/7 [00:57<00:23, 11.58s/it]
Traceback (most recent call last):
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/queueing.py", line 560, in process_events
    response = await route_utils.call_process_api(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/blocks.py", line 1945, in process_api
    result = await self.call_function(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/blocks.py", line 1525, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 655, in async_iteration
    return await iterator.__anext__()
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 648, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2364, in run_sync_in_worker_thread
    return await future
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 864, in run
    result = context.run(func, *args)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 631, in run_sync_iterator_async
    return next(iterator)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/gradio/utils.py", line 814, in gen_wrapper
    response = next(iterator)
  File "webui.py", line 179, in generate_audio
    for i in cosyvoice.inference_sft(tts_text, sft_dropdown, stream=stream, speed=speed,new_dropdown=new_dropdown):
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/cli/cosyvoice.py", line 132, in inference_sft
    for model_output in self.model.tts(**model_input, stream=stream, speed=speed):
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/cli/model.py", line 172, in tts
    this_tts_speech = self.token2wav(token=this_tts_speech_token,
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/cli/model.py", line 112, in token2wav
    tts_mel, flow_cache = self.flow.inference(token=token.to(self.device),
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow.py", line 137, in inference
    feat, flow_cache = self.decoder(
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow_matching.py", line 67, in forward
    return self.solve_euler(z, t_span=t_span, mu=mu, mask=mask, spks=spks, cond=cond), flow_cache
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow_matching.py", line 95, in solve_euler
    cfg_dphi_dt = self.forward_estimator(
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/flow_matching.py", line 113, in forward_estimator
    return self.estimator.forward(x, mask, mu, t, spks, cond)
  File "/home/dl/data/FunAudioLLM/CosyVoice/cosyvoice/flow/decoder.py", line 161, in forward
    t = self.time_embeddings(t).to(t.dtype)
  File "/home/dl/micromamba/envs/CosyVoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dl/data/FunAudioLLM/CosyVoice/third_party/Matcha-TTS/matcha/models/components/decoder.py", line 26, in forward
    emb = torch.exp(torch.arange(half_dim, device=device).float() * -emb)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
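As the error message itself suggests, asynchronous CUDA launches can make this traceback point at the wrong line. Re-running with `CUDA_LAUNCH_BLOCKING=1` forces synchronous kernel launches so the stack trace lands on the kernel that actually faulted. A sketch of how one might relaunch (the exact `webui.py` invocation stands in for however you normally start the demo):

```shell
# Synchronous CUDA launches: slower, but the Python traceback now
# points at the failing kernel instead of a later API call.
CUDA_LAUNCH_BLOCKING=1 python webui.py
```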
aluminumbox commented 3 days ago

Try `load_jit=False` and then run concurrent inference.
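Beyond disabling the TorchScript path, a common additional safeguard (an assumption on my part, not a fix confirmed by the maintainers here) is to serialize GPU inference with a lock so two webui sessions never drive the model at the same time. A minimal self-contained sketch of that pattern, where the hypothetical `fake_tts` generator stands in for the real streaming call (`cosyvoice.inference_sft`):

```python
import threading

# Hypothetical stand-in for cosyvoice.inference_sft: the real call
# yields audio chunks as a stream.
def fake_tts(tts_text):
    for chunk in tts_text.split():
        yield chunk

# One lock shared by every webui session.
_infer_lock = threading.Lock()

def generate_audio(tts_text):
    # Hold the lock for the entire stream, so a second browser tab
    # blocks until the first request finishes instead of racing it
    # on the GPU.
    with _infer_lock:
        for chunk in fake_tts(tts_text):
            yield chunk
```

The trade-off is throughput: requests queue up behind the lock instead of crashing, which matches how a single-GPU demo is usually expected to behave.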