YYuX-1145 / Srt-AI-Voice-Assistant

https://www.bilibili.com/video/BV1PVeceXEmN
GNU Affero General Public License v3.0

Support CosyVoice TTS #17

Closed DDXDB closed 1 month ago

DDXDB commented 1 month ago

Official repository: https://github.com/FunAudioLLM/CosyVoice

It is implemented in Python with Gradio.

API documentation
http://localhost:50000/
3 API endpoints

Use the [Python library](https://www.gradio.app/guides/getting-started-with-the-python-client) or the [Javascript package](https://www.gradio.app/guides/getting-started-with-the-js-client) to query the app via API.

1. Install the client if you don't already have it installed.

$ pip install gradio_client
2. Find the API endpoint below corresponding to your desired function in the app. Copy the code snippet, replacing the placeholder values with your own input data. Or use the API Recorder to automatically generate your API requests.

api_name: /generate_seed

from gradio_client import Client

client = Client("http://localhost:50000/")
result = client.predict(
        api_name="/generate_seed"
)
print(result)
Accepts 0 parameters:
Returns 1 element
float

The output value that appears in the "随机推理种子" (random inference seed) Number component.

api_name: /generate_audio

from gradio_client import Client, file

client = Client("http://localhost:50000/")
result = client.predict(
        tts_text="我是通义实验室语音团队全新推出的生成式语音大模型,提供舒适自然的语音合成能力。",
        mode_checkbox_group="预训练音色",
        sft_dropdown="中文女",
        prompt_text="",
        prompt_wav_upload=file('https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'),
        prompt_wav_record=file('https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'),
        instruct_text="",
        seed=0,
        speed_factor=1,
        api_name="/generate_audio"
)
print(result)
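Accepts 9 parameters:
tts_text str Default: "我是通义实验室语音团队全新推出的生成式语音大模型,提供舒适自然的语音合成能力。"

The input value that is provided in the "输入合成文本" (text to synthesize) Textbox component. The default text reads: "I am a new generative speech large model from the Tongyi Lab speech team, offering comfortable and natural speech synthesis."

mode_checkbox_group Literal['预训练音色', '3s极速复刻', '跨语种复刻', '自然语言控制'] Default: "预训练音色"

The input value that is provided in the "选择推理模式" (select inference mode) Radio component. The four values mean pretrained voice, 3-second rapid cloning, cross-lingual cloning, and natural-language control.

sft_dropdown Literal['中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女'] Default: "中文女"

The input value that is provided in the "选择预训练音色" (select pretrained voice) Dropdown component. The values mean Chinese female, Chinese male, Japanese male, Cantonese female, English female, English male, and Korean female.

prompt_text str Default: ""

The input value that is provided in the "输入prompt文本" (prompt text) Textbox component.

prompt_wav_upload filepath Required

The input value that is provided in the "选择prompt音频文件,注意采样率不低于16khz" (select the prompt audio file; the sampling rate must be at least 16 kHz) Audio component.

prompt_wav_record filepath Required

The input value that is provided in the "录制prompt音频文件" (record the prompt audio file) Audio component.

instruct_text str Default: ""

The input value that is provided in the "输入instruct文本" (instruct text) Textbox component.

seed float Default: 0

The input value that is provided in the "随机推理种子" (random inference seed) Number component.

speed_factor float Default: 1

The input value that is provided in the "语速调节" (speech rate adjustment) Slider component.

Returns 1 element
filepath

The output value that appears in the "合成音频" (synthesized audio) Audio component.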

api_name: /change_instruction

from gradio_client import Client

client = Client("http://localhost:50000/")
result = client.predict(
        mode_checkbox_group="预训练音色",
        api_name="/change_instruction"
)
print(result)
Accepts 1 parameter:
mode_checkbox_group Literal['预训练音色', '3s极速复刻', '跨语种复刻', '自然语言控制'] Default: "预训练音色"

The input value that is provided in the "选择推理模式" (select inference mode) Radio component.

Returns 1 element
str

The output value that appears in the "操作步骤" (operation steps) Textbox component.
YYuX-1145 commented 1 month ago

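For now there are no plans to support projects with relatively low popularity. If you are in a hurry, you can try replacing the code in the gsv_api function with the following: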

from gradio_client import Client, file

client = Client("http://localhost:50000/")
result = client.predict(
        tts_text="我是通义实验室语音团队全新推出的生成式语音大模型,提供舒适自然的语音合成能力。",
        mode_checkbox_group="预训练音色",
        sft_dropdown="中文女",
        prompt_text="",
        prompt_wav_upload=file('https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'),
        prompt_wav_record=file('https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'),
        instruct_text="",
        seed=0,
        speed_factor=1,
        api_name="/generate_audio"
)
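# Read the generated audio back as bytes. The handle is named `f` so that it
# does not rebind the `file` helper imported from gradio_client.
with open(result, 'rb') as f:
        data = f.read()
return data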

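Since I have not actually used this project, I cannot guarantee that code inferred from the Gradio help page will run on the first try. You can ask GPT for help.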

YYuX-1145 commented 1 month ago

Thanks for the feedback. You can now use the custom API feature.

DDXDB commented 1 month ago

I wrote my own CosyVoice_API.py:

def custom_api(text):#return: audio content
    from gradio_client import Client, handle_file
    client = Client("http://localhost:50000/")
    result = client.predict(
        tts_text=text,
        mode_checkbox_group="", #'预训练音色', '3s极速复刻', '跨语种复刻', '自然语言控制'
        sft_dropdown="中文女",
        prompt_text="",
        prompt_wav_upload=handle_file(''),
        prompt_wav_record=handle_file(''),
        instruct_text="",
        seed=0,
        speed_factor=1,
        api_name="/generate_audio"
    )
    with open(result[1],'rb') as file:
        data=file.read()
    return data

With the default gradio==3.50.2 pinned in requirements.txt, running it fails with: cannot import name 'handle_file' from 'gradio_client'. After removing the ==3.50.2 pin and rebuilding the virtual environment, it runs.
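
Editor's sketch: the import error above suggests that the helper's name depends on the installed gradio_client release. The API page quoted earlier in this thread uses `file`, while the script here imports `handle_file`. One way to tolerate both is a fallback import. `resolve_file_helper` is a hypothetical name, and the final fallback (passing the path through unchanged) is an assumption about older clients that should be checked against the installed version.

```python
def resolve_file_helper():
    """Return a callable that wraps a file path for gradio_client.

    Tries the newer name first, then the older one. If neither can be
    imported, falls back to returning the path unchanged (an assumption
    about very old clients, not verified here).
    """
    try:
        from gradio_client import handle_file
        return handle_file
    except ImportError:
        pass
    try:
        from gradio_client import file
        return file
    except ImportError:
        return lambda path: path


wrap_file = resolve_file_helper()
print(callable(wrap_file))
```

Inside custom_api, `wrap_file(path)` would then stand in for `handle_file(path)`.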

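It can then communicate with the CosyVoice backend and correctly generates the temporary audio files, but afterwards it fails with this error: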

Active code page: 65001

F:\Srt-AI-Voice-Assistant>call venv\\Scripts\\activate.bat
[INFO][2024-09-22_20:52:56]:load_cfg: 当前没有自定义设置
F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\gradio\helpers.py:1276: UserWarning: 无法下载微软TTS说话人列表。报错内 容: please fill in your key to get MSTTS speaker list.
  warnings.warn(message)
[ERROR][2024-09-22_20:52:56]:getms_speakers: 无法下载微软TTS说话人列表。报错内容: please fill in your key to get MSTTS speaker list.
Running on local URL:  http://127.0.0.1:7860
[INFO][2024-09-22_20:52:56]:_send_single_request: HTTP Request: GET http://127.0.0.1:7860/startup-events "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:52:56]:_send_single_request: HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
[INFO][2024-09-22_20:52:56]:_send_single_request: HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
[INFO][2024-09-22_20:52:57]:_send_single_request: HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:01]:generate: Exec: custom_api_path
Loaded as API: http://localhost:50000/ ✔
Loaded as API: http://localhost:50000/ ✔
[INFO][2024-09-22_20:53:03]:_send_single_request: HTTP Request: GET http://localhost:50000/config "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:03]:_send_single_request: HTTP Request: GET http://localhost:50000/config "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:05]:_send_single_request: HTTP Request: GET http://localhost:50000/info?serialize=False "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:05]:_send_single_request: HTTP Request: GET http://localhost:50000/info?serialize=False "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:07]:_send_single_request: HTTP Request: GET http://localhost:50000/heartbeat/2975476d-af0b-4f22-9d40-24c31c9dbfb3 "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:07]:_send_single_request: HTTP Request: POST http://localhost:50000/upload "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:07]:_send_single_request: HTTP Request: GET http://localhost:50000/heartbeat/bfaf6f2c-297a-4410-98f1-d8fbf45e1690 "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:08]:_send_single_request: HTTP Request: POST http://localhost:50000/upload "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:10]:_send_single_request: HTTP Request: POST http://localhost:50000/upload "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:10]:_send_single_request: HTTP Request: POST http://localhost:50000/upload "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:12]:_send_single_request: HTTP Request: POST http://localhost:50000/queue/join "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:12]:_send_single_request: HTTP Request: POST http://localhost:50000/queue/join "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:14]:_send_single_request: HTTP Request: GET http://localhost:50000/queue/data?session_hash=2975476d-af0b-4f22-9d40-24c31c9dbfb3 "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:14]:_send_single_request: HTTP Request: GET http://localhost:50000/queue/data?session_hash=bfaf6f2c-297a-4410-98f1-d8fbf45e1690 "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:21]:_send_single_request: HTTP Request: GET http://localhost:50000/file=C:\Users\98440\AppData\Local\Temp\gradio\8396594d8be9edac19d02df12ae11f19bc9729bd\audio.wav "HTTP/1.1 200 OK"
Loaded as API: http://localhost:50000/ ✔
[INFO][2024-09-22_20:53:22]:_send_single_request: HTTP Request: GET http://localhost:50000/file=C:\Users\98440\AppData\Local\Temp\gradio\16e2a6ac3ee91c8be5afedc8baa4102754acf6eb\audio.wav "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:24]:_send_single_request: HTTP Request: GET http://localhost:50000/config "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:26]:_send_single_request: HTTP Request: GET http://localhost:50000/info?serialize=False "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:28]:_send_single_request: HTTP Request: GET http://localhost:50000/heartbeat/fdd70726-1490-46d3-885b-c60ce79907c9 "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:28]:_send_single_request: HTTP Request: POST http://localhost:50000/upload "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:30]:_send_single_request: HTTP Request: POST http://localhost:50000/upload "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:32]:_send_single_request: HTTP Request: POST http://localhost:50000/queue/join "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:34]:_send_single_request: HTTP Request: GET http://localhost:50000/queue/data?session_hash=fdd70726-1490-46d3-885b-c60ce79907c9 "HTTP/1.1 200 OK"
[INFO][2024-09-22_20:53:40]:_send_single_request: HTTP Request: GET http://localhost:50000/file=C:\Users\98440\AppData\Local\Temp\gradio\6639a6904746720a4d348ce7b9a7a3fe7063ac67\audio.wav "HTTP/1.1 200 OK"
Traceback (most recent call last):
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\gradio\blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2405, in run_sync_in_worker_thread
    return await future
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 914, in run
    result = context.run(func, *args)
  File "F:\Srt-AI-Voice-Assistant\venv\lib\site-packages\gradio\utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "F:\Srt-AI-Voice-Assistant\Srt-AI-Voice-Assistant.py", line 510, in generate_custom
    return generate((custom_api),proj="custom",in_file=input_file,sr=None,fps=fps,offset=offset,max_workers=workers)
  File "F:\Srt-AI-Voice-Assistant\Srt-AI-Voice-Assistant.py", line 559, in generate
    file_list = list(executor.map(lambda x: save(x[0], **x[1]),[(args, {'proj': proj, 'text': i.text, 'dir': dirname, 'subid': i.index}) for i in subtitle_list]))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\_base.py", line 458, in result
    return self.__get_result()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "F:\Srt-AI-Voice-Assistant\Srt-AI-Voice-Assistant.py", line 559, in <lambda>
    file_list = list(executor.map(lambda x: save(x[0], **x[1]),[(args, {'proj': proj, 'text': i.text, 'dir': dirname, 'subid': i.index}) for i in subtitle_list]))
  File "F:\Srt-AI-Voice-Assistant\Srt-AI-Voice-Assistant.py", line 679, in save
    audio=custom_api(text)
  File "<string>", line 16, in custom_api
OSError: [Errno 22] Invalid argument: ':'

Any ideas, @YYuX-1145?

YYuX-1145 commented 1 month ago
prompt_wav_upload=handle_file('),
prompt_wav_record=handle_file(''),

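The first line looks like a syntax error. Also, your Gradio page originally suggested the file function, not handle_file, and I am not sure whether that change works. Finally, print result: copying result[1] as-is will raise an error if the return value does not have that shape.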

DDXDB commented 1 month ago
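I probably deleted too much when posting =-= I tried both handle_file and file and the result is the same. With file, Gradio prints a deprecation notice, and since this is my own script I just switched to handle_file. Is result[1] meant to go at the very end?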

YYuX-1145 commented 1 month ago

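My guess: the API returns a single string rather than two elements, so use result directly, without an index. The error most likely occurs because the audio path looks like 'C:\XXX', so [1] picks out the colon, and a colon is an illegal character in a file name.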
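
Editor's sketch of the guess above: indexing a path string returns one character, which is why `open(result[1], ...)` ends up being called with `':'`. The helper `read_result_bytes` is a hypothetical name. It accepts either a plain path string or a tuple/list of outputs, so the same code works whichever shape the endpoint returns. The path and the temporary file are stand-ins, not real CosyVoice output.

```python
import os
import tempfile


def read_result_bytes(result, index=0):
    """Return the bytes of the audio file that `result` points to.

    `result` may be a path string (one output component) or a tuple/list
    of outputs, in which case `index` selects the element to read.
    """
    if isinstance(result, (list, tuple)):
        result = result[index]
    with open(result, "rb") as f:
        return f.read()


# Indexing a path string yields a single character, not a path.
windows_style_path = r"C:\Users\example\audio.wav"  # hypothetical path
second_char = windows_style_path[1]
print(repr(second_char))

# Round trip with a temporary file standing in for the generated audio.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"RIFF")
tmp_path = tmp.name
from_string = read_result_bytes(tmp_path)
from_tuple = read_result_bytes((tmp_path,))
os.remove(tmp_path)
print(from_string == from_tuple)
```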

DDXDB commented 1 month ago

After changing it to the following, the error persists:

def custom_api(text):#return: audio content
    from gradio_client import Client, handle_file
    client = Client("http://localhost:50000/")
    result = client.predict(
        tts_text=text,
        mode_checkbox_group="", #'预训练音色', '3s极速复刻', '跨语种复刻', '自然语言控制'
        sft_dropdown="中文女",
        prompt_text="",
        prompt_wav_upload=handle_file(''),
        prompt_wav_record=handle_file(''),
        instruct_text="",
        seed=0,
        speed_factor=1,
        api_name="/generate_audio"
    )
    with open(result[1],'rb') as file:
        data=file.read()
    return result[1]

So it must be a path problem :(

YYuX-1145 commented 1 month ago
    with open(result[1],'rb') as file:
        data=file.read()
    return result[1]

This still has not been changed, has it? Remove the [1] and return data.

DDXDB commented 1 month ago
Originally it was return data.

YYuX-1145 commented 1 month ago

This should be the correct code:

    with open(result,'rb') as file:
        data=file.read()
    return data
DDXDB commented 1 month ago

Sorry, I misunderstood :( Confirmed working. Final code:

def custom_api(text):#return: audio content
    from gradio_client import Client, handle_file
    client = Client("http://localhost:50000/")
    result = client.predict(
        tts_text=text,
        mode_checkbox_group="", #'预训练音色', '3s极速复刻', '跨语种复刻', '自然语言控制'
        sft_dropdown="中文女",
        prompt_text="",
        prompt_wav_upload=handle_file(''),
        prompt_wav_record=handle_file(''),
        instruct_text="",
        seed=0,
        speed_factor=1,
        api_name="/generate_audio"
    )
    with open(result,'rb') as file:
        data=file.read()
    return data
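
Editor's sketch: the log earlier in this thread prints "Loaded as API" once per subtitle line, because custom_api builds a new Client on every call. One option is to cache the client per URL. `make_client_getter` and `default_factory` are hypothetical names. Whether a single Client can safely be shared across the tool's worker threads is an assumption that should be verified before relying on this. The demonstration uses a stand-in factory so it runs without a CosyVoice server.

```python
from functools import lru_cache


def default_factory(url):
    # Imported lazily so this module loads even without gradio_client.
    from gradio_client import Client
    return Client(url)


def make_client_getter(factory=default_factory):
    """Return a function that builds one client per URL and then reuses it."""
    @lru_cache(maxsize=None)
    def get_client(url):
        return factory(url)
    return get_client


# Demonstration with a stand-in factory instead of a live server.
calls = []


def fake_factory(url):
    calls.append(url)
    return object()


get_client = make_client_getter(fake_factory)
first = get_client("http://localhost:50000/")
second = get_client("http://localhost:50000/")
print(first is second, len(calls))
```

In custom_api, a module-level `get_client = make_client_getter()` would then replace the `Client("http://localhost:50000/")` call.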