Hk-Gosuto / ChatGPT-Next-Web-LangChain

One-click deployment of your own ChatGPT web UI (plugin version implemented with LangChain).
https://n3xt.chat
MIT License
1.15k stars 386 forks

[Feature] Voice input and output support #208

Closed zpng closed 7 months ago

zpng commented 7 months ago

To improve communication efficiency, we have set up an official QQ group and QQ channel. If you run into any problems while using or deploying the project, please ask in the group or channel first. Unless you have a reliably reproducible bug or a reasonably creative feature suggestion, please do not post low-quality, meaningless threads in the Issues area.

Click to join the official group chat

What feature do you want, or what suggestions do you have? Support for speech-to-text and text-to-speech.

Are there any similar products we can refer to? Something like the feature in this project: https://github.com/vual/ChatGPT-Next-Web-Pro image

Hk-Gosuto commented 7 months ago

I've added OpenAI-based TTS support first; voice input will be added later when I have time. image image

ofllm commented 7 months ago

Clicking the voice button throws an error: "stack": "Error: Failed to execute 'decodeAudioData' on 'BaseAudioContext': Unable to decode audio data". Do I need to configure anything besides the page settings?

Hk-Gosuto commented 7 months ago

Check whether you can actually use OpenAI's TTS model.

ofllm commented 7 months ago

I called the API manually and found that I can't use the TTS model. Thanks for the reply.
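For readers who want to repeat that manual check, here is a minimal sketch. It assumes OpenAI's public `/v1/audio/speech` endpoint with the `tts-1` model and the `alloy` voice; the helper names (`buildSpeechRequest`, `fetchSpeech`, `FetchLike`) are hypothetical and not taken from this project. One plausible cause of the `decodeAudioData` error is that the server returned an error body instead of audio, so the sketch checks the response status before handing bytes to a decoder.

```typescript
// Hypothetical helpers for checking whether a key can use OpenAI's TTS model.
interface SpeechRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Minimal shape of a fetch-like function, so the sketch does not depend on DOM typings.
type FetchLike = (url: string, init: SpeechRequest["init"]) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Build the HTTP request; baseUrl is a parameter because many deployments use a proxy.
function buildSpeechRequest(
  apiKey: string,
  text: string,
  baseUrl: string = "https://api.openai.com",
): SpeechRequest {
  return {
    url: `${baseUrl}/v1/audio/speech`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "tts-1", voice: "alloy", input: text }),
    },
  };
}

// Return the audio bytes, or throw with the server's error text.
// Failing here is clearer than letting decodeAudioData choke on a non-audio body.
async function fetchSpeech(
  fetchImpl: FetchLike,
  apiKey: string,
  text: string,
): Promise<ArrayBuffer> {
  const { url, init } = buildSpeechRequest(apiKey, text);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`TTS request failed (${res.status}): ${await res.text()}`);
  }
  return res.arrayBuffer();
}
```

In a browser or a recent Node.js, the global `fetch` can be passed as `fetchImpl`. If the call throws, the error text should indicate whether the key or the proxy lacks access to the TTS model.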

zpng commented 7 months ago

@Hk-Gosuto What does voice input look like in its final form? Which model does the key need to support? The README says HTTPS access is required; does that mean the site must be served from an HTTPS domain, and why?

Hk-Gosuto commented 7 months ago

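Once it is enabled in the settings, the send button turns into a voice input button. Click it to start speaking, then click stop when you are done. It uses the SpeechRecognition API, so no key is needed. For browser compatibility, see: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition. In most browsers, the SpeechRecognition API requires HTTPS to work properly.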

image image
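As a rough illustration of the approach described in this reply, the sketch below wires up the browser's SpeechRecognition API. It is not the project's actual code: the helper names (`getRecognitionCtor`, `collectTranscript`, `createDictation`) and the minimal interfaces are assumptions made for the example, and the `webkitSpeechRecognition` fallback reflects the prefixed name that Chromium-based browsers expose.

```typescript
// Minimal structural types, declared here because SpeechRecognition typings
// are not part of every TypeScript setup.
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface Recognition {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  start(): void;
  stop(): void;
}

type RecognitionCtor = new () => Recognition;

// Look up the constructor on a global-like object (for example `window`).
function getRecognitionCtor(scope: Record<string, unknown>): RecognitionCtor | undefined {
  return (scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"]) as
    | RecognitionCtor
    | undefined;
}

// Join the top alternative of every result received so far.
function collectTranscript(event: RecognitionResultEvent): string {
  let text = "";
  for (let i = 0; i < event.results.length; i++) {
    text += event.results[i]?.[0]?.transcript ?? "";
  }
  return text;
}

// Create a recognizer that reports the transcript through onText.
// Returns undefined when the browser does not expose the API.
function createDictation(
  scope: Record<string, unknown>,
  onText: (text: string) => void,
  lang: string = "zh-CN",
): Recognition | undefined {
  const Ctor = getRecognitionCtor(scope);
  if (!Ctor) return undefined;
  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.continuous = true; // keep listening until stop() is called
  recognition.interimResults = false; // only report finalized text
  recognition.onresult = (event) => onText(collectTranscript(event));
  return recognition;
}
```

In a page, `createDictation(window as unknown as Record<string, unknown>, setInputText)` would return a recognizer whose `start()` and `stop()` map onto the two button clicks described above; `setInputText` stands for whatever callback fills the chat input.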

zpng commented 7 months ago

Why isn't this implemented by calling OpenAI's API?

Hk-Gosuto commented 7 months ago

This one is free and the recognition quality is quite good, so why use Whisper?

zpng commented 7 months ago

Oh, OK.

Hk-Gosuto commented 7 months ago

Try it in more scenarios. If it performs poorly in complex ones, I'll consider adding Whisper support later.

jcr8745dqy100 commented 3 months ago

When using OpenAI TTS, every time I ask it to read a message aloud it sends a new TTS request. Could the audio be saved locally the first time, so that replaying it later doesn't waste another request?

Hk-Gosuto commented 3 months ago

I'll see whether I can store the audio in IndexedDB. In the meantime you can switch to Edge TTS, which doesn't incur any charges.
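The caching idea raised here can be sketched as follows. This is only an illustration under stated assumptions, not the maintainer's implementation: `AudioStore`, `MemoryAudioStore`, `cacheKey`, and `speakWithCache` are hypothetical names, and the in-memory `Map` stands in for the persistent store. In a browser, an IndexedDB-backed class with the same `get`/`set` shape would let cached audio survive page reloads.

```typescript
// Storage abstraction: anything that can save and load audio bytes by key.
interface AudioStore {
  get(key: string): Promise<ArrayBuffer | undefined>;
  set(key: string, audio: ArrayBuffer): Promise<void>;
}

// In-memory stand-in for illustration; a real version could wrap IndexedDB.
class MemoryAudioStore implements AudioStore {
  private items = new Map<string, ArrayBuffer>();

  async get(key: string): Promise<ArrayBuffer | undefined> {
    return this.items.get(key);
  }

  async set(key: string, audio: ArrayBuffer): Promise<void> {
    this.items.set(key, audio);
  }
}

// The same text spoken with a different voice is different audio,
// so both values go into the key.
function cacheKey(voice: string, text: string): string {
  return JSON.stringify([voice, text]);
}

// Return cached audio when present; otherwise synthesize once and store it.
async function speakWithCache(
  store: AudioStore,
  voice: string,
  text: string,
  synthesize: (voice: string, text: string) => Promise<ArrayBuffer>,
): Promise<ArrayBuffer> {
  const key = cacheKey(voice, text);
  const cached = await store.get(key);
  if (cached) return cached;
  const audio = await synthesize(voice, text);
  await store.set(key, audio);
  return audio;
}
```

With this shape, replaying a message calls `synthesize` (the paid TTS request) only on the first play; later plays are served from the store.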