AndraxDev / speak-gpt

Your personal voice assistant based on OpenAI ChatGPT.
https://play.google.com/store/apps/details?id=org.teslasoft.assistant
Apache License 2.0
281 stars, 59 forks

bug: whisper only works with the default endpoint selected #109

Closed thiswillbeyourgithub closed 5 months ago

thiswillbeyourgithub commented 5 months ago

Hi,

I never got whisper to work if I'm using another endpoint for the LLM. Maybe it's because whisper uses the user-selected endpoint? If so, is it possible to make whisper always use the Default endpoint? I just get the error "failed to record audio".

The possibly relevant log is this:

type: logcat
osVersion: google/bramble/bramble:14/UP1A.231105.001.B2/2024050300:user/release-keys
packageName: org.teslasoft.assistant:402
buffers: main,system,crash,events,kernel
level: error

--------- beginning of main
05-05 22:30:18.537 16988 16988 E DynamiteModule: Invalid GmsCore APK, remote loading disabled.
05-05 22:30:18.556 16988 17358 E DynamiteModule: Invalid GmsCore APK, remote loading disabled.

Unfortunately I can't use the default android tts because I don't have the google services for privacy reasons.

AndraxDev commented 5 months ago

This issue may not be fixed because the default endpoint is not OpenAI and the user may not have provided their OpenAI key.

AndraxDev commented 5 months ago

If the user decided not to use OpenAI, requests will also fail, so there's no sense in making any changes.

thiswillbeyourgithub commented 5 months ago

Hi,

I disagree, for what I think are authentically good reasons and not whims, but I certainly don't want to be pushy, so I'll hide my arguments below so you can ignore them if you want, and I'm offering a bounty of 10-20 euros for this. I hope that's okay with you.

Feature bounty

my arguments
again I don't want to make you angry, I just want speakgpt to make you rich and famous by being even greater than it currently is, and I really appreciate your work!

1. The most used speech to text model is whisper, which is open source. But there are no good API providers for whisper. There are some on [replicate](https://replicate.com/) but they can't be used for that because replicate has a cold boot delay. I think that most people that use whisper without the openai api use a local service based on [whispercpp](https://github.com/ggerganov/whisper.cpp), for example via [localai](https://localai.io).
2. In any case, openai is the only provider of LLMs that also provides speech to text; none of mistral, anthropic, etc. do. Openrouter does not even give access to openai/whisper. So when I see that SpeakGPT allows changing the endpoint, I assume it means the LLM endpoint, as there is no real endpoint that anyone could use for whisper outside of openai.
3. SpeakGPT prominently shows that it gives access to other endpoints, which is awesome, but it's unexpected to me that you lose the Speak part of **Speak**GPT if you want to try another LLM provider.
4. My point about whisper also applies to the `/imagine` command. I don't think it's intuitive that wanting to try another LLM would stop you from using whisper or dalle, especially given how prominently they are featured in the readme.
5. It makes sense for whisper to fail if the user has not provided an openai api key; it doesn't make sense to stop someone who provided an openai api key, but decided to try another endpoint, from using whisper.

AndraxDev commented 5 months ago

There are some arguments that you didn't understand:

Even if Whisper is open-source, I will not implement it because none of the following solutions is suitable.

  1. Offline (embed the library directly in the app) - can't be used because I don't want to make the app too large and Google Play Store does not like such moves (furthermore, I will not build separate versions for GitHub and Play Store)

  2. Online (run the model on my servers) - servers cost money (a powerful server likely costs more than 20€). If I implement Whisper server-side I will have to charge some money, but as you can see SpeakGPT is not a commercial project, it's an API client.

So the solution will be limited to the following changes:

  1. DALL-E and Whisper will work if an OpenAI endpoint is set

AndraxDev commented 5 months ago

Now you can enjoy this functionality. To provide a seamless experience, you will be prompted to add an API endpoint without breaking your workflow. If you click "Yes", an API endpoint dialog will open. Once you add a correct API endpoint, your command will continue to execute (e.g. a DALL-E image will be generated or text will be pronounced).
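
For readers following along, the flow described in this comment could be sketched roughly as follows. This is not SpeakGPT's actual code: the `Endpoint` class, the `api.openai.com` host check, and the function names are all hypothetical, and the real app may recognise and store endpoints quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")

# Assumption: an endpoint counts as "OpenAI" when its host is api.openai.com.
OPENAI_HOST = "api.openai.com"


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical saved endpoint: a label, a base URL and an API key."""
    label: str
    base_url: str
    api_key: str


def is_openai(endpoint: Endpoint) -> bool:
    """True if the endpoint points at OpenAI and has a non-empty key."""
    host = urlparse(endpoint.base_url).hostname
    return host == OPENAI_HOST and bool(endpoint.api_key)


def run_openai_feature(
    saved: Sequence[Endpoint],
    prompt_user: Callable[[], Optional[Endpoint]],
    action: Callable[[Endpoint], T],
) -> Optional[T]:
    """Run a Whisper/DALL-E style action against an OpenAI endpoint.

    The chat endpoint the user selected is ignored here. If no OpenAI
    endpoint is saved, the user is asked for one; if they decline
    (prompt_user returns None), the action is skipped.
    """
    endpoint = next((e for e in saved if is_openai(e)), None)
    if endpoint is None:
        endpoint = prompt_user()  # stands in for the "add API endpoint" dialog
    if endpoint is None or not is_openai(endpoint):
        return None
    # The original command continues once a usable endpoint exists.
    return action(endpoint)
```

In this sketch `prompt_user` stands in for the dialog, so the pending command only resumes when the user actually supplies a usable OpenAI endpoint; declining leaves everything else untouched.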

AndraxDev commented 5 months ago

I'm still waiting for your reward :)) You can decide how much, but if you write something then I will expect it :)

thiswillbeyourgithub commented 5 months ago

Oh I'm sorry, I didn't understand whether you agreed or not. It's your home here, so I could understand if you refrained from "workifying" your project, and I didn't want to insist. Thank you for clarifying, I'll do it tomorrow! Thanks a lot for the new features!

(also maybe not worth opening a new issue for that but: the new animation that makes the chats slowly appear when you scroll, it's beautiful but on my devices it's suspiciously "slow", so my chats appear slower than the chatlist scrolls, making the list appear empty for a fraction of a second when scrolling, which looks a bit glitchy! (do tell me if it's worth opening an issue in your opinion, or if you would prefer using the Discussions tab instead? I don't want to flood you but you warned me about minor quality of life improvements and I'm still adjusting my filter :)!))

Have a nice day

AndraxDev commented 5 months ago

Animations are fixed now.

Furthermore, I'm a vanga and I guessed that one of your next issues would touch on chat bulk actions, so now you can easily manage a large number of chats.

P.S. Still waiting for reward :)

thiswillbeyourgithub commented 5 months ago

> Animations are fixed now.

awesome!

> I'm a vanga

Had to look it up, learned something new

About the reward I think ko-fi is probably having server-side issues, I tried 3 browsers and I still get an infinite wait: image

I'm really sorry for the wait :/ I'll try again in the evening, and tomorrow if it's still an issue. It'll come through, I promise :/!

thiswillbeyourgithub commented 5 months ago

Oh it finally worked! Thanks again

AndraxDev commented 5 months ago

Thanks a lot! Now I see it.

thiswillbeyourgithub commented 5 months ago

(Fyi I intended to send the money one time, but my bank apparently tried to send 3 times so if you see 3 donations maybe 2 will get cancelled? If they all go through we'll figure something out but I'll make sure it won't cost you time/energy/ etc. Sorry for the many posts!)

AndraxDev commented 5 months ago

I see only 2 donations, so I sent a refund request for one of them. If you were charged 3 times, maybe your bank issued a temporary hold and the transaction failed. Usually held money is returned in 1-2 business days (but it may take longer depending on your bank). Regarding the refund, I can't say how long it will take. Thank you for understanding!

AndraxDev commented 5 months ago

Your refund is on the way!

Screenshot_2024-05-09-18-47-23-428_com.google.android.gm.jpg

Again, sorry for the inconvenience!