chidiwilliams / buzz

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
https://chidiwilliams.github.io/buzz
MIT License

Feature Request: Separate Proxy APIs for Translation and Transcription, or Allow Using Different APIs for Translating Transcribed Text #859

Closed oldboss918 closed 2 weeks ago

oldboss918 commented 1 month ago

The current version of Buzz only allows a single proxy API, which is bound to both the AI transcription and translation functions once selected. As a result, it is not possible to switch to a different proxy API for translation.

Could you please consider adding the ability to set different proxy APIs for different functions? Alternatively, could you allow users to select a separate proxy API to operate on the transcribed audio text once the transcription is complete?

Having this flexibility would be highly beneficial, as it would allow users to choose the most suitable proxy API for each specific task - transcription and translation. This can lead to improved performance, better results, and a more customized user experience.

Thank you for considering this feature request. Please let me know if you need any further clarification or have any questions.

PS: When I use Faster Whisper, the transcription fails with "Failed (Unknown error)".
GPU memory: 8180 MB

oldboss918 commented 1 month ago

model_path='C:\Users\YD\AppData\Local\Buzz\Buzz\Cache\models\models--guillaumekln--faster-whisper-medium\snapshots\8701f851d407f3f47e091bb13b8dac5290c7f7fb', id=10619793, uid=UUID('7cede170-d258-45a5-9206-51172f8d6523'), segments=[], status=None, error=None, queued_at=None, started_at=None, completed_at=None, output_directory=None, source=<Source.FILE_IMPORT: 'file_import'>, file_path='G:/打轴/GHMT05/02/03/03.MP3', url=None, fraction_downloaded=0.0)
[2024-07-27 23:23:46,608] whisper_file_transcriber.transcribe:71 DEBUG -> whisper process completed with code = 3221226505, time taken = 0:00:15.276752, number of segments = 0
[2024-07-27 23:23:46,608] file_transcriber.run:61 ERROR -> Traceback (most recent call last):
  File "buzz\transcriber\file_transcriber.py", line 59, in run
  File "buzz\transcriber\whisper_file_transcriber.py", line 80, in transcribe
Exception: Unknown error
[2024-07-27 23:23:46,608] file_transcriber_queue_worker.run:40 DEBUG -> Waiting for next transcription task

raivisdejus commented 1 month ago

Regarding Faster Whisper, please select a language explicitly, or try the not-yet-released version from here: https://github.com/chidiwilliams/buzz/actions/runs/9951485492 I think you may be hitting an error where "Detect language" was not working.

If you transcribe the speech first and then change the API, you will be able to translate with a different AI. But I agree that having separate API URLs would give more flexibility.

raivisdejus commented 1 month ago

One benefit of having a single API URL is that it keeps the app simpler for new users, but it would also be good to give extra flexibility to more advanced users who want it. One option could be to introduce and document environment variables or app flags that provide this flexibility without adding new fields to the settings. This way we could expose many more settings to user control.

For example, you could run the app with extra variables set, or add them to the app launch shortcut:

TRANSCRIBE_API_BASEURL=https://api.openai.com/v1 TRANSLATE_API_BASEURL=https://api.groq.com/openai/v1 python -m buzz

or, on Windows:

set "TRANSCRIBE_API_BASEURL=https://api.openai.com/v1" && set "TRANSLATE_API_BASEURL=https://api.groq.com/openai/v1" && "C:\Program Files (x86)\Buzz\Buzz.exe"

Once implemented, this would allow setting different API base URLs for transcription and translation.
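A rough sketch of how the app side of this could look, in Python. It is only an illustration of the idea, not how Buzz actually does it: the variable names are the ones proposed above, while `resolve_api_base_url`, `TASK_ENV_VARS`, and `DEFAULT_BASE_URL` are hypothetical names made up for this example. Each task checks its own variable first, then falls back to the single shared URL from the settings, so new users would not have to set anything.

```python
import os
from typing import Mapping, Optional

# Assumed fallback when nothing is configured (hypothetical constant).
DEFAULT_BASE_URL = "https://api.openai.com/v1"

# Variable names as proposed in this thread; the shipped names may differ.
TASK_ENV_VARS = {
    "transcribe": "TRANSCRIBE_API_BASEURL",
    "translate": "TRANSLATE_API_BASEURL",
}


def resolve_api_base_url(
    task: str,
    shared_base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the API base URL for a task (hypothetical helper).

    Order of precedence: task-specific environment variable,
    then the shared URL from the settings, then the default.
    """
    if task not in TASK_ENV_VARS:
        raise ValueError(f"Unknown task: {task!r}")
    env = os.environ if env is None else env
    override = env.get(TASK_ENV_VARS[task], "").strip()
    if override:
        return override
    return shared_base_url or DEFAULT_BASE_URL


if __name__ == "__main__":
    for task in TASK_ENV_VARS:
        print(task, "->", resolve_api_base_url(task))
```

Passing `env` explicitly is only there to make the lookup easy to test without touching the real environment.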

Things we could expose to user choice this way:

What does the community think of this approach?

raivisdejus commented 2 weeks ago

The ability to set a custom API URL and key for translation has been added via Advanced preferences (environment variables). Please see https://chidiwilliams.github.io/buzz/docs/preferences#advanced-preferences

It is available in the latest development builds from here: https://github.com/chidiwilliams/buzz/actions/workflows/ci.yml?query=branch%3Amain