Dadangdut33 / Speech-Translate

A real-time speech transcription and translation application using OpenAI Whisper and free translation APIs. Interface built with Tkinter. Written entirely in Python.
MIT License
python speech-transcription speech-translation tkinter-python translate whisper

Speech Translate

Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It serves as a versatile tool for both real-time / live speech-to-text and speech translation, allowing the user to seamlessly convert spoken language into written text. Additionally, it has the option to import and transcribe audio / video files effortlessly.

Speech Translate aims to extend Whisper's capabilities by combining it with translation APIs and wrapping both in a simple, easy-to-use interface, making for a more practical application. The application is open source, so contributions are welcome.

Preview - Usage

Screenshots:

* Record
* File import
* File import in progress
* Align result
* Refine result
* Translate result
* Transcribe mode on subtitle window (English)
* Transcribe mode on detached window (English)
* Translate mode on subtitle window (English to Indonesia)
* Translate mode on detached window (English to Indonesia)

Preview - Setting

Screenshots:

* Setting - General
* Setting - Record
* Setting - Whisper
* Setting - File Export
* Setting - Translate
* Setting - Textbox


🚀 Features

📜 Requirements

| OS | Installation from Prebuilt Binary | Installation as a Module | Installation from Git |
| --- | --- | --- | --- |
| Windows | ✔️ | ✔️ | ✔️ |
| macOS | ❌ | ✔️ | ✔️ |
| Linux | ❌ | ✔️ | ✔️ |

* Python 3.8 or later (3.11 is recommended) for installation as a module.
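To check an interpreter against the Python requirement above, something like the following minimal sketch could be used. The function name `check_python` and its return labels are illustrative assumptions and are not part of Speech Translate.

```python
import sys

MINIMUM = (3, 8)       # lowest version listed in the requirement above
RECOMMENDED = (3, 11)  # version the requirement above recommends


def check_python(version=None):
    """Classify a (major, minor) pair as 'unsupported', 'supported', or 'recommended'.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    major_minor = tuple(version or sys.version_info)[:2]
    if major_minor < MINIMUM:
        return "unsupported"
    if major_minor == RECOMMENDED:
        return "recommended"
    return "supported"


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current}: {check_python()}")
```

Running the file prints the status of the current interpreter; passing a tuple such as `(3, 7)` checks a specific version instead.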

| Size | Parameters | Required VRAM | Relative speed |
| --- | --- | --- | --- |
| tiny | 39 M | ~1 GB | ~32x |
| base | 74 M | ~1 GB | ~16x |
| small | 244 M | ~2 GB | ~6x |
| medium | 769 M | ~5 GB | ~2x |
| large | 1550 M | ~10 GB | 1x |

* This information is also available in the app: hover over the model selection to see a tooltip with the model info. Note that with faster-whisper, models run significantly faster and use less VRAM; see the faster-whisper repository for details.
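As a rough illustration of how the model table can guide a choice, the sketch below picks the largest model whose listed VRAM requirement fits a given budget. The helper `largest_model_for` is hypothetical, and the figures are the approximate values from the table; real usage will differ, especially with faster-whisper.

```python
# Approximate VRAM needs (GB) copied from the model table, smallest model first.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_for(vram_gb):
    """Return the largest model whose listed VRAM need fits, or None if none fits.

    Hypothetical helper for illustration only.
    """
    best = None
    for name, needed in MODEL_VRAM_GB:
        if needed <= vram_gb:
            best = name  # later entries are larger models
    return best


if __name__ == "__main__":
    for budget in (0.5, 1, 4, 8, 12):
        print(budget, "GB ->", largest_model_for(budget))
```

Larger models are slower (see the relative speed column), so the largest model that fits is not always the best choice for real-time use.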

🔧 Installation

[!IMPORTANT]
Please take a look at the Requirements first before installing. For more information about the usage of the app, please check the wiki

From Prebuilt Binary (.exe)

[!NOTE]
The prebuilt binary ships with CUDA 11.8, so it only works with GPUs that are compatible with CUDA 11.8. If your GPU is not compatible, try installing as a module or from Git, as described below.

  1. Download the latest release (there are two versions: CPU and GPU/CUDA)
  2. Install/extract the downloaded file
  3. Run the program
  4. Set the settings to your liking
  5. Enjoy!

As A Module

[!NOTE]
Use Python 3.11 for best compatibility and performance

[!WARNING]
You might need Build Tools for Visual Studio (or the equivalent for your OS) installed

To install as a module, use pip with the following command.

You can then run the program by typing `speech-translate` in your terminal/console. Alternatively, you can clone the repo and install it locally by running `pip install -e .` in the project directory. (Don't forget to add `--extra-index-url` if you want to install with GPU support.)

Notes For Installation as Module:

From Git

If you prefer to clone the app directly from Git/GitHub, follow the guide in Development (wiki) or below. Doing it this way might also provide a more stable environment.

📚 More Information

Check out the wiki for more information about the app, user settings, how to use it, and more.

🛠️ Development

[!NOTE]
Check the wiki for more details

Setup

[!NOTE]
Creating a virtual environment is recommended but not required. I use Python 3.11.6 for development, but Python 3.8 or later should work.

[!WARNING]
You might need Build Tools for Visual Studio installed

  1. Clone the repo with its submodules by running `git clone --recurse-submodules https://github.com/Dadangdut33/Speech-Translate.git`
  2. `cd` into the project directory
  3. Create a virtual environment by running `python -m venv venv`
  4. Activate your virtual environment
  5. Install the dependencies by running `pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118` if you are using a GPU, or `pip install -r requirements.txt` if you are using the CPU only.
  6. Run `python Run.py` in the root directory to start the app.
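The GPU/CPU choice in step 5 comes down to one extra pip argument. The sketch below builds that command as an argument list; the function name `pip_install_command` is an illustrative assumption, while the index URL and requirements file name are the ones quoted in step 5.

```python
import sys
from typing import List

# Index URL quoted in step 5 for the CUDA 11.8 builds of PyTorch.
CUDA_INDEX_URL = "https://download.pytorch.org/whl/cu118"


def pip_install_command(use_gpu: bool, requirements: str = "requirements.txt") -> List[str]:
    """Build the pip invocation from step 5 as an argument list (hypothetical helper)."""
    # Using sys.executable targets the pip of the active (virtual) environment.
    command = [sys.executable, "-m", "pip", "install", "-r", requirements]
    if use_gpu:
        command += ["--extra-index-url", CUDA_INDEX_URL]
    return command


if __name__ == "__main__":
    # Print rather than execute, so the sketch has no side effects.
    print(" ".join(pip_install_command(use_gpu=True)))
```

To actually run the install, the returned list could be passed to `subprocess.run(command, check=True)` from inside the activated virtual environment.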

Notes:

Running the app

You can run the app by running `Run.py`, located in the root directory. Alternatively, you can run `python -m speech_translate` from the root directory.

Building

Before compiling the project, make sure you have installed all the dependencies and set up PyTorch correctly. Your PyTorch build determines whether the app will use the GPU or the CPU (which is why a virtual environment is recommended for the project).
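One way to confirm which device the installed PyTorch build will use is sketched below. It relies only on `torch.cuda.is_available()`; the function name `detect_device` and the `torch-missing` label are illustrative assumptions, and this is not necessarily how the app itself makes the decision.

```python
import importlib.util


def detect_device() -> str:
    """Return 'cuda', 'cpu', or 'torch-missing' for the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"  # PyTorch is not installed in this environment
    import torch  # imported lazily so the sketch also runs without PyTorch

    # True only when this is a CUDA build of PyTorch and a usable GPU is present.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Device:", detect_device())
```

If this prints `cpu` on a machine with a compatible GPU, the installed PyTorch is most likely a CPU-only build.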

The precompiled version of this project is built using cx_Freeze; the build script is provided in `build.py`. The script is currently configured only for Windows builds, but feel free to contribute if you know how to build properly for other operating systems.

To compile the project into an exe, run `python build.py build_exe` in the root directory. This produces a folder in the build directory containing the compiled project and an executable. After that, use Inno Setup with the provided `installer.iss` script to create the installer.

Compatibility

This project should be compatible with Windows (preferably Windows 10 or later) and other platforms, but I haven't tested it extensively on other platforms. If you find any bugs or issues, feel free to create an issue.

💡 Contributing

Feel free to contribute to this project by forking the repository, making your changes, and submitting a pull request. You can also contribute by creating an issue if you find a bug or have a feature request. Also, feel free to give this project a star if you like it.

License

This project is licensed under the MIT License - see the LICENSE file for details

Attribution

Other

Check out my other, similar project, Screen Translate, a screen translator / OCR tool built on Tesseract.