cnbeining / Whisper_Notebook

A Colab Notebook for OpenAI Whisper and DeepL API, aiming to create human-comparable results of translation and transcription.
GNU General Public License v3.0

Whisper Notebook

This Colab Notebook is designed to support OpenAI Whisper, ctranslate2, wav2vec 2.0, Silero VAD and translation (DeepL) API, aiming to generate ACICFG-opinionated human-comparable results for translation, transcription, and timestamping.

Usage

Open In Colab

Click the button to open Faster Whisper Notebook in Google Colab and follow instructions inside.

Versions

Whisper.ipynb: The first attempt.

WhisperX.ipynb: A custom version of WhisperX has been adopted for better voice activity detection performance.

Faster_Whisper_Public.ipynb: Rewritten with Faster-Whisper with built-in Silero VAD for faster inference.
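For reference, a minimal sketch of how Faster-Whisper's built-in Silero VAD is typically invoked (the model size, compute type, and VAD parameters here are illustrative assumptions, not the notebook's exact settings):

```python
def transcribe_with_vad(audio_path: str, model_size: str = "large-v2"):
    # Requires: pip install faster-whisper
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, compute_type="float16")
    # vad_filter=True enables the built-in Silero VAD;
    # min_silence_duration_ms controls how aggressively audio is split.
    segments, info = model.transcribe(
        audio_path,
        vad_filter=True,
        vad_parameters={"min_silence_duration_ms": 500},
    )
    return [(seg.start, seg.end, seg.text) for seg in segments]
```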

Technology

This repo utilizes the following technologies:

Discussion

Model Size

A certain regression could be observed with the large model; this behaviour may be caused by VAD cutoff.
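One common mitigation for VAD cutoff (not necessarily what this notebook does) is to pad each detected speech segment and merge segments that end up overlapping, so word onsets and offsets are less likely to be clipped before transcription. A minimal sketch; the segment format and thresholds are assumptions:

```python
def pad_and_merge(segments, pad=0.2, max_gap=0.5, total=None):
    """Pad each (start, end) speech segment by `pad` seconds and merge
    segments closer than `max_gap`, reducing the chance that VAD clips
    word onsets/offsets."""
    merged = []
    for start, end in sorted(segments):
        start = max(0.0, start - pad)
        end = end + pad if total is None else min(total, end + pad)
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```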

Timestamping

ACICFG employs a very opinionated way of timestamping:

You should adjust those values accordingly.
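As an illustration of the kind of values involved, a small helper that renders Whisper segment boundaries as SRT timestamps (the millisecond rounding behaviour here is an assumption, not the notebook's exact logic):

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```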

VAD Model

The author introduced Silero VAD V4, which performs better than pyannote (commit 30794f4); the same technology is adopted in Faster-Whisper.

We believe voice activity detection (VAD) provides benefits in several areas:
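For context, standalone Silero VAD can be loaded via torch.hub; a sketch of the typical call pattern (the 16 kHz sampling rate is an assumption):

```python
def detect_speech(wav_path: str):
    # Requires: pip install torch torchaudio
    import torch

    # Silero VAD ships via torch.hub; loading returns (model, utils).
    model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
    get_speech_timestamps, _, read_audio, *_ = utils
    wav = read_audio(wav_path, sampling_rate=16000)
    # Returns [{'start': sample_idx, 'end': sample_idx}, ...]
    return get_speech_timestamps(wav, model, sampling_rate=16000)
```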

LLM vs NMT for translation

Quality

The author evaluated the output quality of English-to-Chinese neural machine translation (NMT) systems, specifically DeepL, on aviation-focused materials. Several key observations were made:

Separator persistence

Subtitle lines are segmented into shorter phrases to provide local context for the model, but these segments must be delimited with separators either within or at the end of lines in order to accommodate screen width constraints.

We have observed that large language models (LLMs) like Claude and GPT tend to ignore separators, even after few-shot learning with temperature set to 0. We summarize the behavior of specific models regarding separator persistence below:

In contrast, neural machine translation (NMT) models tend to persist separators more reliably. We found DeepL may replace separators with two newlines, which can be mitigated by extending the separator length.
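The join/split round trip described above can be sketched as follows. The separator token, and the fallback of splitting on blank lines when DeepL rewrites it, are assumptions for illustration:

```python
SEP = "|||"  # an assumed multi-character separator; longer separators
             # are less likely to be dropped or rewritten by the model

def join_for_translation(lines):
    """Join subtitle lines into one string so the translator sees context."""
    return SEP.join(lines)

def restore_lines(translated: str, expected: int):
    """Split the translation back into lines; DeepL occasionally swaps the
    separator for blank lines, so fall back to splitting on those."""
    parts = translated.split(SEP)
    if len(parts) != expected:
        parts = [p for p in translated.split("\n\n") if p.strip()]
    if len(parts) != expected:
        raise ValueError(f"expected {expected} segments, got {len(parts)}")
    return [p.strip() for p in parts]
```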

Therefore, we currently recommend NMT for subtitle translation. We welcome further testing of prompting techniques and modern models.

Potential Issues

Glossary Support

As of writing, DeepL offers no glossary support for Chinese.
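For reference, the DeepL API call the notebook's translation step would resemble, via the official `deepl` Python client (the function wrapper and target language are illustrative assumptions):

```python
def translate_lines(texts, auth_key: str):
    # Requires: pip install deepl
    import deepl

    translator = deepl.Translator(auth_key)
    # target_lang="ZH" requests Chinese; note that, as of writing,
    # DeepL glossaries cannot be applied to this target language.
    results = translator.translate_text(texts, target_lang="ZH")
    return [r.text for r in results]
```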

API Rate Limiting

The API key used here is strictly for demo and not-for-profit purposes. Reach out to the author privately for further assistance.

Author

This repo is a product of ACI Chinese Fansub Group.

本代码库由ACI字幕组技术部编写。(This repository was written by the technical department of the ACI Chinese Fansub Group.)

License

GNU General Public License v3.0.