Cerlancism / chatgpt-subtitle-translator

Efficient translation tool based on ChatGPT API
https://cerlancism.github.io/chatgpt-subtitle-translator/
MIT License

ChatGPT API SRT Subtitle Translator

ChatGPT has demonstrated its capabilities as a robust translator, handling not just common languages but also unconventional forms of writing such as emojis and word scrambling. However, its output is not always deterministic and does not always preserve line-to-line correlation, which can disrupt subtitle timing, even when it is given precise instructions and the model temperature parameter is set to 0.

This utility uses the OpenAI ChatGPT API to translate text, with a specific focus on line-based translation, especially for SRT subtitles. The translator optimizes token usage by removing SRT overhead and grouping text into batches. This allows translations of arbitrary length without excessive token consumption while ensuring a one-to-one match between input and output lines.
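The batching and one-to-one line matching described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `toBatches`, `translateLines`, `translateBatch`, and `flakyTranslator` are hypothetical names, and the retry strategy (halve the batch on a mismatch) is an assumption chosen for simplicity.

```javascript
// Hypothetical sketch: these names are illustrative, not the project's API.

// Split an array of lines into consecutive batches of at most `size` lines.
function toBatches(lines, size) {
  const batches = [];
  for (let i = 0; i < lines.length; i += size) {
    batches.push(lines.slice(i, i + size));
  }
  return batches;
}

// Translate every line while enforcing that each batch returns exactly as
// many lines as it was given. On a mismatch the batch is split into smaller
// batches and retried; a single line that still mismatches is passed through
// untranslated so that the line count (and subtitle timing) stays intact.
async function translateLines(lines, translateBatch, batchSize = 10) {
  const output = [];
  for (const batch of toBatches(lines, batchSize)) {
    const result = await translateBatch(batch);
    if (result.length === batch.length) {
      output.push(...result);
    } else if (batch.length > 1) {
      const half = Math.ceil(batch.length / 2);
      output.push(...(await translateLines(batch, translateBatch, half)));
    } else {
      output.push(batch[0]);
    }
  }
  return output;
}

// Stand-in for a real API call (assumption): it merges lines whenever it
// receives more than two, imitating a model that breaks line correlation.
const flakyTranslator = async (batch) =>
  batch.length > 2 ? [batch.join(" ")] : batch.map((line) => line.toUpperCase());
```

Calling `translateLines(["a", "b", "c", "d", "e"], flakyTranslator, 5)` resolves to five output lines even though the stub merges any batch larger than two. Smaller batches cost more requests, so a real tool would likely start large and shrink only when a mismatch occurs.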

Web Interface: https://cerlancism.github.io/chatgpt-subtitle-translator

Features

Setup

Reference: https://github.com/openai/openai-quickstart-node#setup

Usage: translator [options]

Translation tool based on ChatGPT API

Options:

Additional Options for ChatGPT:

Examples

Plain text

cli/translator.mjs --plain-text "你好"

Standard Output

Hello.

Emojis

cli/translator.mjs --stream --to "Emojis" --temperature 0 --plain-text "$(curl 'https://api.chucknorris.io/jokes/0ECUwLDTTYSaeFCq6YMa5A' | jq .value)"

Input Argument

Chuck Norris can walk with the animals, talk with the animals; grunt and squeak and squawk with the animals... and the animals, without fail, always say 'yessir Mr. Norris'.

Standard Output

πŸ‘¨β€πŸ¦°πŸ’ͺπŸšΆβ€β™‚οΈπŸ¦œπŸ’πŸ˜πŸ…πŸ†πŸŽπŸ–πŸ„πŸ‘πŸ¦πŸŠπŸ’πŸπŸΏοΈπŸ‡πŸΏοΈβ—οΈπŸŒ³πŸ’¬πŸ˜²πŸ‘‰πŸ€΅πŸ‘¨β€πŸ¦°πŸ‘Š=πŸ•πŸ‘πŸπŸ¦ŒπŸ˜πŸ¦πŸ¦πŸ¦§πŸ¦“πŸ…πŸ¦ŒπŸ¦ŒπŸ¦ŒπŸ†πŸ¦πŸ˜πŸ˜πŸ—πŸ¦“=πŸ‘πŸ€΅.

Scrambling

cli/translator.mjs --stream --system-instruction "Scramble characters of words while only keeping the start and end letter" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Chuck Norris can walk with the animals, talk with the animals;"

Standard Output

Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;

Unscrambling

cli/translator.mjs --stream --system-instruction "Unscramble characters back to English" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;"

Standard Output

Chuck Norris can walk with the animals, talk with the animals;

Plain text file

cli/translator.mjs --stream --temperature 0 --input test/data/test_cn.txt

Input file: test/data/test_cn.txt

你好。
拜拜!

Standard Output

Hello.  
Goodbye!

SRT file

cli/translator.mjs --stream --temperature 0 --input test/data/test_ja_small.srt

Input file: test/data/test_ja_small.srt

1
00:00:00,000 --> 00:00:02,000
γŠγ―γ‚ˆγ†γ”γ–γ„γΎγ™γ€‚

2
00:00:02,000 --> 00:00:05,000
γŠε…ƒζ°—γ§γ™γ‹οΌŸ

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今ζ—₯γ―ε€©ζ°—γŒγ„γ„γ§γ™γ­γ€‚

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。

Output file: test/data/test_ja_small.srt.out_English.srt

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.

How it works

Token Reductions

System Instruction
Tokens: 5

Translate Japanese to English
Input
Tokens: `164`

```srt
1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。
```

Prompt
Tokens: `83`

```log
1. おはようございます。
2. お元気ですか?
3. はい、元気です。
4. 今日は天気がいいですね。
5. はい、とてもいい天気です。
```

Transform
Tokens: `46`

```log
1. Good morning.
2. How are you?
3. Yes, I'm doing well.
4. The weather is nice today, isn't it?
5. Yes, it's very nice weather.
```

Output
Tokens: `130`

```srt
1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.
```
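The four stages above (SRT input, numbered prompt, numbered response, SRT output) can be approximated with a few small functions. This is a sketch under stated assumptions rather than the tool's real code: `parseSrt`, `toPrompt`, `fromResponse`, and `rebuildSrt` are hypothetical helpers, and multi-line subtitle text is simply joined with a space.

```javascript
// Hypothetical sketch: helper names are illustrative, not the project's API.

// Parse SRT text into entries of { index, time, text }.
// Assumption: blocks are separated by blank lines; multi-line text is joined.
function parseSrt(srt) {
  return srt
    .trim()
    .split(/\r?\n\s*\r?\n/)
    .map((block) => {
      const [index, time, ...text] = block.split(/\r?\n/);
      return { index: index.trim(), time: time.trim(), text: text.join(" ").trim() };
    });
}

// Build the prompt: one "N. text" line per entry, with no timestamps,
// which is where the token savings come from.
function toPrompt(entries) {
  return entries.map((entry, i) => `${i + 1}. ${entry.text}`).join("\n");
}

// Strip the "N. " prefixes from the model response, skipping blank lines.
function fromResponse(response) {
  return response
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => line.replace(/^\s*\d+\.\s*/, ""));
}

// Re-attach the original indices and timestamps to the translated lines.
// Throws when the line counts differ, since timing could no longer be trusted.
function rebuildSrt(entries, translated) {
  if (entries.length !== translated.length) {
    throw new Error(`Line mismatch: expected ${entries.length}, got ${translated.length}`);
  }
  return (
    entries.map((entry, i) => `${entry.index}\n${entry.time}\n${translated[i]}`).join("\n\n") + "\n"
  );
}
```

Because timestamps never leave the local process, only the numbered text lines count toward the prompt and response tokens, and the original timing is restored verbatim when the SRT is rebuilt.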

Results

TODO: More analysis

5 SRT lines:
test/data/test_ja_small.srt

30 SRT lines:
test/data/test_ja.srt