Vaibhavs10 / insanely-fast-whisper

Apache License 2.0

Add VTT and TXT formats to output converter #179

Closed mjgiarlo closed 9 months ago

mjgiarlo commented 9 months ago

This commit adds the ability to convert the generated JSON output to VTT and TXT, while retaining the existing conversion to SRT. The converter defaults to SRT for backwards compatibility.
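
For readers skimming the thread, the difference between the three formats can be sketched roughly as below. This is an illustrative sketch, not the code in this PR: the function names `format_timestamp` and `convert` are hypothetical, and it assumes each chunk in the JSON looks like `{"timestamp": [start, end], "text": "..."}` with both timestamps present.

```python
def format_timestamp(seconds, decimal_marker):
    """Render a time in seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def convert(chunks, output_format="srt"):
    """Format chunks as SRT (the default), VTT, or plain text.

    Assumption: every chunk has a [start, end] timestamp pair in seconds.
    """
    if output_format == "txt":
        # Plain text: one chunk per line, no numbering or timestamps.
        return "".join(f"{chunk['text']}\n" for chunk in chunks)
    if output_format not in ("srt", "vtt"):
        raise ValueError(f"unsupported format: {output_format}")

    # SRT uses a comma before the milliseconds, VTT uses a period.
    marker = "," if output_format == "srt" else "."
    blocks = []
    for index, chunk in enumerate(chunks, 1):
        start, end = chunk["timestamp"]
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start, marker)} --> {format_timestamp(end, marker)}\n"
            f"{chunk['text']}\n"
        )
    body = "\n".join(blocks)
    # VTT files additionally start with a WEBVTT header line.
    return "WEBVTT\n\n" + body if output_format == "vtt" else body
```

In this sketch SRT and VTT share the same cue layout, so only the decimal marker and the `WEBVTT` header differ, while TXT drops the cue numbers and timestamps entirely.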

Here's what the tool as currently written generates (with an admittedly simple output.json file).

SRT

1
00:00:00,000 --> 00:00:03,000
 And so my fellow Americans,

2
00:00:03,000 --> 00:00:08,000
 ask not what your country can do for you,

3
00:00:08,000 --> 00:00:11,000
 ask what you can do for your country.

VTT

WEBVTT

1
00:00:00.000 --> 00:00:03.000
 And so my fellow Americans,

2
00:00:03.000 --> 00:00:08.000
 ask not what your country can do for you,

3
00:00:08.000 --> 00:00:11.000
 ask what you can do for your country.

TXT

 And so my fellow Americans,
 ask not what your country can do for you,
 ask what you can do for your country.

matijagrcic commented 9 months ago

@mjgiarlo Maybe it would be better to explicitly specify the encoding as utf-8 when opening the JSON file. The default encoding for the open() function varies by system, whereas utf-8 is a more universal standard that handles a wider range of characters.

with open(input_path, 'r', encoding='utf-8') as file:
    data = json.load(file)
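
The same reasoning would apply to the file the converter writes. As a minimal sketch, assuming hypothetical helper names `load_transcript` and `write_output` (neither is taken from the PR), both ends could pin the encoding like this:

```python
import json


def load_transcript(input_path):
    """Read the JSON transcript as UTF-8, ignoring the platform default."""
    with open(input_path, "r", encoding="utf-8") as f:
        return json.load(f)


def write_output(output_path, content):
    """Write the converted subtitles as UTF-8 so non-ASCII text survives."""
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(content)
```

Without the explicit argument, open() falls back to the locale's preferred encoding, which on some Windows setups is a legacy code page rather than UTF-8, so transcripts containing accented or non-Latin characters may fail to decode or be written incorrectly.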