ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Improving subtitles #1451

Open caleidoscopio opened 1 year ago

caleidoscopio commented 1 year ago

When generating subtitles, there's still a great deal of manual work that needs to be done to make the subtitles comply with the most common quality standards (BBC, Netflix). Currently, there are no line breaks implemented (or at least I couldn't find that option), and there are no pauses between each subtitle.

It would be awesome to have more advanced options implemented.

For example, the BBC recommends a maximum of 42 characters per line and a reading speed of 160-180 words per minute, which works out to a minimum display time of roughly 0.3 seconds per word. Furthermore, line breaks should ideally not be placed arbitrarily (after a preposition, for example), but rather after a noun or punctuation. For this, some syntactic parser (like spaCy) would probably be necessary.
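To make the two rules above concrete, here is a minimal Python sketch. It is not part of whisper.cpp, and it deliberately avoids a real parser: the function names (`min_duration`, `split_two_lines`) and the small `AVOID_LINE_END` word list are hypothetical stand-ins for what a tool like spaCy would decide more reliably. It only assumes the 42-character and 0.3-seconds-per-word figures quoted in this post.

```python
MAX_CHARS = 42          # per-line limit quoted in the post
SECONDS_PER_WORD = 0.3  # reading-speed figure quoted in the post
PUNCTUATION = ".,;:!?"

# Hypothetical, deliberately short list of words a line should not end on.
AVOID_LINE_END = {
    "a", "an", "the", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "and", "or", "but",
}


def min_duration(text: str) -> float:
    """Shortest display time in seconds, at 0.3 s per word."""
    return len(text.split()) * SECONDS_PER_WORD


def split_two_lines(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into at most two lines, scoring every possible break."""
    words = text.split()
    if len(text) <= max_chars or len(words) < 2:
        return [text]
    best = None
    for i in range(1, len(words)):
        first, second = " ".join(words[:i]), " ".join(words[i:])
        last = words[i - 1]
        score = abs(len(first) - len(second))   # prefer balanced lines
        if len(first) > max_chars or len(second) > max_chars:
            score += 1000                       # a line is over the limit
        if last[-1] in PUNCTUATION:
            score -= 20                         # good: break after punctuation
        elif last.lower() in AVOID_LINE_END:
            score += 100                        # bad: break after a preposition
        if best is None or score < best[0]:
            best = (score, [first, second])
    return best[1]


if __name__ == "__main__":
    cue = "We walked to the station, and then we waited for the train."
    print(split_two_lines(cue))
    print(min_duration(cue))
```

The scoring weights are arbitrary. A break that leaves a line over the limit is only chosen when no better split exists, so the result should still be checked when the text is longer than two full lines.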

It would be great to hear about the feasibility of such a thing and whether anyone would be interested in spending some time on it. Unfortunately, I do not possess the technical knowledge.

kevin2379 commented 1 year ago

It will be great once AI can handle tasks like breaking subtitles at the right point, adding pauses, etc. which are required for highly accurate video subtitles. However, I think there will always be some level of human review and editing required for this application.

I've been working on a project that takes the output from whisper.cpp and allows for quick editing of subtitles before exporting to SRT, VTT, TXT or video. Feel free to check it out at aircaption.com and let me know your thoughts.

caleidoscopio commented 1 year ago

I already know of some Whisper implementations that do a great job with the pauses, like Whisper Web-UI, for example. Regarding the line breaks, Amberscript already does a great job, requiring almost no human review. In my experience, it's pretty much 100% accurate, at least for English. The problem is that it forces you to use their own transcription technology, which requires more revision and is a bit expensive. I'm sure the capability to create something like this with Whisper already exists. It just needs to be integrated.

shpati commented 1 month ago

Try adding the option -ml 84 to limit each subtitle segment to 84 characters, i.e. 2 lines x 42 characters per line. It works rather well.

main -pp -ml 84 -osrt filename
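The -ml option limits the length of each segment but, as far as this thread indicates, does not insert line breaks. A possible follow-up step is to wrap each cue of the resulting .srt file to 42 characters per line. The sketch below is a hedged illustration and not part of whisper.cpp: `wrap_srt` is a hypothetical helper, and it assumes a standard SRT layout (index line, timing line, text lines, blank line between cues).

```python
import textwrap


def wrap_srt(srt_text: str, width: int = 42) -> str:
    """Re-wrap the text of every SRT cue to at most `width` characters per line."""
    out = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            # Not a complete cue (index, timing, text); keep it unchanged.
            out.append(block)
            continue
        index, timing = lines[0], lines[1]
        # Join existing text lines, dropping stray leading/trailing spaces.
        text = " ".join(line.strip() for line in lines[2:])
        wrapped = textwrap.wrap(text, width=width)
        out.append("\n".join([index, timing, *wrapped]))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:04,000\n"
        " We walked to the station, and then we waited for the train.\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\n"
        " It was late.\n"
    )
    print(wrap_srt(sample))
```

`textwrap.wrap` breaks greedily at spaces, so an 84-character cue can occasionally spill onto a third line, and the break point ignores grammar. It is a quick approximation, not a replacement for manual review.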