Sub-optimal presentation of auto-transcribed subtitles

polsola commented 8 months ago

Hello, thanks for this great work. Tero Subtitler has become one of our most-used tools; it's great!

I've just noticed a bug, but only on certain videos: Tero Subtitler transcribes the audio correctly but generates subtitles with very long spans, around 20 seconds of text in a single subtitle.

The weird thing is that we have a series of videos with the same voice and format, and this happens in some of them but not in others. I've checked, and they all have the same resolution, FPS, etc.

All the videos start with about 5-8 seconds of silence; I don't know whether that affects the result.

I've also checked that the subtitle duration setting is at its default of 7000 ms.
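For scale: with a 7000 ms limit, a 20-second entry needs at least ceil(20000 / 7000) = 3 pieces. Below is a minimal Python sketch of post-splitting such an entry at word boundaries. It assumes entries are (start_ms, end_ms, text) tuples and that, without per-word timestamps, each word's timing can only be estimated from its share of the characters. `split_long_entry` is a hypothetical helper for illustration, not Tero Subtitler's actual code:

```python
import math


def split_long_entry(start_ms, end_ms, text, max_ms=7000):
    """Split one entry into consecutive pieces of at most max_ms where possible.

    Hypothetical helper. Word timings are ESTIMATED by sharing the entry's
    duration among words in proportion to their length (+1 for the space),
    because a plain subtitle entry has no per-word timestamps.
    Whitespace inside the text is normalized to single spaces.
    """
    words = text.split()
    duration = end_ms - start_ms
    if duration <= max_ms or len(words) < 2:
        return [(start_ms, end_ms, text)]

    # Estimated end time of each word; the last one lands exactly on end_ms.
    weights = [len(w) + 1 for w in words]
    total = sum(weights)
    word_ends, acc = [], 0
    for weight in weights:
        acc += weight
        word_ends.append(start_ms + round(duration * acc / total))

    n = math.ceil(duration / max_ms)
    while True:
        # Cut after the word whose estimated end is nearest each ideal cut point.
        cuts = []
        for k in range(1, n):
            ideal = start_ms + duration * k / n
            best = min(range(len(words) - 1), key=lambda i: abs(word_ends[i] - ideal))
            if not cuts or best > cuts[-1]:
                cuts.append(best)
        pieces, first, t = [], 0, start_ms
        for cut in cuts + [len(words) - 1]:
            pieces.append((t, word_ends[cut], " ".join(words[first:cut + 1])))
            first, t = cut + 1, word_ends[cut]
        # Accept when every piece fits; otherwise try one more piece.
        if all(end - start <= max_ms for start, end, _ in pieces) or n >= len(words):
            return pieces
        n += 1
```

For example, a 20-second entry made of 40 equal-length words comes back as three consecutive pieces ending at 6500, 13500 and 20000 ms. Cutting near evenly spaced points keeps the pieces balanced instead of leaving a tiny last subtitle.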

URUWorks commented 8 months ago

Hello, thank you for your kind words.

I think you are using macOS. You can try the test build "https://github.com/URUWorks/additional-files/raw/main/terosubtitler_testbuild_macOS64.zip", where you can choose the maximum line length.

Another alternative is to use the divide function from the "Edit/Entries/Divide entry" menu. I hope it helps.

polsola commented 8 months ago

Thanks! The option to set a maximum line length is great. It would be awesome if it avoided cutting words, but it's a great fix for now.
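Avoiding cut words mostly comes down to breaking lines only at spaces. A minimal sketch using Python's standard `textwrap` module; the 42-character default is an assumption borrowed from the CPL limit discussed in this thread, and the test build's own option may behave differently:

```python
import textwrap


def wrap_without_cutting_words(text, max_chars=42):
    """Break text into lines of at most max_chars, breaking only at spaces.

    A single word longer than max_chars stays whole on its own line
    instead of being cut.
    """
    return textwrap.wrap(
        text,
        width=max_chars,
        break_long_words=False,  # never split inside a word
        break_on_hyphens=False,  # keep hyphenated words together too
    )
```

For example, `wrap_without_cutting_words("The quick brown fox jumps over the lazy dog", 15)` returns `["The quick brown", "fox jumps over", "the lazy dog"]`.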

chenlung commented 8 months ago

I think a superior algorithm is needed to account for the following (and possibly more), to match other tools and services such as EZTitles and Happy Scribe (some of which might not be quite there either, incidentally):

Linguistic units [not splitting them]
Maximum display time [not exceeding seven seconds]
Characters per line (CPL) count [not exceeding 42]
Characters per second (CPS) count [not exceeding 20]
Shot changes [assuming there is a list imported]
Gapping [two frames between entries]

It could be that some breaches are inevitable and will require user intervention after processing, but some simple improvements would be desirable.
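Since some breaches will need user intervention anyway, the checklist can at least be turned into a validator that flags them for review rather than fixing them. A Python sketch follows. The `Entry` type and `find_breaches` are hypothetical; CPS here counts every character except line breaks (conventions differ on whether spaces count); and the two-frame gap is converted to milliseconds assuming a fixed 25 fps:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start_ms: int
    end_ms: int
    text: str  # lines separated by "\n"


def find_breaches(entries, max_duration_ms=7000, max_cpl=42, max_cps=20.0,
                  min_gap_frames=2, fps=25.0):
    """Return (entry_index, message) pairs for every limit an entry breaks."""
    breaches = []
    min_gap_ms = min_gap_frames * 1000.0 / fps  # 2 frames at 25 fps = 80 ms
    for i, e in enumerate(entries):
        duration_ms = e.end_ms - e.start_ms
        if duration_ms > max_duration_ms:
            breaches.append((i, f"duration {duration_ms} ms > {max_duration_ms} ms"))
        for line in e.text.split("\n"):
            if len(line) > max_cpl:
                breaches.append((i, f"line of {len(line)} chars > {max_cpl} CPL"))
        chars = len(e.text.replace("\n", ""))
        if duration_ms > 0:
            cps = chars / (duration_ms / 1000.0)
            if cps > max_cps:
                breaches.append((i, f"{cps:.1f} CPS > {max_cps}"))
        if i + 1 < len(entries):
            gap_ms = entries[i + 1].start_ms - e.end_ms
            if gap_ms < min_gap_ms:
                breaches.append((i, f"gap {gap_ms} ms < {min_gap_frames} frames"))
    return breaches
```

Linguistic units and shot changes are left out here; they need language analysis and an imported shot list, respectively.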

See this.

chenlung commented 1 month ago

Requested elsewhere: #323

cuentacuentoscaminantes commented 1 month ago

Some guidelines for designing the structure of future Whisper auto-transcribed subtitles:

Whisper's voice-to-text transcription is really excellent. Word-level detection is almost perfect, and the timing is good too. But the line division has no reader perspective: human-made subtitles take several considerations into account when dividing lines and subtitles.

I would like Tero's voice-to-text transcription to take some considerations about dividing lines and subtitles into account when creating subtitles, besides not exceeding x characters (which it already does effectively).

These considerations could be given as instructions to the transcriber:
a) Silences and interpretive pauses (maybe 0.5 seconds or more without words) should be where lines or subtitles are divided (regardless of the line/subtitle length).
b) Grammatical pauses and punctuation marks should be where lines or subtitles are divided (regardless of the line/subtitle length).
c) Put conjunctions and connectors (for example "and", "or", etc.) on the bottom line.
d) Do not split nominal, verbal, or prepositional phrases across lines (for example: imagen
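Rules (a) and (b) could be sketched over word-level timestamps roughly as follows. The sketch assumes words arrive as (text, start_s, end_s) tuples with surrounding spaces stripped. A subtitle ends at a pause of 0.5 s or more, or at sentence-final punctuation. An assumed 84-character cap (two 42-character lines) forces a split in long sentences, and when it is hit the sketch prefers to cut after the last comma, semicolon or colon. `segment_words` is hypothetical, and rules (c) and (d) are not covered:

```python
from typing import List, Tuple

# Assumed input format: (text, start_s, end_s) per word, spaces stripped.
Word = Tuple[str, float, float]

SENTENCE_END = (".", "?", "!")
CLAUSE_END = (",", ";", ":")


def _text(chunk: List[Word]) -> str:
    return " ".join(w for w, _, _ in chunk)


def segment_words(words: List[Word], pause_s: float = 0.5, max_chars: int = 84):
    """Group timed words into (start_s, end_s, text) subtitles (hypothetical helper)."""
    subtitles: List[Tuple[float, float, str]] = []
    current: List[Word] = []

    def flush() -> None:
        if current:
            subtitles.append((current[0][1], current[-1][2], _text(current)))
            current.clear()

    for i, word in enumerate(words):
        if current and len(_text(current + [word])) > max_chars:
            # Over the cap: prefer cutting after the last comma/semicolon/colon,
            # carrying the remaining words over to the next subtitle.
            cut = max(
                (j for j, (w, _, _) in enumerate(current) if w.endswith(CLAUSE_END)),
                default=len(current) - 1,
            )
            carry = current[cut + 1:]
            del current[cut + 1:]
            flush()
            current.extend(carry)
            if current and len(_text(current + [word])) > max_chars:
                flush()  # carried words plus this one are still too long
        current.append(word)
        # Rule (b): sentence-final punctuation closes the subtitle.
        # Rule (a): so does a pause of at least pause_s before the next word.
        next_start = words[i + 1][1] if i + 1 < len(words) else None
        if word[0].endswith(SENTENCE_END) or (
            next_start is not None and next_start - word[2] >= pause_s
        ):
            flush()
    flush()
    return subtitles
```

On a short sample such as "Hello there. This is a test" followed by a 1.1 s silence and "after silence", the full stop and the silence each start a new subtitle, giving three entries.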

I'm attaching the UNE 153010 standard, which is useful for understanding and implementing this: UNE_153010_2012.pdf https://github.com/user-attachments/files/16770767/UNE_153010_2012.pdf

It would also be great for the transcriber to identify the speakers (actors) using diarization alongside Whisper.

Great work!! Thanks!

PS: I offer myself as a subtitling consultant, since I work with Tero regularly and with industry standards.

URUWorks / TeroSubtitler #256