EtienneAb3d / WhisperTimeSync

Synchronize Whisper's timestamps over an existing accurate transcription

Highlight and Max line width #13

Open Linch1 opened 1 year ago

Linch1 commented 1 year ago

Hello, is it possible to highlight the text and set a maximum line width for the output? I'm taking this Whisper command as an example:

whisper path/text.wav --word_timestamps True --max_line_width 22 --max_line_count 1 --highlight_words True --output_format srt 

Thanks in advance, and great tool!

EtienneAb3d commented 1 year ago

Hi @Linch1, I may add some parameters for this. In the meantime, you can change this file: https://github.com/EtienneAb3d/WhisperHallu/blob/main/transcribeHallu.py

Line 431: adjust the options to your needs

                if(transcribe_options["word_timestamps"]):
                    srtOpts = { "max_line_width" : 30, "max_line_count" : 1, "highlight_words" : transcribe_options["word_timestamps"]}

Line 441: remove this filtering

                if(transcribe_options["word_timestamps"]):
                    result["text"] = re.sub("(\n[^<\n]*<u>|</u>[^<\n]*\n)"#Remove lines without highlighted words
                                            ,"\n",re.sub(r"\n[^<\n]*\n\n","\n\n"#Keep only highlighted words
                                                         ,result["text"]))
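To see what this filtering does before removing it, here is a minimal standalone sketch that applies the same two regular expressions to a small hand-written SRT string. The function name `keep_only_highlighted` and the sample cues are made up for illustration and are not part of transcribeHallu.py; the sample only assumes that highlighted words are wrapped in `<u>...</u>` tags, as the patterns above expect.

```python
import re


def keep_only_highlighted(srt_text: str) -> str:
    """Apply the same two substitutions as the filtering step shown above.

    Hypothetical helper for illustration only.
    """
    # Inner substitution: blank out cue text lines that contain no tag at all.
    without_plain = re.sub(r"\n[^<\n]*\n\n", "\n\n", srt_text)
    # Outer substitution: drop the text before <u> and after </u>,
    # so only the highlighted word itself remains on the line.
    return re.sub(r"(\n[^<\n]*<u>|</u>[^<\n]*\n)", "\n", without_plain)


# Hand-written sample: two cues with a highlighted word, one without.
sample = (
    "1\n00:00:00,000 --> 00:00:00,500\n<u>Hello</u> world\n\n"
    "2\n00:00:00,500 --> 00:00:01,000\nHello <u>world</u>\n\n"
    "3\n00:00:01,000 --> 00:00:01,500\nHello world\n\n"
)

filtered = keep_only_highlighted(sample)
print(filtered)
```

With the filtering in place, each cue keeps only its highlighted word and the surrounding line text is lost, which is why it has to be removed if you want full lines with one word underlined.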

PS: this only works with standard Whisper.