McCloudS / subgen

Autogenerate subtitles using OpenAI Whisper Model via Jellyfin, Plex, Emby, Tautulli, or Bazarr
MIT License

Please add some parameters for standardizing/beautifying subtitle layout #68

Closed GOvEy1nw closed 6 months ago

GOvEy1nw commented 6 months ago

Hey, I'm a Windows user, and I'm really grateful for Subgen, as it's the simplest way to get Whisper running with Bazarr on Windows without having to use Docker etc.

However, one thing I've noticed is that the subtitles aren't formatted the best, due to how Faster-Whisper operates. I've found that the standalone Faster Whisper (https://github.com/Purfview/whisper-standalone-win) has a great optional argument called --standard, which does the following:

--standard: Quick hardcoded preset to split lines in standard way. 42 chars per 2 lines with max_comma_cent=70 and --sentence are activated automatically.

--sentence: Enables splitting lines to sentences for srt and vtt subs. Every sentence starts in the new segment. By default meant to output whole sentence per line for better translations, but not limited to, read about '--max_...' parameters.

This gives the subtitles a much more standardized look, like the one common across streaming services such as Netflix, BBC etc.

Is it possible to implement these into SubGen, please?

McCloudS commented 6 months ago

The standalone version doesn’t appear to have any source code so I can’t decipher what’s happening. We use stable-ts, but there are different ways to split the dialogue. See https://github.com/jianfch/stable-ts?tab=readme-ov-file#regrouping-words. Open to any suggestions.

McCloudS commented 6 months ago

I made a separate branch if you want to toy with the idea: https://github.com/McCloudS/subgen/blob/Custom-Params/subgen.py

It reads custom_regroup = os.getenv('CUSTOM_REGROUP', ''), where the value is the regroup string described in the link above. The default run on the model is cm_sp=,* /,_sg=.5_mg=.3+3_sp=.* /。/?/?
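As a rough illustration of how such an environment variable could feed stable-ts, here is a minimal sketch. The helper name `resolve_regroup` is hypothetical and is not taken from the branch; the only assumptions are that an empty `CUSTOM_REGROUP` should fall back to the default algorithm, and that `True` selects that default, as the pasted docstring describes.

```python
import os


def resolve_regroup(env=os.environ):
    """Return the value to pass as the regroup argument (hypothetical helper).

    A non-empty CUSTOM_REGROUP is used as the custom regroup string.
    Otherwise True is returned, which the stable-ts docstring describes
    as "use the default algorithm 'da'".
    """
    custom = env.get("CUSTOM_REGROUP", "").strip()
    return custom if custom else True


# Assumed usage with a stable-ts result object (not executed here):
#   result.regroup(resolve_regroup())
```

With the variable unset the helper returns `True`; with `CUSTOM_REGROUP=cm_sl=84` it returns the string `cm_sl=84` unchanged.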

Instructions pasted below:` Regroup (in-place) words into segments.

Parameters
----------
regroup_algo: str or bool, default 'da'
    String representation of a custom regrouping algorithm, or ``True`` to use the default algorithm 'da'.
verbose : bool, default False
    Whether to show all the methods and arguments parsed from ``regroup_algo``.
only_show : bool, default False
    Whether to show all the methods and arguments parsed from ``regroup_algo`` without running the methods.

Returns
-------
stable_whisper.result.WhisperResult
    The current instance after the changes.

Notes
-----
Syntax for string representation of custom regrouping algorithm.
    Method keys:
        sg: split_by_gap
        sp: split_by_punctuation
        sl: split_by_length
        sd: split_by_duration
        mg: merge_by_gap
        mp: merge_by_punctuation
        ms: merge_all_segments
        cm: clamp_max
        l: lock
        us: unlock_all_segments
        da: default algorithm (cm_sp=,* /,_sg=.5_mg=.3+3_sp=.* /。/?/?)
        rw: remove_word
        rs: remove_segment
        rp: remove_repetition
        rws: remove_words_by_str
        fg: fill_in_gaps
    Metacharacters:
        = separates a method key and its arguments (not used if no argument)
        _ separates method keys (after arguments if there are any)
        + separates arguments for a method key
        / separates an argument into list of strings
        * separates an item in list of strings into a nested list of strings
    Notes:
    -arguments are parsed positionally
    -if no argument is provided, the default ones will be used
    -use 1 or 0 to represent True or False
    Example 1:
        merge_by_gap(.2, 10, lock=True)
        mg=.2+10+++1
        Note: [lock] is the 5th argument, hence the 2 missing arguments in between the three + before 1
    Example 2:
        split_by_punctuation([('.', ' '), '。', '?', '?'], True)
        sp=.* /。/?/?+1
    Example 3:
        merge_all_segments().split_by_gap(.5).merge_by_gap(.15, 3)
        ms_sg=.5_mg=.15+3`
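To make the syntax above easier to read, here is a small sketch that splits a regroup string into method names and positional arguments. It is not stable-ts's own parser: `parse_regroup` is a hypothetical helper, it keeps every argument as a string, marks skipped arguments as `None`, and assumes the arguments themselves contain no `_`.

```python
# Method keys copied from the docstring above (the 'da' shortcut is left out).
METHOD_KEYS = {
    "sg": "split_by_gap", "sp": "split_by_punctuation",
    "sl": "split_by_length", "sd": "split_by_duration",
    "mg": "merge_by_gap", "mp": "merge_by_punctuation",
    "ms": "merge_all_segments", "cm": "clamp_max",
    "l": "lock", "us": "unlock_all_segments",
    "rw": "remove_word", "rs": "remove_segment",
    "rp": "remove_repetition", "rws": "remove_words_by_str",
    "fg": "fill_in_gaps",
}


def parse_regroup(algo):
    """Split a regroup string into (method_name, positional_args) pairs."""
    calls = []
    for part in algo.split("_"):              # '_' separates method keys
        key, has_args, raw_args = part.partition("=")  # '=' starts the arguments
        args = []
        if has_args:
            for raw in raw_args.split("+"):   # '+' separates arguments
                if raw == "":
                    args.append(None)         # skipped: the default is used
                elif "/" in raw or "*" in raw:
                    # '/' makes a list, '*' nests an item into a tuple
                    args.append([
                        tuple(item.split("*")) if "*" in item else item
                        for item in raw.split("/")
                    ])
                else:
                    args.append(raw)
        calls.append((METHOD_KEYS.get(key, key), args))
    return calls


print(parse_regroup("mg=.2+10+++1"))
print(parse_regroup("ms_sg=.5_mg=.15+3"))
```

For Example 1 this yields five positional slots for `merge_by_gap`, with the third and fourth left as `None` and `1` in the fifth, which matches the note about `lock`.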
McCloudS commented 6 months ago

I'm still toying around, but cm_sl=84_sl=42++++++1 produces two-line subtitles if the dialog exceeds a certain time. Otherwise, it will still try to find natural breaks.
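The run of `+` in `sl=42++++++1` is easier to follow when the positional slots are counted explicitly. The sketch below uses a hypothetical helper, `positional_slots`, and only applies the `=` and `+` rules from the pasted docstring. Which parameter the seventh slot maps to depends on the `split_by_length` signature in the installed stable-ts version, so check it there rather than relying on this snippet.

```python
def positional_slots(call):
    """Split one 'key=args' call into its key and raw positional slots."""
    key, _, raw_args = call.partition("=")
    # An empty string between two '+' means "leave this argument at its default".
    return key, (raw_args.split("+") if raw_args else [])


key, slots = positional_slots("sl=42++++++1")
for position, value in enumerate(slots, start=1):
    print(position, repr(value))
```

Six `+` separators give seven slots: `42` is the first argument, the next five are left at their defaults, and `1` (True) lands in the seventh.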