McCloudS / subgen

Autogenerate subtitles using OpenAI Whisper Model via Jellyfin, Plex, Emby, Tautulli, or Bazarr
MIT License
643 stars, 56 forks

Select audio / Prefer language #143

Open muisje opened 3 days ago

muisje commented 3 days ago

Say there's a video with audio in multiple languages, and I'm only interested in one of them, in my case Spanish. I would like subgen to choose the Spanish audio track if one is available, and skip the file if not.

What happens instead is that subgen takes the English audio track and then doesn't generate any subtitles, because I configured it to skip English.

> mkvinfo movie.mkv

|+ Tracks
| + Track
|  + Track number: 1 (track ID for mkvmerge & mkvextract: 0)
|  + Track UID: 14974003125132585736
|  + Track type: video
|  + "Lacing" flag: 0
|  + Language: spa
|  + Codec ID: V_MPEG4/ISO/AVC
|  + Codec's private data: size 46 (H.264 profile: High @L4.1)
|  + Default duration: 00:00:00.041708333 (23.976 frames/fields per second for a video track)
|  + Language (IETF BCP 47): es
|  + Video track
|   + Pixel width: 1920
|   + Pixel height: 802
|   + Display width: 1920
|   + Display height: 802
|   + Video color information
|    + Color matrix coefficients: 1
|    + Horizontal chroma siting: 1
|    + Vertical chroma siting: 2
|    + Color range: 1
|    + Color transfer: 1
|    + Color primaries: 1
| + Track
|  + Track number: 2 (track ID for mkvmerge & mkvextract: 1)
|  + Track UID: 7766254991514104971
|  + Track type: audio
|  + "Forced display" flag: 1
|  + Language: spa
|  + Codec ID: A_AC3
|  + Default duration: 00:00:00.032000000 (31.250 frames/fields per second for a video track)
|  + Language (IETF BCP 47): es
|  + Audio track
|   + Sampling frequency: 48000
|   + Channels: 6
| + Track
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
|  + Track UID: 10288278466344406720
|  + Track type: audio
|  + "Default track" flag: 0
|  + Codec ID: A_AC3
|  + Default duration: 00:00:00.032000000 (31.250 frames/fields per second for a video track)
|  + Language (IETF BCP 47): en
|  + Audio track
|   + Sampling frequency: 48000
|   + Channels: 6
| + Track
|  + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
|  + Track UID: 4079945488710420686
|  + Track type: audio
|  + "Default track" flag: 0
|  + Codec ID: A_DTS
|  + Default duration: 00:00:00.010666667 (93.750 frames/fields per second for a video track)
|  + Language (IETF BCP 47): en
|  + Audio track
|   + Sampling frequency: 48000
|   + Channels: 6
|   + Bit depth: 24

Output when it gets triggered by a Jellyfin webhook:

Language 'eng' detected in movie.mkv and is in the skip list ['eng'], skipping subtitle generation

This results in no subtitles, even though there's a Spanish audio track that I would like to have transcribed.
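To illustrate the behavior being asked for, here is a minimal sketch of preferred-language track selection. It is not subgen's code: `pick_audio_track` and the `tracks` list are hypothetical, with the language codes mirroring the mkvinfo output above (one `spa` track followed by two English tracks, which the log reports as `eng`).

```python
from typing import Optional, Sequence, Tuple

# (audio-relative index, ISO 639-2 language code).
# Hypothetical data shaped like the three audio tracks in the mkvinfo output.
AudioTrack = Tuple[int, str]


def pick_audio_track(tracks: Sequence[AudioTrack], preferred: str) -> Optional[int]:
    """Return the index of the first track in the preferred language.

    Returns None when no track matches, so the caller can skip the file
    instead of falling back to whatever track comes first.
    """
    wanted = preferred.lower()
    for index, language in tracks:
        if language.lower() == wanted:
            return index
    return None


tracks = [(0, "spa"), (1, "eng"), (2, "eng")]
selected = pick_audio_track(tracks, "spa")   # Spanish track is found
missing = pick_audio_track(tracks, "deu")    # no German track, so skip
```

With this shape, the skip-list check would run against the language of the selected track rather than the track the decoder happens to pick by default.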

With these settings I would expect nothing to be skipped, because nothing would be detected as English. But I guess the forced detect language is not used when the language is checked against the sample.

My proposal is:

With these options, I think subgen users could make sure they get the right subtitles transcribed.

McCloudS commented 3 days ago

See https://www.reddit.com/r/selfhosted/s/Z5wLmXq3qu for a semi-related discussion. The default behavior of stable-ts and pyav dictates which audio track is picked. I haven’t done any additional work on this, or thought about what it might look like to sift through the streams and encode individual languages. I’m not even sure it’s a problem worth solving.

I’ll think on it.
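For reference, one way "sifting through the streams" could look is to ask ffprobe for the audio streams and their language tags. This is only a sketch under assumptions: it relies on an `ffprobe` binary being on the PATH, the helper names `parse_audio_streams` and `probe_audio_streams` are made up for illustration, and the `sample` JSON is hand-written rather than real output. subgen itself goes through stable-ts and pyav, which this does not touch.

```python
import json
import subprocess
from typing import Dict, List


def parse_audio_streams(ffprobe_json: str) -> List[Dict[str, object]]:
    """Turn ffprobe's JSON output into a list of audio stream descriptions."""
    data = json.loads(ffprobe_json)
    streams = []
    for position, stream in enumerate(data.get("streams", [])):
        tags = stream.get("tags", {})
        streams.append({
            "audio_index": position,              # position among audio streams only
            "stream_index": stream.get("index"),  # position among all streams
            "language": tags.get("language", "und"),  # "und" when untagged
        })
    return streams


def probe_audio_streams(path: str) -> List[Dict[str, object]]:
    """Run ffprobe on a file and list its audio streams (requires ffprobe)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index:stream_tags=language",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_audio_streams(result.stdout)


# Hand-written sample in the shape ffprobe emits; not captured from a real file.
sample = ('{"streams": [{"index": 1, "tags": {"language": "spa"}}, '
          '{"index": 2, "tags": {"language": "eng"}}, {"index": 3}]}')
audio_streams = parse_audio_streams(sample)
```

Keeping the parsing separate from the subprocess call makes the language logic testable without a media file.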

McCloudS commented 3 days ago

I believe Bazarr is also a workaround for this, as it tracks the languages of both audio and subtitles.

muisje commented 12 hours ago

I didn't have the issue mentioned in the Reddit comment, but I guess my change would solve it too, according to that theory. With pull request #144, subgen now extracts a single audio track when a file has multiple. Bazarr didn't work well for me because it relied on language detection, which is not totally reliable and offers little control. I think my pull request solves the issue. I tested it with TRANSCRIBE_FOLDERS set, using a folder with a couple of files from http://cdn.media.ccc.de/congress/2019/h264-hd/ (videos with both multiple and single audio tracks).
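As a rough illustration of extracting a single audio track, the sketch below builds an ffmpeg command that maps one audio stream to a mono 16 kHz WAV file. It is not taken from pull request #144, which may work differently; `build_extract_command`, the file names, and the choice of mono 16 kHz output are assumptions made for the example.

```python
from typing import List


def build_extract_command(source: str, audio_index: int, output: str) -> List[str]:
    """Build an ffmpeg command that extracts one audio stream from a file.

    audio_index counts audio streams only (0 is the first audio track),
    which is what ffmpeg's "0:a:N" stream specifier expects.
    """
    if audio_index < 0:
        raise ValueError("audio_index must be non-negative")
    return [
        "ffmpeg", "-y",                 # overwrite the output if it exists
        "-i", source,
        "-map", f"0:a:{audio_index}",   # select only the chosen audio stream
        "-vn",                          # drop video
        "-ac", "1",                     # downmix to mono (assumed target format)
        "-ar", "16000",                 # resample to 16 kHz (assumed target format)
        output,
    ]


# Hypothetical file names; pass the list to subprocess.run(...) to execute it.
command = build_extract_command("movie.mkv", 0, "movie.spa.wav")
```

The resulting file contains only the chosen language, so whatever transcribes it no longer depends on which track a decoder selects by default.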