drikqlis / SubSyncStarter

Post-processing script for Bazarr to start SubSync.

Problem with audio without "definition" #3

Jorman closed this issue 4 years ago.

Jorman commented 4 years ago

Sometimes a file's audio track doesn't have any "definition". I mean it isn't tagged with any language, English or otherwise. When this happens, an extra parameter has to be used:

subsync --cli --verbose 3 --window-size 300 --max-point-dist 1 sync --sub file.eng.srt --ref file.mkv --ref-lang eng --out aaa.srt --effort 1

So --ref-lang eng is what makes the difference.

Do you know if there's a way to take the language file from Radarr/Sonarr and pass this argument to subsync? Or maybe you know another way to do this?

J

drikqlis commented 4 years ago

What do you mean by language file? How would Radarr/Sonarr know which language it is if it isn't stated in the file? I use https://github.com/mdhiggins/sickbeard_mp4_automator to clean up my files, and it has an audio-default-language option. Maybe that could be a solution? There is also https://github.com/HaveAGitGat/Tdarr, but it's fairly new and I haven't tested it yet. Be warned, however, that changing, encoding or remuxing video files will break finding subtitles by hash. I got around that by generating a hash file before the conversion, but it requires modifying Bazarr and the converter script.
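Why remuxing breaks hash matching can be seen from one widely used scheme, the OpenSubtitles movie hash: the file size plus the 64-bit little-endian sum of the first and last 64 KiB, truncated to 64 bits. Any conversion that changes the size or those bytes yields a new hash. The sketch below assumes that scheme and shows computing the hash and saving it next to the video before conversion. The ".hash" sidecar name and the save_hash_sidecar helper are hypothetical illustrations, not how the modified Bazarr mentioned above actually works.

```python
import os
import struct

CHUNK = 65536  # 64 KiB read from each end of the file


def opensubtitles_hash(path: str) -> str:
    """OpenSubtitles-style hash: file size plus the sums of the 64-bit
    little-endian words in the first and last 64 KiB, kept to 64 bits."""
    size = os.path.getsize(path)
    if size < CHUNK * 2:
        raise ValueError("file too small for this hash")
    h = size
    with open(path, "rb") as f:
        for offset in (0, size - CHUNK):
            f.seek(offset)
            data = f.read(CHUNK)
            for (word,) in struct.iter_unpack("<Q", data):
                h = (h + word) & 0xFFFFFFFFFFFFFFFF
    return f"{h:016x}"


def save_hash_sidecar(video_path: str) -> str:
    """Hypothetical helper: store the pre-conversion hash beside the video
    so a later lookup could still use it after the file is remuxed."""
    sidecar = video_path + ".hash"  # hypothetical naming convention
    with open(sidecar, "w") as f:
        f.write(opensubtitles_hash(video_path))
    return sidecar
```

Run save_hash_sidecar on the original file before handing it to the converter. Reading the sidecar back during subtitle search is the part that would still need changes on the Bazarr side.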

Jorman commented 4 years ago

Hi, we have to analyze the problem from the beginning. The media files that Sonarr/Radarr download come in various container formats. To keep it simple, let's say AVI and MKV. MKV can hold more than two audio tracks, and each track can have a label like "english audio" and so on. AVI, if I remember correctly, can have only two audio tracks and they can't be labeled. So it happened that Sonarr downloaded a show whose audio wasn't tagged/labeled correctly, and when I (or the script) run this:

$ /opt/subsync/bin/subsync --cli --verbose 3 --window-size 300 --max-point-dist 1 sync --sub "/data/Multimedia/Serie Tv/America's Got Talent- The Champions/America's Got Talent - The Champions - 02x05 - The Champions Semi Finals.en.srt" --ref "/data/Multimedia/Serie Tv/America's Got Talent- The Champions/America's Got Talent - The Champions - 02x05 - The Champions Semi Finals.mkv" --out "/data/Multimedia/Serie Tv/America's Got Talent- The Champions/prova.srt" --effort 1

the error is:

[*] starting synchronization /data/Multimedia/Serie Tv/America's Got Talent- The Champions/America's Got Talent - The Champions - 02x05 - The Champions Semi Finals.en.srt
[+] sub: /data/Multimedia/Serie Tv/America's Got Talent- The Champions/America's Got Talent - The Champions - 02x05 - The Champions Semi Finals.en.srt:0/1, type=subtitle/text, lang=eng
[+] ref: /data/Multimedia/Serie Tv/America's Got Talent- The Champions/America's Got Talent - The Champions - 02x05 - The Champions Semi Finals.mkv:1/2, type=audio, fps=23.976023976023978
[+] out: /data/Multimedia/Serie Tv/America's Got Talent- The Champions/prova.srt
[!] select reference language

This happens because subsync can't read an audio label/tag, so it doesn't know what language the audio is in. I need to modify the command like this:

$ /opt/subsync/bin/subsync --cli --verbose 3 --window-size 300 --max-point-dist 1 sync --sub "/data/Multimedia/Serie Tv/America's Got Talent- The Champions/America's Got Talent - The Champions - 02x05 - The Champions Semi Finals.en.srt" --ref "/data/Multimedia/Serie Tv/America's Got Talent- The Champions/America's Got Talent - The Champions - 02x05 - The Champions Semi Finals.mkv" --ref-lang eng --out "/data/Multimedia/Serie Tv/America's Got Talent- The Champions/prova.srt" --effort 1

This way I tell subsync that the audio is in English.
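To check ahead of time whether a file's audio tracks carry a language tag, one option is to query ffprobe. This is not part of SubSyncStarter, and ffprobe must be installed and on PATH. The sketch treats both a missing tag and "und" (undetermined) as untagged. Whether subsync also treats "und" as missing is an assumption.

```python
import json
import subprocess
from typing import List


def untagged_audio_streams(probe: dict) -> List[int]:
    """Return the indexes of audio streams that have no usable language tag."""
    missing = []
    for stream in probe.get("streams", []):
        lang = stream.get("tags", {}).get("language", "")
        # "und" (undetermined) tells us no more than having no tag at all
        if not lang or lang.lower() == "und":
            missing.append(stream["index"])
    return missing


def probe_audio(path: str) -> dict:
    """Run ffprobe on the file and return its JSON for the audio streams only."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index:stream_tags=language",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

For example, if untagged_audio_streams(probe_audio("file.mkv")) returns a non-empty list, that file is one where --ref-lang would have to be supplied by hand.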

Maybe there's a better way to do this, but I have an idea: when Sonarr/Radarr grab and import a release, they know what language it is. If possible, the best approach is to take that info and pass it to subsync, obviously only when both languages, srt and audio, are the same. It's possible to search for subtitles in a different language, but it makes no sense to sync them.

I hope I've explained this particular problem better.

drikqlis commented 4 years ago

I get what you mean. However, I don't think Sonarr/Radarr pass language info through the API, so it would be tough to get. We would also need the Sonarr/Radarr ID of the file to get it, and currently Bazarr can't pass that.

What about a setting in the ini file for a default language to use when subsync finds no tag? It would be much simpler to implement and would cover most cases.
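The proposed ini fallback could look roughly like this. The section and option names ([subsync], default_ref_lang) are hypothetical, and SubSyncStarter's real config layout may differ. The rule is to prefer a known audio language and otherwise fall back to the configured default. If neither exists, return None so that --ref-lang is left out.

```python
import configparser
from typing import Optional


def pick_ref_lang(config: configparser.ConfigParser,
                  audio_lang: Optional[str]) -> Optional[str]:
    """Prefer the language reported for the audio track; otherwise use the
    hypothetical [subsync] default_ref_lang option, or None if it's unset."""
    if audio_lang:
        return audio_lang
    # fallback= also covers a missing [subsync] section; "or None" maps "" to None
    return config.get("subsync", "default_ref_lang", fallback=None) or None
```

With a config read from "[subsync]\ndefault_ref_lang = eng\n", pick_ref_lang(config, None) gives "eng", and pick_ref_lang(config, "ita") keeps "ita". As pointed out below, a single default only helps users who always search for one subtitle language.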


obviously only when both languages, srt and audio, are the same

It isn't necessary; subsync translates the recognized audio if needed:

SubSync is listening to the audio track of your movie, using speech recognition engine CMUSphinx to generate text transcription. It is then translated word-by-word using dictionary (for subtitles of different language). Next words are linked with similar words in your subtitles, creating synchronization points. This points are used to fix subtitles time codes.
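SubSync's own matching and fitting code isn't shown in this thread. As a toy illustration of the last step only (using synchronization points to fix time codes), the sketch below assumes a simple model of constant offset plus linear drift, audio_time = a * sub_time + b, and fits it by ordinary least squares. This is not SubSync's implementation.

```python
from typing import List, Tuple


def fit_linear(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of audio_time = a * sub_time + b over sync points
    given as (subtitle_time, audio_time) pairs in seconds."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two sync points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        raise ValueError("sync points must span different subtitle times")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov / var_x
    b = mean_y - a * mean_x
    return a, b


def retime(t: float, a: float, b: float) -> float:
    """Map an original subtitle time code onto the audio timeline."""
    return a * t + b
```

With points [(0, 1), (10, 11)] the fit is a = 1.0, b = 1.0, meaning subtitles one second early with no drift. retime(5, 1.0, 1.0) then moves a cue at 5 s to 6 s.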

Jorman commented 4 years ago

I don't know if it's possible to get it directly, but the Release API (https://github.com/Sonarr/Sonarr/wiki/Release) includes the release language. Maybe by cross-referencing the show name, ID and other information it's possible to make it work, but I'm not sure.

A default language works as a workaround only when the user always searches for one subtitle language.

The best option would be to get it from Sonarr/Radarr. Maybe I can run some tests. I have to figure out where to start, but maybe it's possible to write a one-line command that queries Sonarr/Radarr. The Release API needs an episodeId (int), so you first have to go through the EpisodeFile API (https://github.com/Sonarr/Sonarr/wiki/EpisodeFile), which needs a seriesId (int) to list all the episode files of that series. Then you have to find the episodeId by searching for the filename, and so on.
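The lookup chain described above can be sketched as two pure functions over JSON already fetched from Sonarr. This rests on assumptions: the field names ("path", "id", "episodeFileId") and mapping an episode file back to its episode through the series' episode list are taken to match Sonarr's v2 API and should be checked against the wiki pages linked above. HTTP requests and the API key are left out.

```python
import os
from typing import List, Optional


def find_episode_file_id(episode_files: List[dict], filename: str) -> Optional[int]:
    """Return the id of the episode file whose path ends in `filename`
    (assumed shape of the EpisodeFile list for one seriesId)."""
    for ep_file in episode_files:
        if os.path.basename(ep_file.get("path", "")) == filename:
            return ep_file["id"]
    return None


def find_episode_id(episodes: List[dict], episode_file_id: int) -> Optional[int]:
    """Return the episodeId whose assumed 'episodeFileId' field points at the file."""
    for episode in episodes:
        if episode.get("episodeFileId") == episode_file_id:
            return episode["id"]
    return None
```

The resulting episodeId is what the Release API call would then take. Each step is one more request, which is why this chain is hard to squeeze into a one-line command.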

What do you think?

Jorman commented 4 years ago

I found a solution, but I don't know how to make it work :)

Basically, for Radarr/Sonarr v3, Bazarr already has this information: under Movies or Series you can already see the language in the "Audio Language" column. I'm looking at get_series.py and get_subtitle.py to see if I can find a way to add it to the post-processing phase.

Maybe you're more skilled at this than I am.

Jorman commented 4 years ago

OK, there's a solution, see https://github.com/morpheus65535/bazarr/pull/833. With this integration it's possible to pass the audio language. The post-processing command becomes:

python3 /opt/SubSyncStarter/SubSyncStarter.py "{{episode}}" "{{subtitles}}" "{{subtitles_language_code2}}" "{{subtitles_language_code3}}" "{{episode_language_code3}}" 0

and SubSyncStarter.py becomes:

audio_code3 = sys.argv[5]
[...]
command = '/usr/bin/nice -n 15 ' + location_subsync + ' --cli --verbose ' + loglevel_subsync + ' --logfile ' + '"' + logfile_subsync + '"' + ' --window-size ' + window_size + ' --max-point-dist ' + max_point_dist + ' sync --sub ' + '"' + sub_file + '"' + ' --ref ' + '"' + reference_file + '"' + ' --ref-lang ' + audio_code3 + ' --out ' + '"' + sub_file + '"' + ' --effort ' + effort + ' --overwrite'
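A variant of the same command built as an argument list instead of one shell string might look like the sketch below. This is only a sketch. It follows the positional order of the post-processing command above ({{episode}} is argv[1], {{subtitles}} is argv[2], {{episode_language_code3}} is argv[5]). The default settings values and the log path are placeholders. It also assumes Bazarr passes an empty string when the audio language is unknown, in which case --ref-lang is skipped.

```python
from typing import List


def build_command(argv: List[str], location_subsync: str,
                  loglevel: str = "3", logfile: str = "/tmp/subsync.log",
                  window_size: str = "300", max_point_dist: str = "1",
                  effort: str = "1") -> List[str]:
    """Build the nice + subsync argument list from Bazarr's positional args."""
    reference_file = argv[1]  # {{episode}}
    sub_file = argv[2]        # {{subtitles}}
    audio_code3 = argv[5]     # {{episode_language_code3}}
    cmd = ["/usr/bin/nice", "-n", "15", location_subsync,
           "--cli", "--verbose", loglevel, "--logfile", logfile,
           "--window-size", window_size, "--max-point-dist", max_point_dist,
           "sync", "--sub", sub_file, "--ref", reference_file]
    if audio_code3:  # assumption: empty when Bazarr has no audio language
        cmd += ["--ref-lang", audio_code3]
    cmd += ["--out", sub_file, "--effort", effort, "--overwrite"]
    return cmd
```

In the script it would be run with subprocess.run(build_command(sys.argv, "/opt/subsync/bin/subsync"), check=True). Because each path is a separate list element, names containing quotes or apostrophes (like "America's Got Talent") need no manual quoting.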

drikqlis commented 4 years ago

That's great, I'll update the script once it's merged into Bazarr.

Jorman commented 4 years ago

Yep! What about making a pull request with your modifications? That way, once it's merged, your script will work without any changes and will keep working across updates.

drikqlis commented 4 years ago

Added audio language. I also rewrote the script to use the Bazarr API to blacklist subtitles, and I created a pull request in Bazarr to add the necessary variables to custom post-processing.