Audionut / Upload-Assistant

A fork of L4G attempting to keep updated
https://github.com/L4GSP1KE/Upload-Assistant

Dual-Audio not able to be identified effectively. #53

Open Kha-kis opened 5 days ago

Kha-kis commented 5 days ago

I have identified issues with dual-audio identification.

The first issue I have run into is related to English identification in prep.py.

The original code is:

                        # Check for English Language Track
                        if audio_language == "en" and "commentary" not in t.get('Title', '').lower():
                            eng = True

In order to debug, I added the following:

print(f"eng: {eng}, orig: {orig}")

For a file with an en-US audio track, this was the output:

eng: False, orig: True

I added the following code in order to remediate:

                        # Check for English Language Track
                        english_variants = ["en", "en-US", "en-CA", "en-GB", "en-AU", "en-NZ"]
                        if audio_language in english_variants and "commentary" not in t.get('Title', '').lower():
                            eng = True

Another solution could be to use:

if audio_language.startswith("en") and "commentary" not in t.get('Title', '').lower():

However, I am not certain how many false positives, if any, could occur.
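
To gauge the false-positive risk, the two checks can be compared side by side. This is only an illustrative sketch: is_english_prefix and is_english_subtag are hypothetical helper names, not functions in prep.py, and it assumes the track language arrives as a tag such as en-US. One known prefix collision is enm (Middle English in ISO 639-2), which is unlikely to appear on a real audio track.

```python
def is_english_prefix(audio_language: str) -> bool:
    """Prefix check as proposed: anything starting with 'en' counts."""
    return audio_language.lower().startswith("en")


def is_english_subtag(audio_language: str) -> bool:
    """Stricter check: compare only the primary subtag before the first '-'."""
    primary = audio_language.lower().split("-", 1)[0]
    return primary in ("en", "eng")


# Print both results for a handful of tags, including the 'enm' collision.
for tag in ["en", "en-US", "en-GB", "eng", "enm", "es-419", ""]:
    print(f"{tag!r:10} prefix={is_english_prefix(tag)} subtag={is_english_subtag(tag)}")
```

The two helpers agree on every tag in the list except enm, so the simpler prefix check looks safe unless such codes show up in practice.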

The second issue is almost the same as the first; however, it relates to orig.

This occurs where the original audio track has a correct region specified.

As a temporary workaround for the languages I have worked with, I have added them under the variants section:

                        # Catch Chinese / Norwegian / Spanish variants
                        variants = ['zh', 'cn', 'cmn', 'no', 'nb', 'es-419', 'es-ES', 'es']
                        if audio_language in variants and meta['original_language'] in variants:
                            orig = True

However, there should be an easier identification method, which I have not yet worked out.
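
One possibly simpler approach is to strip the region subtag from both values before comparing, so that es-419 and es-ES both reduce to es. The following is a minimal sketch, not code from prep.py: primary_subtag and matches_original are hypothetical names, and it assumes meta['original_language'] holds a short language code as in the examples above.

```python
def primary_subtag(language) -> str:
    """Return the lowercase language part of a tag, e.g. 'es-419' -> 'es'."""
    return (language or "").lower().split("-", 1)[0]


def matches_original(audio_language, original_language) -> bool:
    """True when both tags share the same non-empty primary subtag."""
    audio = primary_subtag(audio_language)
    original = primary_subtag(original_language)
    return bool(audio) and audio == original


print(matches_original("es-419", "es"))     # region on the track only
print(matches_original("es-ES", "es-419"))  # differing regions, same language
print(matches_original("", "es"))           # a missing tag never matches
```

With this kind of normalisation the Spanish entries would no longer need to be listed explicitly.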

Audionut commented 4 days ago

My first silly thought is to refactor the mi meta, so that it's consistent with mediainfo.txt. For instance, I couldn't understand why the language here https://github.com/Audionut/Upload-Assistant/commit/ca5b3d773c53f59983d696cd504ff8745214087b seemingly started referring to the 2 character designation (fixed properly here https://github.com/Audionut/Upload-Assistant/commit/b7bfcf1e0c5ed2702c4998efc9d7223a96e84563), and so I just referred to txt instead of mi meta as the fix. See /forums/topics/1349/posts/26436 at ATH

That would be no small change though.

1st issue with 1st solution seems fine. If there's some other en variant, it can be easily added without the worry of false positives.

I need to spend some more time with the second issue. It seems logical that if the meta is marked original, then it's original, period, and I don't understand why it needs the additional language check when the English check seems to catch that already.

Audionut commented 4 days ago

It seems like the first original check is probably what's triggering the issue #51 bug, and rather than adding a gazillion non-English variants, it's probably best to just use not in english_variants.

Kha-kis commented 2 days ago

I have been doing further testing. Reviewing the language codes at https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes, I was unable to identify any chance of collisions when using startswith.

Using that logic, I was able to rewrite the dubbed and dual-audio sections.

Please let me know if this fits your logic stream and if so can we commit this?

if meta.get('original_language', '') != 'en':
    eng, orig = False, False
    try:
        for t in mi.get('media', {}).get('track', []):
            if t.get('@type') != "Audio":
                continue

            audio_language = t.get('Language', '')

            # Check for English Language Track
            if audio_language.startswith("en") and "commentary" not in t.get('Title', '').lower():
                eng = True

            # Check for original Language Track (non-English) with region tag flexibility
            if not audio_language.startswith("en") and audio_language.startswith(meta['original_language']) and "commentary" not in t.get('Title', '').lower():
                orig = True

            # Catch Chinese / Norwegian variants with region tag flexibility
            variants = ['zh', 'cn', 'cmn', 'no', 'nb']
            if any(audio_language.startswith(var) for var in variants) and any(meta['original_language'].startswith(var) for var in variants):
                orig = True

            # Check for additional, potentially bloated tracks
            if audio_language != meta['original_language'] and not audio_language.startswith("en"):
                # If audio_language is empty, set to 'und' (undefined)
                audio_language = "und" if audio_language == "" else audio_language
                console.print(f"[bold red]This release has a(n) {audio_language} audio track, and may be considered bloated")
                time.sleep(5)

        print(f"eng: {eng}, orig: {orig}")

        # Determine if the release is Dual-Audio or Dubbed
        if eng and orig:
            dual = "Dual-Audio"
        elif eng and not orig and meta['original_language'] not in ['zxx', 'xx', None] and not meta.get('no_dub', False):
            dual = "Dubbed"
    except Exception:
        console.print(traceback.format_exc())
        pass
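
One edge case worth checking in the block above: str.startswith('') is always true, so if meta['original_language'] is an empty string, every non-English track sets orig. The sketch below isolates the original-language test to show this. is_original_track is a hypothetical helper, and the guard is only one possible fix.

```python
def is_original_track(audio_language: str, original_language: str) -> bool:
    """Original-language test from the block above, with a guard for empty values."""
    if not audio_language or not original_language:
        # Without this guard, "ja".startswith("") evaluates to True.
        return False
    return (not audio_language.startswith("en")
            and audio_language.startswith(original_language))


print("ja".startswith(""))               # True: the unguarded behaviour
print(is_original_track("ja", ""))       # False with the guard
print(is_original_track("ja-JP", "ja"))  # True
```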

Audionut commented 2 days ago

That looks good at first glance. I can't recall which file I had that was triggering the dual-audio bug. @backstab5983 do you have a filename handy that triggers this bug?

Audionut commented 2 days ago

Did you forget the english_variants and the other variants you added, @Kha-kis, or are they no longer needed?

Kha-kis commented 2 days ago

They are no longer needed, as I am using audio_language.startswith("en") for the English check and audio_language.startswith(meta['original_language']) for the original-language check. The only variants still needed are Chinese and Norwegian, as they have multiple ISO codes.
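
The remaining variant handling could also be expressed as groups of equivalent codes. With a single flat variants list, a zh audio track and a no original language both pass the any(...) tests and would set orig. The sketch below keeps the groups separate. ALIAS_GROUPS and same_language are hypothetical names, and the codes are only the ones already listed in this thread.

```python
# Hypothetical grouping of codes that refer to the same language.
ALIAS_GROUPS = [
    {"zh", "cn", "cmn"},  # Chinese codes listed in the thread
    {"no", "nb"},         # Norwegian codes listed in the thread
]


def same_language(audio_language: str, original_language: str) -> bool:
    """Compare primary subtags, treating codes in one alias group as equal."""
    audio = audio_language.lower().split("-", 1)[0]
    original = original_language.lower().split("-", 1)[0]
    if not audio or not original:
        return False
    if audio == original:
        return True
    return any(audio in group and original in group for group in ALIAS_GROUPS)


print(same_language("cmn", "zh"))    # same group
print(same_language("nb-NO", "no"))  # same group, region ignored
print(same_language("zh", "no"))     # different groups
```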

Kha-kis commented 2 days ago

There is an edge case where dual audio cannot be identified if the original language is incorrect on TMDb (https://www.themoviedb.org/tv/110382-pachinko, for example, should be Korean). There is currently no way to get the name right in that case.

In order to remediate this issue, the following changes can be made.

In args.py, add the additional arg --dual-audio:

        parser.add_argument('--dual-audio', dest='dual_audio', action='store_true', required=False, help="Add Dual-Audio to the title")
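
As a quick check of how this argument behaves, the following standalone sketch builds a throwaway parser containing only the new flag. The parser here is a stand-in for the one in args.py, which has many more arguments.

```python
import argparse

# Stand-in parser with only the proposed flag.
parser = argparse.ArgumentParser()
parser.add_argument('--dual-audio', dest='dual_audio', action='store_true', required=False,
                    help="Add Dual-Audio to the title")

# Parse explicit argument lists instead of sys.argv so the example is repeatable.
with_flag = parser.parse_args(['--dual-audio'])
without_flag = parser.parse_args([])

print(with_flag.dual_audio)     # True
print(without_flag.dual_audio)  # False
```

Because store_true defaults to False, meta.get('dual_audio', False) behaves the same whether the flag was omitted or the key is missing entirely.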

In upload.py, update the overwrite_list:

                    overwrite_list = [
                        'trackers', 'dupe', 'debug', 'anon', 'category', 'type', 'screens', 'nohash', 'manual_edition', 'imdb', 'tmdb_manual', 'mal', 'manu>
                        'hdb', 'ptp', 'blu', 'no_season', 'no_aka', 'no_year', 'no_dub', 'no_tag', 'no_seed', 'client', 'desclink', 'descfile', 'desc', 'dr>
                        <host', 'manual_source', 'webdv', 'hardcoded-subs', 'dual_audio'
                    ]

Finally, in prep.py, update the dual logic:

            if meta.get('dual_audio', False):  # If dual_audio flag is set, skip other checks
                dual = "Dual-Audio"

            else:
                if meta.get('original_language', '') != 'en':
                    eng, orig = False, False
                    try:
                        for t in mi.get('media', {}).get('track', []):
                            if t.get('@type') != "Audio":
                                continue

                            audio_language = t.get('Language', '')

                            # Check for English Language Track
                            if audio_language.startswith("en") and "commentary" not in t.get('Title', '').lower():
                                eng = True

                            # Check for original Language Track
                            if not audio_language.startswith("en") and audio_language.startswith(meta['original_language']) and "commentary" not in t.get('>
                                orig = True

                            # Catch Chinese / Norwegian Variants
                            variants = ['zh', 'cn', 'cmn', 'no', 'nb']
                            if any(audio_language.startswith(var) for var in variants) and any(meta['original_language'].startswith(var) for var in variant>
                                orig = True

                            # Check for additional, bloated Tracks
                            if audio_language != meta['original_language'] and not audio_language.startswith("en"):
                                # If audio_language is empty, set to 'und' (undefined)
                                audio_language = "und" if audio_language == "" else audio_language
                                console.print(f"[bold red]This release has a(n) {audio_language} audio track, and may be considered bloated")
                                time.sleep(5)

                        if eng and orig:
                            dual = "Dual-Audio"
                        elif eng and not orig and meta['original_language'] not in ['zxx', 'xx', None] and not meta.get('no_dub', False):
                            dual = "Dubbed"
                    except Exception:
                        console.print(traceback.format_exc())
                        pass

Please perform any testing needed and, if all is well, I will submit a PR.
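
For testing, the decision logic can be exercised without a real file by feeding it a hand-built MediaInfo-like dict. This is a sketch under assumptions: classify_audio and make_mi are hypothetical helpers, the dict mirrors only the keys used above, the variant check, bloat warning and sleep are omitted, and treating an empty or missing original_language as unknown is an addition that the code above does not have.

```python
def classify_audio(meta: dict, mi: dict) -> str:
    """Return 'Dual-Audio', 'Dubbed' or '' following the logic above."""
    if meta.get('dual_audio', False):
        return "Dual-Audio"  # the manual flag wins over detection

    original = meta.get('original_language') or ''
    if original == 'en':
        return ""

    eng, orig = False, False
    for t in mi.get('media', {}).get('track', []):
        if t.get('@type') != "Audio":
            continue
        if "commentary" in t.get('Title', '').lower():
            continue
        audio_language = t.get('Language', '')
        if audio_language.startswith("en"):
            eng = True
        elif original and audio_language.startswith(original):
            orig = True

    if eng and orig:
        return "Dual-Audio"
    # Assumption: an empty original language is treated like 'zxx'/'xx'.
    if eng and original not in ('', 'zxx', 'xx') and not meta.get('no_dub', False):
        return "Dubbed"
    return ""


def make_mi(*languages):
    """Build a minimal MediaInfo-like dict with one audio track per language."""
    return {'media': {'track': [{'@type': 'Audio', 'Language': lang} for lang in languages]}}


print(classify_audio({'original_language': 'ja'}, make_mi('ja', 'en-US')))  # Dual-Audio
print(classify_audio({'original_language': 'ja'}, make_mi('en')))           # Dubbed
# Wrong original language on TMDb, corrected with the manual flag:
print(classify_audio({'original_language': 'en', 'dual_audio': True}, make_mi('ko', 'en')))  # Dual-Audio
```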

Audionut commented 13 hours ago

Apologies for the delay. It all seems to be working fine here. Not sure what happened with your copy/paste, so here is the block with the truncated lines restored:

            if meta.get('dual_audio', False):  # If dual_audio flag is set, skip other checks
                dual = "Dual-Audio"
            else:
                if meta.get('original_language', '') != 'en':
                    eng, orig = False, False
                    try:
                        for t in mi.get('media', {}).get('track', []):
                            if t.get('@type') != "Audio":
                                continue

                            audio_language = t.get('Language', '')

                            # Check for English Language Track
                            if audio_language.startswith("en") and "commentary" not in t.get('Title', '').lower():
                                eng = True

                            # Check for original Language Track
                            if not audio_language.startswith("en") and audio_language.startswith(meta['original_language']) and "commentary" not in t.get('Title', '').lower():
                                orig = True

                            # Catch Chinese / Norwegian Variants
                            variants = ['zh', 'cn', 'cmn', 'no', 'nb']
                            if any(audio_language.startswith(var) for var in variants) and any(meta['original_language'].startswith(var) for var in variants):
                                orig = True

                            # Check for additional, bloated Tracks
                            if audio_language != meta['original_language'] and not audio_language.startswith("en"):
                                # If audio_language is empty, set to 'und' (undefined)
                                audio_language = "und" if audio_language == "" else audio_language
                                console.print(f"[bold red]This release has a(n) {audio_language} audio track, and may be considered bloated")
                                time.sleep(5)

                        if eng and orig:
                            dual = "Dual-Audio"
                        elif eng and not orig and meta['original_language'] not in ['zxx', 'xx', None] and not meta.get('no_dub', False):
                            dual = "Dubbed"
                    except Exception:
                        console.print(traceback.format_exc())
                        pass