Tyrrrz / YoutubeDownloader

Downloads videos and playlists from YouTube
MIT License

Subtitle language information not specified #387

Closed: BrendanxP closed this issue 1 year ago

BrendanxP commented 1 year ago

Version

v1.10.4

Platform

Windows 11

Steps to reproduce

Playlist URL: https://www.youtube.com/watch?v=7_mVFTQX0xs&list=PL2HLJ87twWI1DhyFxYbQRB4_JLKPIYXli

Settings.dat

{
  "IsUkraineSupportMessageEnabled": false,
  "IsAutoUpdateEnabled": true,
  "IsDarkModeEnabled": true,
  "IsAuthPersisted": true,
  "ShouldInjectTags": true,
  "ShouldSkipExistingFiles": false,
  "FileNameTemplate": "$title",
  "ParallelLimit": 2,
  "LastAppVersion": null,
  "LastAuthCookies": null,
  "LastContainer": {
    "Name": "mp4"
  },
  "LastVideoQualityPreference": 4
}

Details

The MP4s downloaded from the playlist each contain about 7 subtitle tracks. When imported into Plex, I see "Unknown (MOV_TEXT)" 7 times, which makes it difficult to select the subtitles I need. I would expect at least one of them to be labeled English (SOME_NAME), where SOME_NAME could be ENG, for example.

I do not think this is a Plex bug, as other subtitles work normally. Unfortunately, I did not find another way besides Plex to accurately check the subtitle names.

I would like to know whether other people also have this issue, whether this is expected behaviour and, if not, whether it can be fixed.

Tyrrrz commented 1 year ago

I'm not sure what Plex is, but I have tested subs in VLC and other players before, and they seem to display the metadata correctly. Can you test in a different app?

BrendanxP commented 1 year ago

Before posting I tried it in the default Windows Media Player, which did not find any subtitles, but that's an issue from the Windows app (lol). I just downloaded VLC to check and indeed, as you mentioned, they show correctly in there.

I just downloaded MediaInfo, where I can view all the data about audio and subtitles under "View">"Text". I will add the output of MediaInfo down below. I can see that the subtitles get tagged with Title: LANGUAGE, but not with Language: LANGUAGE. Whereas the Audio part does contain the "Language" tag.

In this unrelated post, I also saw the Language tag being used for text subtitles. Therefore, I believe the missing Language tag might be why Plex, and possibly other apps, do not recognize the language of the subtitle tracks.

I am not sure if this is something that can be added? It should basically contain the same value as the Title tag.

General
Complete name                            : X:\Series\Asian BTS SKZ\SKZ CODE - Youtube\SKZ 사우나 (SKZ SAUNA) #1|[SKZ CODE] Ep.37.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/av01/iso2/mp41)
File size                                : 925 MiB
Duration                                 : 21 min 34 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 5 998 kb/s
Frame rate                               : 29.970 FPS
Writing application                      : Lavf60.3.100
Cover                                    : Yes
Comment                                  : Downloaded using YoutubeDownloader (https://github.com/Tyrrrz/YoutubeDownloader) / Video: SKZ 사우나 (SKZ SAUNA) #1|[SKZ CODE] Ep.37 / Video URL: https://www.youtube.com/watch?v=bC3rOHEHcb4&list=PL2HLJ87twWI1DhyFxYbQRB4_JLKPIYXli / Channel: Stray Kids / Channel URL: https://www.youtube.com/channel/UC9rMiEjNaCSsebs31MRDCRA
dtag                                     : 2023-11-06 14:29:11

Video
ID                                       : 1
Format                                   : AV1
Format/Info                              : AOMedia Video 1
Format profile                           : Main@L5.0
Codec ID                                 : av01
Duration                                 : 21 min 34 s
Bit rate                                 : 5 861 kb/s
Width                                    : 3 840 pixels
Height                                   : 2 160 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 29.970 (30000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Bits/(Pixel*Frame)                       : 0.024
Stream size                              : 904 MiB (98%)
Title                                    : ISO Media file produced by Google Inc. / 2160p | 9,14 Mbit/s
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Codec configuration box                  : av1C

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 21 min 34 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 19.7 MiB (2%)
Title                                    : ISO Media file produced by Google Inc. / 128,45 Kbit/s
Language                                 : English
Default                                  : Yes
Alternate group                          : 1

Text #1
ID                                       : 3
Format                                   : Timed Text
Muxing mode                              : sbtl
Codec ID                                 : tx3g
Duration                                 : 21 min 27 s
Bit rate mode                            : Variable
Bit rate                                 : 195 b/s
Frame rate                               : 1.110 FPS
Stream size                              : 30.7 KiB (0%)
Title                                    : Chinese
Default                                  : Yes
Forced                                   : No
Alternate group                          : 3
Count of events                          : 716

Text #2
ID                                       : 4
Format                                   : Timed Text
Muxing mode                              : sbtl
Codec ID                                 : tx3g
Duration                                 : 21 min 27 s
Bit rate mode                            : Variable
Bit rate                                 : 190 b/s
Frame rate                               : 1.110 FPS
Stream size                              : 29.9 KiB (0%)
Title                                    : English
Default                                  : No
Forced                                   : No
Alternate group                          : 3
Count of events                          : 716
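
The same check can be automated for a batch of files. The following is only a sketch, assuming the MediaInfo text report keeps the "key : value" layout shown above; the `report` string is an abbreviated copy of that output, and `Flush` is a hypothetical helper name. It lists the text streams that have no Language field.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Abbreviated copy of the MediaInfo text report above (illustrative only).
var report = string.Join("\n",
    "Audio",
    "ID                                       : 2",
    "Language                                 : English",
    "",
    "Text #1",
    "ID                                       : 3",
    "Title                                    : Chinese",
    "",
    "Text #2",
    "ID                                       : 4",
    "Title                                    : English");

var missing = new List<string>();
string? section = null;
var hasLanguage = false;

// Records the section just finished if it is a text stream without a Language field.
void Flush()
{
    if (section != null && section.StartsWith("Text", StringComparison.Ordinal) && !hasLanguage)
        missing.Add(section);
}

foreach (var line in report.Split('\n'))
{
    if (line.Length == 0)
        continue;

    var separator = line.IndexOf(" : ", StringComparison.Ordinal);
    if (separator < 0)
    {
        // A line without "key : value" is treated as the start of a new section.
        Flush();
        section = line.Trim();
        hasLanguage = false;
    }
    else if (line.Substring(0, separator).Trim() == "Language")
    {
        hasLanguage = true;
    }
}
Flush();

Console.WriteLine(string.Join(", ", missing)); // Text #1, Text #2
```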

Edit: I looked further, and "Language" is indeed one of the so-called known parameters for text sources (subtitles), according to MediaInfo.

This is the list of known parameters for subtitles/text:

Text 
Count 
Status 
StreamCount 
StreamKind 
StreamKind/String 
StreamKindID 
StreamKindPos 
StreamOrder 
FirstPacketOrder 
Inform 
ID 
ID/String 
OriginalSourceMedium_ID 
OriginalSourceMedium_ID/String 
UniqueID 
UniqueID/String 
MenuID 
MenuID/String 
Format 
Format/String 
Format/Info 
Format/Url 
Format_Commercial 
Format_Commercial_IfAny 
Format_Version 
Format_Profile 
Format_Compression 
Format_Settings 
Format_Settings_Wrapping 
Format_AdditionalFeatures 
InternetMediaType 
MuxingMode 
MuxingMode_MoreInfo 
CodecID 
CodecID/String 
CodecID/Info 
CodecID/Hint 
CodecID/Url 
CodecID_Description 
Codec                     : Deprecated
Codec/String              : Deprecated
Codec/Info                : Deprecated
Codec/Url                 : Deprecated
Codec/CC                  : Deprecated
Duration 
Duration/String 
Duration/String1 
Duration/String2 
Duration/String3 
Duration/String4 
Duration/String5 
Duration_Start2End 
Duration_Start2End/String 
Duration_Start2End/String1 
Duration_Start2End/String2 
Duration_Start2End/String3 
Duration_Start2End/String4 
Duration_Start2End/String5 
Duration_Start_Command 
Duration_Start_Command/String 
Duration_Start_Command/String1 
Duration_Start_Command/String2 
Duration_Start_Command/String3 
Duration_Start_Command/String4 
Duration_Start_Command/String5 
Duration_Start 
Duration_Start/String 
Duration_Start/String1 
Duration_Start/String2 
Duration_Start/String3 
Duration_Start/String4 
Duration_Start/String5 
Duration_End 
Duration_End/String 
Duration_End/String1 
Duration_End/String2 
Duration_End/String3 
Duration_End/String4 
Duration_End/String5 
Duration_End_Command 
Duration_End_Command/String 
Duration_End_Command/String1 
Duration_End_Command/String2 
Duration_End_Command/String3 
Duration_End_Command/String4 
Duration_End_Command/String5 
Duration_FirstFrame 
Duration_FirstFrame/String 
Duration_FirstFrame/String1 
Duration_FirstFrame/String2 
Duration_FirstFrame/String3 
Duration_FirstFrame/String4 
Duration_FirstFrame/String5 
Duration_LastFrame 
Duration_LastFrame/String 
Duration_LastFrame/String1 
Duration_LastFrame/String2 
Duration_LastFrame/String3 
Duration_LastFrame/String4 
Duration_LastFrame/String5 
Duration_Base 
Source_Duration 
Source_Duration/String 
Source_Duration/String1 
Source_Duration/String2 
Source_Duration/String3 
Source_Duration/String4 
Source_Duration/String5 
Source_Duration_FirstFrame 
Source_Duration_FirstFrame/String 
Source_Duration_FirstFrame/String1 
Source_Duration_FirstFrame/String2 
Source_Duration_FirstFrame/String3 
Source_Duration_FirstFrame/String4 
Source_Duration_FirstFrame/String5 
Source_Duration_LastFrame 
Source_Duration_LastFrame/String 
Source_Duration_LastFrame/String1 
Source_Duration_LastFrame/String2 
Source_Duration_LastFrame/String3 
Source_Duration_LastFrame/String4 
Source_Duration_LastFrame/String5 
BitRate_Mode 
BitRate_Mode/String 
BitRate 
BitRate/String 
BitRate_Minimum 
BitRate_Minimum/String 
BitRate_Nominal 
BitRate_Nominal/String 
BitRate_Maximum 
BitRate_Maximum/String 
BitRate_Encoded 
BitRate_Encoded/String 
Width 
Width/String 
Height 
Height/String 
DisplayAspectRatio 
DisplayAspectRatio/String 
DisplayAspectRatio_Original 
DisplayAspectRatio_Original/String 
FrameRate_Mode 
FrameRate_Mode/String 
FrameRate_Mode_Original 
FrameRate_Mode_Original/String 
FrameRate 
FrameRate/String 
FrameRate_Num 
FrameRate_Den 
FrameRate_Minimum 
FrameRate_Minimum/String 
FrameRate_Nominal 
FrameRate_Nominal/String 
FrameRate_Maximum 
FrameRate_Maximum/String 
FrameRate_Original 
FrameRate_Original/String 
FrameRate_Original_Num 
FrameRate_Original_Den 
FrameCount 
ElementCount 
Source_FrameCount 
ColorSpace 
ChromaSubsampling 
Resolution                : Deprecated
Resolution/String         : Deprecated
BitDepth 
BitDepth/String 
Compression_Mode 
Compression_Mode/String 
Compression_Ratio 
Delay 
Delay/String 
Delay/String1 
Delay/String2 
Delay/String3 
Delay/String4 
Delay/String5 
Delay_Settings 
Delay_DropFrame 
Delay_Source 
Delay_Source/String 
Delay_Original 
Delay_Original/String 
Delay_Original/String1 
Delay_Original/String2 
Delay_Original/String3 
Delay_Original/String4 
Delay_Original/String5 
Delay_Original_Settings 
Delay_Original_DropFrame 
Delay_Original_Source 
Video_Delay 
Video_Delay/String 
Video_Delay/String1 
Video_Delay/String2 
Video_Delay/String3 
Video_Delay/String4 
Video_Delay/String5 
Video0_Delay              : Deprecated
Video0_Delay/String       : Deprecated
Video0_Delay/String1      : Deprecated
Video0_Delay/String2      : Deprecated
Video0_Delay/String3      : Deprecated
Video0_Delay/String4      : Deprecated
Video0_Delay/String5      : Deprecated
TimeCode_FirstFrame 
TimeCode_LastFrame 
TimeCode_DropFrame 
TimeCode_Settings 
TimeCode_Source 
TimeCode_MaxFrameNumber 
TimeCode_MaxFrameNumber_Theory 
StreamSize 
StreamSize/String 
StreamSize/String1 
StreamSize/String2 
StreamSize/String3 
StreamSize/String4 
StreamSize/String5 
StreamSize_Proportion 
StreamSize_Demuxed 
StreamSize_Demuxed/String 
StreamSize_Demuxed/String1 
StreamSize_Demuxed/String2 
StreamSize_Demuxed/String3 
StreamSize_Demuxed/String4 
StreamSize_Demuxed/String5 
Source_StreamSize 
Source_StreamSize/String 
Source_StreamSize/String1 
Source_StreamSize/String2 
Source_StreamSize/String3 
Source_StreamSize/String4 
Source_StreamSize/String5 
Source_StreamSize_Proportion 
StreamSize_Encoded 
StreamSize_Encoded/String 
StreamSize_Encoded/String1 
StreamSize_Encoded/String2 
StreamSize_Encoded/String3 
StreamSize_Encoded/String4 
StreamSize_Encoded/String5 
StreamSize_Encoded_Proportion 
Source_StreamSize_Encoded 
Source_StreamSize_Encoded/String 
Source_StreamSize_Encoded/String1 
Source_StreamSize_Encoded/String2 
Source_StreamSize_Encoded/String3 
Source_StreamSize_Encoded/String4 
Source_StreamSize_Encoded/String5 
Source_StreamSize_Encoded_Proportion 
Title 
Encoded_Application 
Encoded_Application/String 
Encoded_Application_CompanyName 
Encoded_Application_Name 
Encoded_Application_Version 
Encoded_Application_Url 
Encoded_Library 
Encoded_Library/String 
Encoded_Library_CompanyName 
Encoded_Library_Name 
Encoded_Library_Version 
Encoded_Library_Date 
Encoded_Library_Settings 
Encoded_OperatingSystem 
Language 
Language/String 
Language/String1 
Language/String2 
Language/String3 
Language/String4 
Language_More 
ServiceKind 
ServiceKind/String 
Disabled 
Disabled/String 
Default 
Default/String 
Forced 
Forced/String 
AlternateGroup 
AlternateGroup/String 
Summary 
Encoded_Date 
Tagged_Date 
Encryption 
Events_Total 
Events_MinDuration 
Events_MinDuration/String 
Events_MinDuration/String1 
Events_MinDuration/String2 
Events_MinDuration/String3 
Events_MinDuration/String4 
Events_MinDuration/String5 
Events_PopOn 
Events_RollUp 
Events_PaintOn 
Lines_Count 
Lines_MaxCountPerEvent 
FirstDisplay_Delay_Frames 
FirstDisplay_Type

Tyrrrz commented 1 year ago

Thank you for the insight. Interestingly, the language option is indeed passed to FFmpeg:

https://github.com/Tyrrrz/YoutubeExplode/blob/59d642788aa03967a791806b29a55e55a9b3c510/YoutubeExplode.Converter/Converter.cs#L136-L140

Unless the command line usage is wrong and the last -metadata... overwrites the former. I'll try to investigate.

BrendanxP commented 1 year ago

Thanks for your quick replies and help!

Using input from some online forums, ChatGPT is convinced the code should be changed to the following, since this gives each metadata entry a unique key so that they do not overwrite each other. I am not very experienced with C# myself, though, so I could be mistaken here ;)

foreach (var (subtitleInput, i) in subtitleInputs.WithIndex())
{
    arguments
        .Add($"-metadata:s:s:{i}:language={subtitleInput.Info.Language.Code}")
        .Add($"-metadata:s:s:{i}:title={subtitleInput.Info.Language.Name}");
}

I am happy to help test using Plex or MediaInfo if you think you have a breakthrough.
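
For comparison, the FFmpeg documentation writes per-stream metadata as a `-metadata:s:a:0` option followed by a separate `language=eng` argument, and the same pattern with `s:s:N` targets subtitle stream N. The sketch below builds such an argument list under that assumption. The `tracks` array and its values are hypothetical and not taken from YoutubeExplode; as far as I know, repeating `-metadata` with different keys for the same stream sets both keys rather than overwriting one with the other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subtitle tracks (language code, display name); illustrative values only.
var tracks = new (string Code, string Name)[]
{
    ("zho", "Chinese"),
    ("eng", "English"),
};

var arguments = new List<string>();

for (var i = 0; i < tracks.Length; i++)
{
    // Each -metadata option is followed by exactly one key=value argument.
    // The specifier s:s:{i} means "stream metadata, subtitle stream number i".
    arguments.Add($"-metadata:s:s:{i}");
    arguments.Add($"language={tracks[i].Code}");
    arguments.Add($"-metadata:s:s:{i}");
    arguments.Add($"title={tracks[i].Name}");
}

Console.WriteLine(string.Join(" ", arguments));
```

Keeping the option and its `key=value` pair as two separate arguments matches how a shell would split `-metadata:s:s:0 language=eng`.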

Tyrrrz commented 1 year ago

I tried only leaving the language part (i.e. the first two lines from the snippet I linked) and it still doesn't show the language:

image

Tyrrrz commented 1 year ago

Okay, I found the issue. The language is stored using a 2-letter code by YoutubeExplode, but it seems that MediaInfo only recognizes 3-letter codes. The problem is that YouTube only provides 2-letter codes in the metadata, so it's not exactly trivial to convert from one to the other.

Is it customary to require 3-letter codes for subtitle language identifiers?

BrendanxP commented 1 year ago

I have no idea if it is customary, to be honest. Maybe you could try to convert the 2-letter code to the 3-letter one using CultureInfo. Hope this helps!

Tyrrrz commented 1 year ago

The issue is that CultureInfo is platform-dependent, so the same language information may not be available on certain OS/versions.
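
To make the concern concrete, here is a minimal sketch of a CultureInfo-based lookup that tolerates missing culture data. `TryGetThreeLetterCode` is a hypothetical helper name, and what it returns for a given code depends on the culture data available to the runtime, so no output is shown.

```csharp
#nullable enable
using System;
using System.Globalization;

// Returns the three-letter code reported by the runtime, or null when the
// culture is not available on this platform.
static string? TryGetThreeLetterCode(string twoLetterCode)
{
    try
    {
        var code = CultureInfo.GetCultureInfo(twoLetterCode).ThreeLetterISOLanguageName;
        return string.IsNullOrEmpty(code) ? null : code;
    }
    catch (CultureNotFoundException)
    {
        // Thrown when the runtime has no data for the requested culture.
        return null;
    }
}

Console.WriteLine(TryGetThreeLetterCode("en") ?? "(not available)");
```

Because the result can differ between machines, a caller would still need some fallback for the null case.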

BrendanxP commented 1 year ago

I did not know that. Would it be an idea to hardcode a list that maps the values? It is obviously not pretty, but if it works...

Using the following code, I got the complete list of ISO language codes from CultureInfo:

using System;
using System.Globalization;

public class SamplesCultureInfo
{

   public static void Main()
   {

      // Prints the two-letter and three-letter ISO language codes of each neutral culture as dictionary entries.
      Console.WriteLine("ISO2 ISO3");
      foreach (CultureInfo ci in CultureInfo.GetCultures(CultureTypes.NeutralCultures))
      {
          Console.Write("{\"");
          Console.Write(ci.TwoLetterISOLanguageName);
          Console.Write("\", \"");
          Console.Write(ci.ThreeLetterISOLanguageName);
          Console.WriteLine("\"},");
      }
   }
}

The list contains some duplicates for some reason, but when you take those out you get:

{"iv", "ivl"},
{"af", "afr"},
{"agq", "agq"},
{"ak", "aka"},
{"am", "amh"},
{"ar", "ara"},
{"as", "asm"},
{"asa", "asa"},
{"ast", "ast"},
{"az", "aze"},
{"bas", "bas"},
{"be", "bel"},
{"bem", "bem"},
{"bez", "bez"},
{"bg", "bul"},
{"bm", "bam"},
{"bn", "ben"},
{"bo", "bod"},
{"br", "bre"},
{"brx", "brx"},
{"bs", "bos"},
{"ca", "cat"},
{"ccp", "ccp"},
{"ce", "che"},
{"cgg", "cgg"},
{"chr", "chr"},
{"ckb", "ckb"},
{"cs", "ces"},
{"cy", "cym"},
{"da", "dan"},
{"dav", "dav"},
{"de", "deu"},
{"dje", "dje"},
{"dsb", "dsb"},
{"dua", "dua"},
{"dyo", "dyo"},
{"dz", "dzo"},
{"ebu", "ebu"},
{"ee", "ewe"},
{"el", "ell"},
{"en", "eng"},
{"eo", "epo"},
{"es", "spa"},
{"et", "est"},
{"eu", "eus"},
{"ewo", "ewo"},
{"fa", "fas"},
{"ff", "ful"},
{"fi", "fin"},
{"fil", "fil"},
{"fo", "fao"},
{"fr", "fra"},
{"fur", "fur"},
{"fy", "fry"},
{"ga", "gle"},
{"gd", "gla"},
{"gl", "glg"},
{"gsw", "gsw"},
{"gu", "guj"},
{"guz", "guz"},
{"gv", "glv"},
{"ha", "hau"},
{"haw", "haw"},
{"he", "heb"},
{"hi", "hin"},
{"hr", "hrv"},
{"hsb", "hsb"},
{"hu", "hun"},
{"hy", "hye"},
{"id", "ind"},
{"ig", "ibo"},
{"ii", "iii"},
{"is", "isl"},
{"it", "ita"},
{"ja", "jpn"},
{"jgo", "jgo"},
{"jmc", "jmc"},
{"ka", "kat"},
{"kab", "kab"},
{"kam", "kam"},
{"kde", "kde"},
{"kea", "kea"},
{"khq", "khq"},
{"ki", "kik"},
{"kk", "kaz"},
{"kkj", "kkj"},
{"kl", "kal"},
{"kln", "kln"},
{"km", "khm"},
{"kn", "kan"},
{"ko", "kor"},
{"kok", "kok"},
{"ks", "kas"},
{"ksb", "ksb"},
{"ksf", "ksf"},
{"ksh", "ksh"},
{"kw", "cor"},
{"ky", "kir"},
{"lag", "lag"},
{"lb", "ltz"},
{"lg", "lug"},
{"lkt", "lkt"},
{"ln", "lin"},
{"lo", "lao"},
{"lrc", "lrc"},
{"lt", "lit"},
{"lu", "lub"},
{"luo", "luo"},
{"luy", "luy"},
{"lv", "lav"},
{"mas", "mas"},
{"mer", "mer"},
{"mfe", "mfe"},
{"mg", "mlg"},
{"mgh", "mgh"},
{"mgo", "mgo"},
{"mk", "mkd"},
{"ml", "mal"},
{"mn", "mon"},
{"mr", "mar"},
{"ms", "msa"},
{"mt", "mlt"},
{"mua", "mua"},
{"my", "mya"},
{"mzn", "mzn"},
{"naq", "naq"},
{"nb", "nob"},
{"nd", "nde"},
{"nds", "nds"},
{"ne", "nep"},
{"nl", "nld"},
{"nmg", "nmg"},
{"nn", "nno"},
{"nnh", "nnh"},
{"nus", "nus"},
{"nyn", "nyn"},
{"om", "orm"},
{"or", "ori"},
{"os", "oss"},
{"pa", "pan"},
{"pl", "pol"},
{"ps", "pus"},
{"pt", "por"},
{"qu", "que"},
{"rm", "roh"},
{"rn", "run"},
{"ro", "ron"},
{"rof", "rof"},
{"ru", "rus"},
{"rw", "kin"},
{"rwk", "rwk"},
{"sah", "sah"},
{"saq", "saq"},
{"sbp", "sbp"},
{"se", "sme"},
{"seh", "seh"},
{"ses", "ses"},
{"sg", "sag"},
{"shi", "shi"},
{"si", "sin"},
{"sk", "slk"},
{"sl", "slv"},
{"smn", "smn"},
{"sn", "sna"},
{"so", "som"},
{"sq", "sqi"},
{"sr", "srp"},
{"sv", "swe"},
{"sw", "swa"},
{"ta", "tam"},
{"te", "tel"},
{"teo", "teo"},
{"tg", "tgk"},
{"th", "tha"},
{"ti", "tir"},
{"to", "ton"},
{"tr", "tur"},
{"tt", "tat"},
{"twq", "twq"},
{"tzm", "tzm"},
{"ug", "uig"},
{"uk", "ukr"},
{"ur", "urd"},
{"uz", "uzb"},
{"vai", "vai"},
{"vi", "vie"},
{"vun", "vun"},
{"wae", "wae"},
{"wo", "wol"},
{"xog", "xog"},
{"yav", "yav"},
{"yi", "yid"},
{"yo", "yor"},
{"yue", "yue"},
{"zgh", "zgh"},
{"zh", "zho"},
{"zu", "zul"},

Then you can make a function such as this to convert:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        string twoDigitCode = "en"; // Replace with the two-letter code you want to translate.
        string threeDigitCode = TranslateToThreeDigitCode(twoDigitCode);

        if (threeDigitCode != null)
        {
            Console.WriteLine($"Two-letter: {twoDigitCode} -> Three-letter: {threeDigitCode}");
        }
        else
        {
            Console.WriteLine("Translation not found for the provided two-letter code.");
        }
    }

    static string TranslateToThreeDigitCode(string twoDigitCode)
    {
        // Create a dictionary to map two-letter ISO language codes to three-letter ISO language codes.
        Dictionary<string, string> isoMappings = new Dictionary<string, string>
        {
            {"iv", "ivl"},
            {"af", "afr"},
            {"agq", "agq"},
            {"ak", "aka"},
            {"am", "amh"},
            {"ar", "ara"},
            {"as", "asm"},
            {"asa", "asa"},
            {"ast", "ast"},
            {"az", "aze"},
            {"bas", "bas"},
            {"be", "bel"},
            {"bem", "bem"},
            {"bez", "bez"},
            {"bg", "bul"},
            {"bm", "bam"},
            {"bn", "ben"},
            {"bo", "bod"},
            {"br", "bre"},
            {"brx", "brx"},
            {"bs", "bos"},
            {"ca", "cat"},
            {"ccp", "ccp"},
            {"ce", "che"},
            {"cgg", "cgg"},
            {"chr", "chr"},
            {"ckb", "ckb"},
            {"cs", "ces"},
            {"cy", "cym"},
            {"da", "dan"},
            {"dav", "dav"},
            {"de", "deu"},
            {"dje", "dje"},
            {"dsb", "dsb"},
            {"dua", "dua"},
            {"dyo", "dyo"},
            {"dz", "dzo"},
            {"ebu", "ebu"},
            {"ee", "ewe"},
            {"el", "ell"},
            {"en", "eng"},
            {"eo", "epo"},
            {"es", "spa"},
            {"et", "est"},
            {"eu", "eus"},
            {"ewo", "ewo"},
            {"fa", "fas"},
            {"ff", "ful"},
            {"fi", "fin"},
            {"fil", "fil"},
            {"fo", "fao"},
            {"fr", "fra"},
            {"fur", "fur"},
            {"fy", "fry"},
            {"ga", "gle"},
            {"gd", "gla"},
            {"gl", "glg"},
            {"gsw", "gsw"},
            {"gu", "guj"},
            {"guz", "guz"},
            {"gv", "glv"},
            {"ha", "hau"},
            {"haw", "haw"},
            {"he", "heb"},
            {"hi", "hin"},
            {"hr", "hrv"},
            {"hsb", "hsb"},
            {"hu", "hun"},
            {"hy", "hye"},
            {"id", "ind"},
            {"ig", "ibo"},
            {"ii", "iii"},
            {"is", "isl"},
            {"it", "ita"},
            {"ja", "jpn"},
            {"jgo", "jgo"},
            {"jmc", "jmc"},
            {"ka", "kat"},
            {"kab", "kab"},
            {"kam", "kam"},
            {"kde", "kde"},
            {"kea", "kea"},
            {"khq", "khq"},
            {"ki", "kik"},
            {"kk", "kaz"},
            {"kkj", "kkj"},
            {"kl", "kal"},
            {"kln", "kln"},
            {"km", "khm"},
            {"kn", "kan"},
            {"ko", "kor"},
            {"kok", "kok"},
            {"ks", "kas"},
            {"ksb", "ksb"},
            {"ksf", "ksf"},
            {"ksh", "ksh"},
            {"kw", "cor"},
            {"ky", "kir"},
            {"lag", "lag"},
            {"lb", "ltz"},
            {"lg", "lug"},
            {"lkt", "lkt"},
            {"ln", "lin"},
            {"lo", "lao"},
            {"lrc", "lrc"},
            {"lt", "lit"},
            {"lu", "lub"},
            {"luo", "luo"},
            {"luy", "luy"},
            {"lv", "lav"},
            {"mas", "mas"},
            {"mer", "mer"},
            {"mfe", "mfe"},
            {"mg", "mlg"},
            {"mgh", "mgh"},
            {"mgo", "mgo"},
            {"mk", "mkd"},
            {"ml", "mal"},
            {"mn", "mon"},
            {"mr", "mar"},
            {"ms", "msa"},
            {"mt", "mlt"},
            {"mua", "mua"},
            {"my", "mya"},
            {"mzn", "mzn"},
            {"naq", "naq"},
            {"nb", "nob"},
            {"nd", "nde"},
            {"nds", "nds"},
            {"ne", "nep"},
            {"nl", "nld"},
            {"nmg", "nmg"},
            {"nn", "nno"},
            {"nnh", "nnh"},
            {"nus", "nus"},
            {"nyn", "nyn"},
            {"om", "orm"},
            {"or", "ori"},
            {"os", "oss"},
            {"pa", "pan"},
            {"pl", "pol"},
            {"ps", "pus"},
            {"pt", "por"},
            {"qu", "que"},
            {"rm", "roh"},
            {"rn", "run"},
            {"ro", "ron"},
            {"rof", "rof"},
            {"ru", "rus"},
            {"rw", "kin"},
            {"rwk", "rwk"},
            {"sah", "sah"},
            {"saq", "saq"},
            {"sbp", "sbp"},
            {"se", "sme"},
            {"seh", "seh"},
            {"ses", "ses"},
            {"sg", "sag"},
            {"shi", "shi"},
            {"si", "sin"},
            {"sk", "slk"},
            {"sl", "slv"},
            {"smn", "smn"},
            {"sn", "sna"},
            {"so", "som"},
            {"sq", "sqi"},
            {"sr", "srp"},
            {"sv", "swe"},
            {"sw", "swa"},
            {"ta", "tam"},
            {"te", "tel"},
            {"teo", "teo"},
            {"tg", "tgk"},
            {"th", "tha"},
            {"ti", "tir"},
            {"to", "ton"},
            {"tr", "tur"},
            {"tt", "tat"},
            {"twq", "twq"},
            {"tzm", "tzm"},
            {"ug", "uig"},
            {"uk", "ukr"},
            {"ur", "urd"},
            {"uz", "uzb"},
            {"vai", "vai"},
            {"vi", "vie"},
            {"vun", "vun"},
            {"wae", "wae"},
            {"wo", "wol"},
            {"xog", "xog"},
            {"yav", "yav"},
            {"yi", "yid"},
            {"yo", "yor"},
            {"yue", "yue"},
            {"zgh", "zgh"},
            {"zh", "zho"},
            {"zu", "zul"}
        };

        // TryGetValue does a single lookup; null means no translation was found for the provided code.
        return isoMappings.TryGetValue(twoDigitCode, out var threeDigitCode) ? threeDigitCode : null;
    }
}

I hope this can work.
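
A more compact variant of the same idea, as a sketch only: it assumes incoming codes may carry a region or script suffix such as `en-US` and may differ in letter case, and it uses an abbreviated table in place of the full list. `ToThreeLetterCode` is a hypothetical helper name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Abbreviated table for illustration; the full list above would go here.
var isoMappings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["en"] = "eng",
    ["ko"] = "kor",
    ["zh"] = "zho",
};

// Strips any region or script suffix ("en-US" -> "en") before the lookup
// and returns null for codes that are not in the table.
string? ToThreeLetterCode(string languageCode)
{
    var primary = languageCode.Split('-', '_')[0];
    return isoMappings.TryGetValue(primary, out var threeLetterCode) ? threeLetterCode : null;
}

Console.WriteLine(ToThreeLetterCode("en-US"));             // eng
Console.WriteLine(ToThreeLetterCode("ZH-Hans"));           // zho
Console.WriteLine(ToThreeLetterCode("xx") ?? "(unknown)"); // (unknown)
```

Building the dictionary once and reusing it avoids recreating the table on every call, which the function above does.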

Tyrrrz commented 1 year ago

Yeah, that should work. I still want to understand whether the language attribute needs to be a 3-letter code, or whether that is just a MediaInfo quirk.

BrendanxP commented 1 year ago

They also use the 3-letter style in this ffmpeg documentation. Furthermore, this GitHub page about GPAC might be relevant.

Wikipedia has a nice overview and links to sources about the codes.

Tyrrrz commented 1 year ago

Furthermore, maybe this GitHub page about GPAC might be relevant.

They say en-US should be recognized, but it doesn't work either 🤔

BrendanxP commented 1 year ago

Hmmm. That is confusing indeed. They do mention, though, that MP4 itself has always used the 3-letter standard, whereas the GPAC framework might be able to understand and convert more input types. At least, that's how I interpret it after a second read, taking your findings into account.
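
Some background that may explain the 3-letter requirement: in the ISO base media file format, the media header (`mdhd`) box stores the track language as an ISO 639-2/T code, with each of the three letters packed into 5 bits as its ASCII value minus 0x60. The sketch below illustrates only that packing; `PackLanguage` is a hypothetical helper, and the thread does not confirm how FFmpeg handles a 2-letter code at this step.

```csharp
using System;

// Packs a three-letter lowercase code into the 15-bit form used by the MP4
// media header: each letter is stored as (ASCII value - 0x60) in 5 bits.
static ushort PackLanguage(string code)
{
    if (code.Length != 3)
        throw new ArgumentException("Expected exactly three letters.", nameof(code));

    var packed = 0;
    foreach (var c in code)
    {
        if (c < 'a' || c > 'z')
            throw new ArgumentException("Expected lowercase ASCII letters.", nameof(code));
        packed = (packed << 5) | (c - 0x60);
    }
    return (ushort)packed;
}

var eng = PackLanguage("eng");
var und = PackLanguage("und");
Console.WriteLine($"eng -> 0x{eng:X4}"); // eng -> 0x15C7
Console.WriteLine($"und -> 0x{und:X4}"); // und -> 0x55C4
```

The field has room for exactly three letters, so a 2-letter code has to be converted before it can be stored there.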

Tyrrrz commented 1 year ago

Should work now

image

BrendanxP commented 1 year ago

Thanks for your help and the great tool! @Tyrrrz