SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

[BUG] Language parsing in idx/sub: old DVD language codes #1675

Closed aaaxx closed 8 years ago

aaaxx commented 8 years ago

It seems that the DVD specification uses a deprecated version of ISO 639-1 for its language codes, which messes up the language selection dropdown in SE's Import Vobsub dialog. The track with the unknown code takes the language name of the following track; that shift ripples down the rest of the list, pulling every later track name out of sync, and the final track ends up marked as "Unknown language". Here's a sample file with Hebrew (iw) being the offender.
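To make the symptom concrete, here is a small Python sketch of one way such a ripple could arise. This is only a guess at the mechanism: `KNOWN_NAMES`, `label_tracks_buggy` and `label_tracks_fixed` are made-up names, and none of this is SE's actual code.

```python
# Hypothetical reconstruction of the symptom, NOT SubtitleEdit's real code.
# A tiny stand-in for the table of codes the program recognizes.
KNOWN_NAMES = {"en": "English", "fr": "French", "de": "German"}


def label_tracks_buggy(track_codes):
    """Pair tracks with names by position after dropping unknown codes."""
    # Unknown codes are silently skipped, so the name list comes out short...
    names = [KNOWN_NAMES[c] for c in track_codes if c in KNOWN_NAMES]
    # ...and the shortfall is padded at the END, shifting every later name up.
    names += ["Unknown language"] * (len(track_codes) - len(names))
    return list(zip(track_codes, names))


def label_tracks_fixed(track_codes):
    """Resolve each track on its own so one unknown code cannot shift the rest."""
    return [(c, KNOWN_NAMES.get(c, "Unknown language")) for c in track_codes]
```

With tracks `["en", "iw", "fr", "de"]`, the buggy version labels `iw` as French, `fr` as German and `de` as "Unknown language", which matches what the dropdown shows; the fixed version only marks `iw` as unknown.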

I compared the list of language codes used by the DVD spec (see p. 186 here or p. 525 here) with the current ISO 639-1 and these are the differences:

| DVD code | ISO 639-1 | Language name | Change notice |
| --- | --- | --- | --- |
| in | id | Indonesian | changed 1989-03-11 |
| iw | he | Hebrew | changed 1989-03-11 |
| ji | yi | Yiddish | changed 1989-03-11 |
| jw | jv | Javanese | “jw” published in error; withdrawn in favor of “jv”, 2001-08-13 |
| mo | ro | Romanian | “mo” for Moldavian has been withdrawn, recommending “ro” also for Moldavian, 2008-11-03 |
| sh | bs | Bosnian | Serbo-Croatian was deprecated in 2000 in favor of separate codes for each individual language (Serbian, Croatian, and then Bosnian was added), 2000-02-18 |

[since Serbian and Croatian were already present in the old DVD code list it’s probably best to map it to Bosnian]
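A fix along these lines could be as simple as a lookup applied before the language name is resolved. The Python sketch below uses only the six pairs listed above; `DVD_TO_ISO_639_1` and `normalize_dvd_language_code` are illustrative names, not anything from SE's code base.

```python
# Deprecated two-letter codes seen on DVDs, mapped to current ISO 639-1.
# The pairs are taken from the comparison above; the names are illustrative.
DVD_TO_ISO_639_1 = {
    "in": "id",  # Indonesian
    "iw": "he",  # Hebrew
    "ji": "yi",  # Yiddish
    "jw": "jv",  # Javanese
    "mo": "ro",  # Moldavian -> Romanian
    "sh": "bs",  # Serbo-Croatian -> Bosnian (a judgment call, see the note above)
}


def normalize_dvd_language_code(code: str) -> str:
    """Return the current ISO 639-1 code for a DVD language code.

    Codes that are not in the mapping are returned unchanged (lowercased).
    """
    code = code.strip().lower()
    return DVD_TO_ISO_639_1.get(code, code)
```

For example, `normalize_dvd_language_code("iw")` returns `"he"`, while a code that never changed, such as `"en"`, passes through untouched.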

And here's the full list of DVD codes in case someone needs it for future reference:

| Language name | Code | Hex | Decimal | Native name |
| --- | --- | --- | --- | --- |
| Abkhaz | ab | 6162 | 24930 | аҧсуа бызшәа, аҧсшәа |
| Afar | aa | 6161 | 24929 | Afaraf |
| Afrikaans | af | 6166 | 24934 | Afrikaans |
| Albanian | sq | 7371 | 29553 | Shqip |
| Amharic | am | 616D | 24941 | አማርኛ |
| Arabic | ar | 6172 | 24946 | العربية |
| Armenian | hy | 6879 | 26745 | Հայերեն |
| Assamese | as | 6173 | 24947 | অসমীয়া |
| Aymara | ay | 6179 | 24953 | aymar aru |
| Azerbaijani | az | 617A | 24954 | azərbaycan dili |
| Bashkir | ba | 6261 | 25185 | башҡорт теле |
| Basque | eu | 6575 | 25973 | euskara, euskera |
| Belarusian | be | 6265 | 25189 | беларуская мова |
| Bengali, Bangla | bn | 626E | 25198 | বাংলা |
| Bihari | bh | 6268 | 25192 | भोजपुरी |
| Bislama | bi | 6269 | 25193 | Bislama |
| Breton | br | 6272 | 25202 | brezhoneg |
| Bulgarian | bg | 6267 | 25191 | български език |
| Burmese | my | 6D79 | 28025 | ဗမာစာ |
| Catalan | ca | 6361 | 25441 | català |
| Chinese | zh | 7A68 | 31336 | 中文 (Zhōngwén), 汉语, 漢語 |
| Corsican | co | 636F | 25455 | corsu, lingua corsa |
| Croatian | hr | 6872 | 26738 | hrvatski jezik |
| Czech | cs | 6373 | 25459 | čeština, český jazyk |
| Danish | da | 6461 | 25697 | dansk |
| Dutch | nl | 6E6C | 28268 | Nederlands, Vlaams |
| Dzongkha | dz | 647A | 25722 | རྫོང་ཁ |
| English | en | 656E | 25966 | English |
| Esperanto | eo | 656F | 25967 | Esperanto |
| Estonian | et | 6574 | 25972 | eesti, eesti keel |
| Faroese | fo | 666F | 26223 | føroyskt |
| Fijian | fj | 666A | 26218 | vosa Vakaviti |
| Finnish | fi | 6669 | 26217 | suomi, suomen kieli |
| French | fr | 6672 | 26226 | français, langue française |
| Galician | gl | 676C | 26476 | galego |
| Georgian | ka | 6B61 | 27489 | ქართული |
| German | de | 6465 | 25701 | Deutsch |
| Greek | el | 656C | 25964 | ελληνικά |
| Guaraní | gn | 676E | 26478 | Avañe'ẽ |
| Gujarati | gu | 6775 | 26485 | ગુજરાતી |
| Hausa | ha | 6861 | 26721 | (Hausa) هَوُسَ |
| Hebrew | iw | 6977 | 26999 | עברית |
| Hindi | hi | 6869 | 26729 | हिन्दी, हिंदी |
| Hungarian | hu | 6875 | 26741 | magyar |
| Icelandic | is | 6973 | 26995 | Íslenska |
| Indonesian | in | 696E | 26990 | Bahasa Indonesia |
| Interlingua | ia | 6961 | 26977 | Interlingua |
| Interlingue | ie | 6965 | 26981 | Originally called Occidental; then Interlingue after WWII |
| Inupiaq | ik | 696B | 26987 | Iñupiaq, Iñupiatun |
| Irish | ga | 6761 | 26465 | Gaeilge |
| Italian | it | 6974 | 26996 | italiano |
| Japanese | ja | 6A61 | 27233 | 日本語 (にほんご) |
| Javanese | jw | 6A77 | 27255 | basa Jawa |
| Kalaallisut, Greenlandic | kl | 6B6C | 27500 | kalaallisut, kalaallit oqaasii |
| Kannada | kn | 6B6E | 27502 | ಕನ್ನಡ |
| Kashmiri | ks | 6B73 | 27507 | कश्मीरी, كشميري |
| Kazakh | kk | 6B6B | 27499 | қазақ тілі |
| Khmer | km | 6B6D | 27501 | ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ |
| Kinyarwanda | rw | 7277 | 29303 | Ikinyarwanda |
| Kirundi | rn | 726E | 29294 | Ikirundi |
| Korean | ko | 6B6F | 27503 | 한국어, 조선어 |
| Kurdish | ku | 6B75 | 27509 | Kurdî, كوردی |
| Kyrgyz | ky | 6B79 | 27513 | Кыргызча, Кыргыз тили |
| Lao | lo | 6C6F | 27759 | ພາສາລາວ |
| Latin | la | 6C61 | 27745 | latine, lingua latina |
| Latvian | lv | 6C76 | 27766 | latviešu valoda |
| Lingala | ln | 6C6E | 27758 | Lingála |
| Lithuanian | lt | 6C74 | 27764 | lietuvių kalba |
| Macedonian | mk | 6D6B | 28011 | македонски јазик |
| Malagasy | mg | 6D67 | 28007 | fiteny malagasy |
| Malay | ms | 6D73 | 28019 | bahasa Melayu, بهاس ملايو |
| Malayalam | ml | 6D6C | 28012 | മലയാളം |
| Maltese | mt | 6D74 | 28020 | Malti |
| Māori | mi | 6D69 | 28009 | te reo Māori |
| Marathi (Marāṭhī) | mr | 6D72 | 28018 | मराठी |
| Moldavian, Moldovan | mo | 6D6F | 28015 | limba moldovenească, лимба молдовеняскэ |
| Mongolian | mn | 6D6E | 28014 | Монгол хэл |
| Nauruan | na | 6E61 | 28257 | Dorerin Naoero |
| Nepali | ne | 6E65 | 28261 | नेपाली |
| Norwegian | no | 6E6F | 28271 | Norsk |
| Occitan | oc | 6F63 | 28515 | occitan, lenga d'òc |
| Oriya | or | 6F72 | 28530 | ଓଡ଼ିଆ |
| Oromo | om | 6F6D | 28525 | Afaan Oromoo |
| Panjabi, Punjabi | pa | 7061 | 28769 | ਪੰਜਾਬੀ, پنجابی |
| Pashto, Pushto | ps | 7073 | 28787 | پښتو |
| Persian (Farsi) | fa | 6661 | 26209 | فارسی |
| Polish | pl | 706C | 28780 | język polski, polszczyzna |
| Portuguese | pt | 7074 | 28788 | português |
| Quechua | qu | 7175 | 29045 | Runa Simi, Kichwa |
| Romanian | ro | 726F | 29295 | limba română |
| Romansh | rm | 726D | 29293 | rumantsch grischun |
| Russian | ru | 7275 | 29301 | Русский |
| Samoan | sm | 736D | 29549 | gagana fa'a Samoa |
| Sango | sg | 7367 | 29543 | yângâ tî sängö |
| Sanskrit (Saṁskṛta) | sa | 7361 | 29537 | संस्कृतम् |
| Scottish Gaelic, Gaelic | gd | 6764 | 26468 | Gàidhlig |
| Serbian | sr | 7372 | 29554 | српски језик |
| Serbo-Croatian | sh | 7368 | 29544 | srpskohrvatski jezik |
| Shona | sn | 736E | 29550 | chiShona |
| Sindhi | sd | 7364 | 29540 | सिन्धी, سنڌي، سندھی |
| Sinhala, Sinhalese | si | 7369 | 29545 | සිංහල |
| Slovak | sk | 736B | 29547 | slovenčina, slovenský jazyk |
| Slovene | sl | 736C | 29548 | slovenski jezik, slovenščina |
| Somali | so | 736F | 29551 | Soomaaliga, af Soomaali |
| Southern Sotho | st | 7374 | 29556 | Sesotho |
| Spanish | es | 6573 | 25971 | español |
| Sundanese | su | 7375 | 29557 | Basa Sunda |
| Swahili | sw | 7377 | 29559 | Kiswahili |
| Swati | ss | 7373 | 29555 | SiSwati |
| Swedish | sv | 7376 | 29558 | svenska |
| Tagalog | tl | 746C | 29804 | Wikang Tagalog, ᜏᜒᜃᜅ᜔ ᜆᜄᜎᜓᜄ᜔ |
| Tajik | tg | 7467 | 29799 | тоҷикӣ, toçikī, تاجیکی |
| Tamil | ta | 7461 | 29793 | தமிழ் |
| Tatar | tt | 7474 | 29812 | татар теле, tatar tele |
| Telugu | te | 7465 | 29797 | తెలుగు |
| Thai | th | 7468 | 29800 | ไทย |
| Tibetan Standard, Tibetan, Central | bo | 626F | 25199 | བོད་ཡིག |
| Tigrinya | ti | 7469 | 29801 | ትግርኛ |
| Tonga (Tonga Islands) | to | 746F | 29807 | faka Tonga |
| Tsonga | ts | 7473 | 29811 | Xitsonga |
| Tswana | tn | 746E | 29806 | Setswana |
| Turkish | tr | 7472 | 29810 | Türkçe |
| Turkmen | tk | 746B | 29803 | Türkmen, Түркмен |
| Twi | tw | 7477 | 29815 | Twi |
| Ukrainian | uk | 756B | 30059 | Українська |
| Urdu | ur | 7572 | 30066 | اردو |
| Uzbek | uz | 757A | 30074 | Oʻzbek, Ўзбек, أۇزبېك |
| Vietnamese | vi | 7669 | 30313 | Tiếng Việt |
| Volapük | vo | 766F | 30319 | Volapük |
| Welsh | cy | 6379 | 25465 | Cymraeg |
| Western Frisian | fy | 6679 | 26233 | Frysk |
| Wolof | wo | 776F | 30575 | Wollof |
| Xhosa | xh | 7868 | 30824 | isiXhosa |
| Yiddish | ji | 6A69 | 27241 | ייִדיש |
| Yoruba | yo | 796F | 31087 | Yorùbá |
| Zulu | zu | 7A75 | 31349 | isiZulu |

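
Judging by the values in the list, the Hex and Decimal columns are just the two ASCII letters of the code packed into a 16-bit number with the first letter in the high byte (e.g. "iw" is 0x69 0x77, i.e. 6977 hex or 26999 decimal). A small Python sketch of that conversion, in case it helps when reading raw values; the function names are made up for illustration:

```python
def dvd_language_value(code: str) -> int:
    """Pack a two-letter code into a 16-bit integer, first letter in the high byte."""
    raw = code.encode("ascii")
    if len(raw) != 2:
        raise ValueError(f"expected a two-letter code, got {code!r}")
    return int.from_bytes(raw, "big")


def dvd_language_code(value: int) -> str:
    """Inverse of dvd_language_value: unpack a 16-bit integer into two letters."""
    return value.to_bytes(2, "big").decode("ascii")
```

`dvd_language_value("iw")` gives 26999, `format(26999, "04X")` gives `"6977"`, and `dvd_language_code(24930)` gives `"ab"`, all matching the rows above.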
xylographe commented 8 years ago

Does SubtitleEdit-3.4.12.79-issue1675.7z fix this?

aaaxx commented 8 years ago

Yes, it does. Thanks. :smile: