HbbTV-Association / ReferenceApplication

MIT License

2 character vs 3 character language codes #74

Closed: jpiesing closed this issue 1 year ago

jpiesing commented 1 year ago

According to both the 4th and 5th editions of DASH, @lang "Declares the language code for this Adaptation Set. The syntax and semantics according to IETF RFC 5646 shall be used. If not present, the language code may be defined for each media component or it may be unknown. If the language is unknown, the 'und' code for undetermined primary language or the 'zxx' (Non-Linguistic, Not Applicable) code can be used." Page 10 of RFC 5646 states that 2-character codes are to be used.

Here you can find the IANA registrations for language tags: iana.org/assignments/language-subtag-registry/language-subtag-registry. ger, eng, etc. are not registered language tags; even Finnish is fi, not fin.

We also have some examples on our page, for example under the subtitle section: https://demo.unified-streaming.com/k8s/features/stable/#!/mpd

Also, the language is not carried in the MP4 mdhd box, although 14496-30 recommends carrying it there.

Please review language codes and check that the correct choice of 2-character codes vs 3-character codes has been made.

Murmur commented 1 year ago

We use 3-letter language codes in the manifest and init.mp4 files (https://refapp.hbbtv.org/videos/00_llama_h264_v8/manifest_1080p.mpd):
init.mp4 files, MOOV/TRAK/MDIA/MDHD.lang: video=eng, audio1=eng, audio2=fin, audio3=ger
manifest, AdaptationSet@lang: video=eng, audio1=eng, audio2=fin, audio3=ger
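To check which @lang values a manifest actually declares, a minimal sketch using Python's standard xml.etree.ElementTree could list the @lang of every AdaptationSet. The function name adaptation_set_langs and the inline MPD fragment are hypothetical illustrations that mirror the values listed above; they are not the reference application's tooling or the real manifest.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# DASH MPD namespace (ISO/IEC 23009-1)
MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def adaptation_set_langs(mpd_xml: str) -> list[tuple[str, str | None]]:
    """Return (contentType or mimeType, @lang) for every AdaptationSet in the MPD."""
    root = ET.fromstring(mpd_xml)
    result = []
    for aset in root.iterfind(".//mpd:AdaptationSet", MPD_NS):
        kind = aset.get("contentType") or aset.get("mimeType") or "?"
        result.append((kind, aset.get("lang")))  # None if @lang is absent
    return result


# Hypothetical MPD fragment, not the real manifest_1080p.mpd
sample = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet contentType="video" lang="eng"/>
    <AdaptationSet contentType="audio" lang="fin"/>
    <AdaptationSet contentType="audio" lang="ger"/>
  </Period>
</MPD>"""

for kind, lang in adaptation_set_langs(sample):
    print(kind, lang)
```

Passing the text of a downloaded manifest instead of the sample would list its real @lang values.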

Do the specs say that 2-letter codes must be used in the init.mp4 file (MDHD box) and the manifest.mpd file?

PS: Some of the very old test content (livesim content) may not have an mdhd.lang value, but it is to be replaced once the Livesim2 transition is ready.
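Because some older content may lack an mdhd language, it can help to read the field directly from an init segment. The sketch below assumes the ISO/IEC 14496-12 layout (mdhd as a full box inside moov/trak/mdia, with a 16-bit field holding one pad bit and three 5-bit letters after the timing fields) and walks only that path. The function names are hypothetical, and a zero field is reported as None.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Boxes on the path moov/trak/mdia that contain child boxes (ISO/IEC 14496-12).
CONTAINERS = {b"moov", b"trak", b"mdia"}


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit largesize follows the type
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - pos
        if size < header:
            break  # malformed box; stop instead of looping forever
        yield box_type, pos + header, pos + size
        pos += size


def mdhd_languages(data: bytes, start: int = 0,
                   end: Optional[int] = None) -> List[Optional[str]]:
    """Return the decoded mdhd language of every track (None if the field is 0)."""
    langs: List[Optional[str]] = []
    for box_type, body, box_end in iter_boxes(data, start, end):
        if box_type in CONTAINERS:
            langs.extend(mdhd_languages(data, body, box_end))
        elif box_type == b"mdhd":
            version = data[body]
            # Skip version+flags (4 bytes), then the timing fields:
            # version 0 = 4+4+4+4 bytes, version 1 = 8+8+4+8 bytes.
            offset = body + 4 + (16 if version == 0 else 28)
            (packed,) = struct.unpack(">H", data[offset:offset + 2])
            if packed & 0x7FFF == 0:
                langs.append(None)
            else:
                langs.append("".join(
                    chr(((packed >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0)))
    return langs
```

For a real file, mdhd_languages(open("init.mp4", "rb").read()) would return one entry per track.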

jpiesing commented 1 year ago


> Do the specs say that 2-letter codes must be used in the init.mp4 file (MDHD box) and the manifest.mpd file?

Yes. The internet standard is to use 2-letter codes where they are defined, and to use 3-letter codes only where there is no 2-letter code. From RFC 5646:

"When languages have both an ISO 639-1 two-character code and a three-character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only the ISO 639-1 two-character code is defined in the IANA registry."

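Applying this RFC 5646 rule to a manifest value amounts to replacing the primary subtag with its ISO 639-1 code whenever one exists. The mapping table below is a hypothetical, deliberately tiny subset covering only the languages in this thread, and normalize_lang_tag is an illustrative helper; a real tool would derive the table from the IANA Language Subtag Registry.

```python
# Hypothetical subset: ISO 639-2 codes (T and B forms) that have an ISO 639-1 code.
# A complete tool would build this from the IANA Language Subtag Registry.
TO_ISO639_1 = {
    "eng": "en",
    "fin": "fi",
    "swe": "sv",
    "deu": "de",  # ISO 639-2/T form
    "ger": "de",  # ISO 639-2/B form
}


def normalize_lang_tag(tag: str) -> str:
    """Replace a 3-letter primary subtag with its 2-letter code if the table has one.

    Other subtags (script, region, ...) are passed through unchanged, and codes
    not in the table (e.g. 'und', 'zxx', which have no 2-letter form) are kept.
    """
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    return TO_ISO639_1.get(primary, primary) + sep + rest


for value in ("eng", "fin", "ger", "und"):
    print(value, "->", normalize_lang_tag(value))
```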
Murmur commented 1 year ago

This is a summary of the language specifications. We need to fix the manifest @lang values and the mdhd.lang MP4 atom value for the German language.

ISO-639-1 : "de"  2-letter code. Use this in the DASH manifest.mpd @lang attribute.
ISO-639-2T: "deu" 3-letter code derived from the native name. Use this on the ffmpeg+mp4box command line.
ISO-639-2B: "ger" 3-letter code derived from the English name. Do not use unless a specific use case requires it.
14496-12  : 2-byte value in the MP4 atom "MDHD.lang" field: 1 pad bit plus three 5-bit letters, each stored as its ASCII code minus 0x60 (0x15C7 -> "eng").
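The MDHD.lang packing can be sketched as follows, assuming the ISO/IEC 14496-12 layout of one zero pad bit followed by three 5-bit values, each being a letter's ASCII code minus 0x60. The helper names are hypothetical; the loop reproduces the decimal and hex values listed in this comment.

```python
def pack_mdhd_lang(code: str) -> int:
    """Pack a 3-letter ISO 639-2/T code into the 16-bit mdhd language field."""
    if len(code) != 3 or not all("a" <= c <= "z" for c in code):
        raise ValueError(f"expected 3 lowercase letters, got {code!r}")
    value = 0
    for c in code:
        value = (value << 5) | (ord(c) - 0x60)  # 5 bits per letter
    return value  # the top (pad) bit of the 16-bit field stays 0


def unpack_mdhd_lang(value: int) -> str:
    """Decode the 16-bit mdhd language field back into three letters."""
    return "".join(chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


for code in ("eng", "swe", "fin", "deu", "ger"):
    v = pack_mdhd_lang(code)
    print(f"lang(dec)={v}, lang(hex)=0x{v:04X}, lang={unpack_mdhd_lang(v)}")
# first line printed: lang(dec)=5575, lang(hex)=0x15C7, lang=eng
```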

lang(dec)=5575,  lang(hex)=0x15C7, lang(ISO-639-2T)=eng, use "en" in mpd for English
lang(dec)=20197, lang(hex)=0x4EE5, lang(ISO-639-2T)=swe, use "sv" in mpd for Swedish
lang(dec)=6446,  lang(hex)=0x192E, lang(ISO-639-2T)=fin, use "fi" in mpd for Finnish
lang(dec)=4277,  lang(hex)=0x10B5, lang(ISO-639-2T)=deu, use "de" in mpd for German

Do not use the following value in the MP4 mdhd atom; the ffmpeg+mp4box command line does not validate values:
lang(dec)=7346,  lang(hex)=0x1CB2, lang(ISO-639-2B)=ger

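Since the tools accept an ISO 639-2/B code such as "ger" without complaint, a small pre-flight check before calling ffmpeg or MP4Box could map bibliographic codes to their ISO 639-2/T form, which is what 14496-12 specifies for mdhd. The B_TO_T table is an illustrative subset and mdhd_code is a hypothetical helper, not part of either tool.

```python
# ISO 639-2/B (bibliographic) codes whose ISO 639-2/T (terminology) form differs.
# Illustrative subset only; the ISO 639-2 code list has the complete set.
B_TO_T = {"ger": "deu", "fre": "fra", "dut": "nld", "chi": "zho", "cze": "ces"}


def mdhd_code(code: str) -> str:
    """Return the ISO 639-2/T code to pass to the muxer for MDHD.lang.

    Raises ValueError for anything that is not three ASCII letters, since
    the mdhd field can only hold a 3-letter code.
    """
    code = code.lower()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"mdhd language must be a 3-letter code, got {code!r}")
    return B_TO_T.get(code, code)
```

With this check, mdhd_code("ger") returns "deu", which packs to 0x10B5 rather than the undesired 0x1CB2.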
Murmur commented 1 year ago

New content with the corrected language tags in the mpd @lang and mp4 mdhd.lang fields is available. It has not yet been added to the production and staging UI menus.

NoDrm: https://refapp.hbbtv.org/videos/00_llama_h264_v9/manifest_subib.mpd
PlayReady: https://refapp.hbbtv.org/videos/00_llama_h264_v9/cenc/manifest_prcenc_subib.mpd
Laurl: https://test.playready.microsoft.com/service/rightsmanager.asmx?cfg=(kid:header,sl:2000,persist:false,contentkey:EjQSNBI0EjQSNBI0EjQSNg==)

Murmur commented 1 year ago

Enabled the @lang-fixed content on all tests (NoDrm, PlayReady, Widevine, ClearKey, Marlin, Live, MultiPeriod, MultiDRM, h264, h265):
https://refapp.hbbtv.org/production/catalogue/
https://refapp.hbbtv.org/staging/catalogue/

Example for the German (ger) language:
ISO-639-1 : "de"  2-letter code in the manifest @lang
ISO-639-2T: "deu" 3-letter code from the native name, used on the ffmpeg+mp4box command line
14496-12  : the 3-letter native-name code compressed into a 2-byte value in init.mp4 MDHD.lang