MediaArea / MediaInfoLib

Convenient unified display of the most relevant technical and tag data for video and audio files.
https://mediaarea.net/MediaInfo
BSD 2-Clause "Simplified" License

File_Mpeg_Descriptors::Get_DVB_Text does not support all possible encodings used in DVB #1669

Open lighterowl opened 1 year ago

lighterowl commented 1 year ago

The implementation of File_Mpeg_Descriptors::Get_DVB_Text, which is the central point for converting a "DVB string" to the internal Ztring, only supports ISO-8859-2. For every other value of the leading byte(s) that select the character encoding, it calls Get_Local, which ends up using CP_ACP on Windows and ISO-8859-1 on other systems.

Furthermore, if the first byte is greater than or equal to 0x20, meaning that the "default encoding" should be used, the function also processes the buffer with Get_Local, i.e. with CP_ACP or ISO-8859-1. This is incorrect as well: the default encoding for DVB strings is ISO/IEC 6937 with the euro sign (U+20AC) instead of $ at position 0xA4.
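To illustrate why Get_Local cannot stand in for the default table, here is a minimal, partial sketch of a decoder for ISO/IEC 6937 with the euro sign at 0xA4. The function name `decode_dvb_default` and its coverage are assumptions for illustration, not MediaInfoLib code. It only handles printable ASCII (passed through unchanged as a simplification), the euro sign, and the non-spacing diacritic prefixes 0xC1 to 0xCF, which in ISO/IEC 6937 precede the letter they modify. Standalone upper-half letters such as ł (which has no Unicode decomposition) and the DVB control codes are not covered and come out as U+FFFD.

```python
import unicodedata

# Hypothetical partial decoder for the DVB default table (ISO/IEC 6937 + euro).
# ISO/IEC 6937 non-spacing diacritics: the diacritic byte comes BEFORE the base letter.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}


def _is_ascii_letter(b: int) -> bool:
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def decode_dvb_default(buf: bytes) -> str:
    out = []
    i = 0
    while i < len(buf):
        b = buf[i]
        if 0x20 <= b <= 0x7E:
            # Simplification: treat the lower half as printable ASCII.
            out.append(chr(b))
            i += 1
        elif b == 0xA4:
            # DVB puts the euro sign where ISO/IEC 6937 has '$'.
            out.append("\u20ac")
            i += 1
        elif b in DIACRITICS and i + 1 < len(buf) and _is_ascii_letter(buf[i + 1]):
            # Unicode wants base letter + combining mark; NFC then composes
            # them into a precomposed character when one exists.
            out.append(unicodedata.normalize("NFC", chr(buf[i + 1]) + DIACRITICS[b]))
            i += 2
        else:
            # Anything else (other upper-half letters, control codes,
            # dangling diacritics) is outside this sketch.
            out.append("\ufffd")
            i += 1
    return "".join(out)
```

For example, `decode_dvb_default(b"\xC7Zycie")` yields "Życie", whereas reading the same bytes as ISO-8859-1 would produce "ÇZycie".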

The current mapping is described in DVB BlueBook A038r15:

[Two screenshots of the mapping tables from DVB BlueBook A038r15 were attached here.]
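The selection scheme in those tables can be sketched as a dispatch on the first byte(s) of the string. This is a hedged sketch based on the character-table selection in ETSI EN 300 468 Annex A (the specification A038 corresponds to); `select_codec` and the `"dvb-default"` marker are hypothetical names, and Python codec names stand in for whatever conversion MediaInfoLib would actually use. The Korean and Chinese tables (0x12 to 0x14), the 0x1F form and reserved values are deliberately left out, and 0x11 (ISO/IEC 10646, two bytes per character) is approximated with UTF-16BE.

```python
from typing import Optional, Tuple

# Single selector byte -> ISO/IEC 8859 part (0x08 is reserved).
SINGLE_BYTE_TABLES = {
    0x01: "iso8859_5",
    0x02: "iso8859_6",
    0x03: "iso8859_7",
    0x04: "iso8859_8",
    0x05: "iso8859_9",
    0x06: "iso8859_10",
    0x07: "iso8859_11",
    0x09: "iso8859_13",
    0x0A: "iso8859_14",
    0x0B: "iso8859_15",
}


def select_codec(buf: bytes) -> Tuple[Optional[str], bytes]:
    """Hypothetical helper: return (codec name, text bytes).

    A codec of None means the selector is not covered by this sketch.
    """
    if not buf:
        return None, b""
    first = buf[0]
    if first >= 0x20:
        # No selector byte: the first byte is already text in the default
        # table (ISO/IEC 6937 + euro), for which Python has no built-in codec.
        return "dvb-default", buf
    if first in SINGLE_BYTE_TABLES:
        return SINGLE_BYTE_TABLES[first], buf[1:]
    if first == 0x10 and len(buf) >= 3:
        # Three-byte form 0x10 0x00 N selects ISO/IEC 8859-N (N = 1..15, not 12).
        part = buf[2]
        if buf[1] == 0x00 and 1 <= part <= 15 and part != 12:
            return f"iso8859_{part}", buf[3:]
        return None, buf[3:]
    if first == 0x11:
        # ISO/IEC 10646, approximated here as big-endian UTF-16.
        return "utf_16_be", buf[1:]
    if first == 0x15:
        return "utf_8", buf[1:]
    return None, buf[1:]
```

With this dispatch, `select_codec(b"\x09Okr")` returns `("iso8859_13", b"Okr")`, and the text bytes can then be passed to `bytes.decode` with that codec name. Keeping the default table as a separate case makes it explicit that it needs its own ISO/IEC 6937 decoder rather than a fallback to the system code page.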

JeromeMartinez commented 1 year ago

It was clearly a quick implementation that does not support everything. It is currently not a priority for us, but it could be prioritized on request. Would you mind sharing some sample files that demonstrate this issue with MediaInfo?

lighterowl commented 1 year ago

Sure, here you go (gzipped so that GitHub will accept it): tvp_rozrywka.ts.gz

When this file is analyzed with MediaInfo, some characters in the service information are displayed incorrectly:

Menu
ID                                       : 501 (0x1F5)
Menu ID                                  : 62 (0x3E)
Format                                   : HEVC / E-AC-3 / DVB Subtitle / E-AC-3 / 
Duration                                 : 15 s 344 ms
List                                     : 502 (0x1F6) (HEVC) / 503 (0x1F7) (E-AC-3, Polish) / 506 (0x1FA) (DVB Subtitle) / 508 (0x1FC) (E-AC-3, aux) / 8005 (0x1F45) ()
Language                                 :  / Polish /  / aux
Service name                             : TVP Rozrywka
Service provider                         : Emitel
Service type                             : reserved for future use
UTC 2023-02-22 21:10:00                  : pl:Wojciech Cejrowski- boso przez úwiat - (68) Wenezuela - Boso ale w ostrogach / pl: / foreign countries/expeditions /  / 00:35:00 / 
UTC 2023-02-22 21:45:00                  : pl:Rolnik szuka ýony seria 9 - /9/ / pl: / social/spiritual sciences /  / 01:00:00 / 
UTC 2023-02-22 22:45:00                  : pl:Szansa na sukces. Opole 2023 - odc. (8) Piotr Cugowski / pl: / music/ballet/dance /  / 01:10:00 / 
UTC 2023-02-22 23:55:00                  : pl:Koùo fortuny - odc. 1441 ed. 12 / pl: / game show/quiz/contest /  / 00:40:00 / 
UTC 2023-02-26 03:05:00                  : pl:Ýycie to Kabaret - Kabaretomaniacy - (1) / pl: / variety show /  / 00:50:00 / 
UTC 2023-02-26 03:55:00                  : pl:Zakoñczenie dnia / pl: / undefined /  / 01:40:00 / 
UTC 2023-02-26 05:35:00                  : pl:Okrasa ùamie przepisy - Lekko i dietetycznie z królikiem / pl: / cooking /  / 00:35:00 / 
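The garbled characters above are consistent with titles encoded in ISO/IEC 8859-13 (selector byte 0x09 in EN 300 468) being decoded as ISO-8859-1, which is what Get_Local falls back to on non-Windows systems. The short check below assumes that pairing and reproduces several of the titles from the output; `as_seen_by_latin1` is a hypothetical helper name.

```python
def as_seen_by_latin1(title: str) -> str:
    """Hypothetical helper: encode as ISO/IEC 8859-13, then misread as ISO-8859-1."""
    return title.encode("iso8859_13").decode("latin_1")


titles = [
    "Okrasa łamie przepisy",
    "Wojciech Cejrowski- boso przez świat",
    "Rolnik szuka żony",
    "Koło fortuny",
    "Życie to Kabaret",
    "Zakończenie dnia",
]

for t in titles:
    print(f"{t!r} -> {as_seen_by_latin1(t)!r}")

# Reversing the mistake recovers the intended text.
print("Koùo fortuny".encode("latin_1").decode("iso8859_13"))
```

Characters that sit at the same position in both tables, such as ó in "królikiem", survive unchanged, which is why only some letters are affected.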

For example, the last event, "Okrasa ùamie przepisy", should read "Okrasa łamie przepisy". The descriptor for this event starts at offset 0x6E1069 in the file:

$ xxd -s 0x6E1069 -l 10 tvp_rozrywka.ts
006e1069: 4d3e 706f 6c39 094f 6b72                 M>pol9.Okr
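As a cross-check, the dumped bytes can be split into fields assuming the short_event_descriptor layout from ETSI EN 300 468: descriptor_tag (0x4D), descriptor_length, a three-byte ISO 639 language code and event_name_length, followed by the event name itself. `ShortEventHeader` and `parse_short_event_header` are hypothetical names for this sketch; only the ten bytes shown by xxd are used, so the event name is truncated.

```python
from dataclasses import dataclass


@dataclass
class ShortEventHeader:
    tag: int
    length: int
    language: str
    event_name_length: int
    name_prefix: bytes  # only the part of the event name present in `data`


def parse_short_event_header(data: bytes) -> ShortEventHeader:
    """Hypothetical helper: split the start of a short_event_descriptor."""
    if len(data) < 6 or data[0] != 0x4D:
        raise ValueError("not a short_event_descriptor (tag 0x4D)")
    name_len = data[5]
    return ShortEventHeader(
        tag=data[0],
        length=data[1],
        language=data[2:5].decode("ascii"),
        event_name_length=name_len,
        name_prefix=data[6:6 + name_len],
    )


dump = bytes.fromhex("4d3e706f6c39094f6b72")  # the 10 bytes printed by xxd
hdr = parse_short_event_header(dump)
print(hdr)
```

Read this way, the event name begins with the byte 0x09, which EN 300 468 assigns to ISO/IEC 8859-13, followed by the text "Okr".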

The bytes are, in order: