SRGSSR / pillarbox-documentation

Technical cross-platform documentation for Pillarbox
https://www.pillarbox.ch/
MIT License

Understand and document subtitle language encoding #1

Closed defagos closed 2 years ago

defagos commented 2 years ago

From Patrick (stream team):

We have a quick question regarding Apple / Safari and French special character encodings. Maybe you, as an expert in both, can help us :slightly_smiling_face:

When trying to do subtitles with the Français title, Apple/Safari weirdly URL-encodes this when Azure puts it in the manifest URL. Example here: https://amtins.github.io/cassettator-forbidden-adventures/?url=https%3A%2F%2Fpremium-[…]8%2Fsubtitlesfrancais.ism%2Fmanifest%28format%3Dm3u8-aapl%29

Chrome does a correct %C3%A7: https://premium-srgssrliveinteuwe-euwe.streaming.media.azure.net/5e379d1c-475d-4152-97c4-8bcd76665768/sub[…]Manifest(Fran%C3%A7ais,format=m3u8-aapl)

Safari somehow makes a Ã§ out of the ç and then URL-encodes this to %C3%83%C2%A7, which then fails to load the resource: https://premium-srgssrliveinteuwe-euwe.streaming.media.azure.net/5e379d1c-475d-4152-97c4-8bcd76665768/subtitlesfrancais.ism/QualityLevels(9600)/Manifest(FranÃ

defagos commented 2 years ago

This is a bug in Azure stream packaging. According to RFC 8216 (HLS), a playlist MUST be encoded in UTF-8, but this is obviously not the case here:

#EXTM3U
#EXT-X-VERSION:4
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="audio1",LANGUAGE="fra",DEFAULT=YES,AUTOSELECT=YES,URI="QualityLevels(128000)/Manifest(audio1,format=m3u8-aapl)"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Français",LANGUAGE="fra",DEFAULT=YES,AUTOSELECT=YES,CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="QualityLevels(9600)/Manifest(Français,format=m3u8-aapl)"
#EXT-X-STREAM-INF:BANDWIDTH=402227,RESOLUTION=320x180,CODECS="avc1.42e020,mp4a.40.2",AUDIO="audio",SUBTITLES="subs"
QualityLevels(249600)/Manifest(video,format=m3u8-aapl)
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=402227,RESOLUTION=320x180,CODECS="avc1.42e020",URI="QualityLevels(249600)/Manifest(video,format=m3u8-aapl,type=keyframes)"
#EXT-X-STREAM-INF:BANDWIDTH=657318,RESOLUTION=480x272,CODECS="avc1.4d4020,mp4a.40.2",AUDIO="audio",SUBTITLES="subs"
QualityLevels(499200)/Manifest(video,format=m3u8-aapl)
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=657318,RESOLUTION=480x272,CODECS="avc1.4d4020",URI="QualityLevels(499200)/Manifest(video,format=m3u8-aapl,type=keyframes)"
#EXT-X-STREAM-INF:BANDWIDTH=1373536,RESOLUTION=640x360,CODECS="avc1.4d4020,mp4a.40.2",AUDIO="audio",SUBTITLES="subs"
QualityLevels(1200000)/Manifest(video,format=m3u8-aapl)
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1373536,RESOLUTION=640x360,CODECS="avc1.4d4020",URI="QualityLevels(1200000)/Manifest(video,format=m3u8-aapl,type=keyframes)"
#EXT-X-STREAM-INF:BANDWIDTH=2190727,RESOLUTION=960x544,CODECS="avc1.4d4029,mp4a.40.2",AUDIO="audio",SUBTITLES="subs"
QualityLevels(1999600)/Manifest(video,format=m3u8-aapl)
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=2190727,RESOLUTION=960x544,CODECS="avc1.4d4029",URI="QualityLevels(1999600)/Manifest(video,format=m3u8-aapl,type=keyframes)"
#EXT-X-STREAM-INF:BANDWIDTH=138976,CODECS="mp4a.40.2",AUDIO="audio",SUBTITLES="subs"
QualityLevels(128000)/Manifest(audio1,format=m3u8-aapl)

The cedilla incorrectly appears in the playlist as a raw ç, not as its percent-encoded UTF-8 representation %C3%A7. Apple players, which likely implement their own streaming specification more strictly than Google Chrome and are thus pickier, simply assume the playlist content they get has been properly UTF-8 encoded and read the two bytes C3 and A7 as separate characters, respectively Ã and § in the extended ASCII (Latin-1) table.
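
The byte-level mix-up can be reproduced in a few lines. This is only a minimal Python sketch of the suspected mechanism, not what Safari or Azure actually run; the variable names are made up for illustration.

```python
from urllib.parse import quote

raw = "ç"

# UTF-8 encodes the cedilla as the two bytes C3 A7.
utf8_bytes = raw.encode("utf-8")

# Reading those two bytes one character at a time (Latin-1) gives "Ã§".
misread = utf8_bytes.decode("latin-1")

# Percent-encoding the original character gives what Chrome requests.
chrome_like = quote(raw)

# Percent-encoding the misread string gives what Safari requests.
safari_like = quote(misread)

print(utf8_bytes.hex(), misread, chrome_like, safari_like)
```

The last two values match the %C3%A7 and %C3%83%C2%A7 sequences quoted in the report above.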

To fix the issue, Azure should ensure playlists are always properly UTF-8 encoded. Apple players should then behave fine, and I guess Chrome should still behave correctly as well.
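
Until the packaging is fixed, one conceivable workaround is to percent-encode non-ASCII characters in the playlist URIs before handing the playlist to a player. The Python sketch below is purely hypothetical: `percent_encode_uris` and the `SAFE` character set are assumptions made for illustration, it only touches `URI="…"` attributes (not URIs on their own line), and it has not been tried against Azure or Apple players.

```python
import re
from urllib.parse import quote

# Characters left untouched so the Azure-style path syntax survives
# and existing percent-escapes are not encoded twice (assumption).
SAFE = "/(),=-._~:?&%"


def percent_encode_uris(playlist: str) -> str:
    """Percent-encode non-ASCII characters inside URI="..." attributes."""

    def fix(match: re.Match) -> str:
        # quote() encodes str input as UTF-8 before escaping.
        return 'URI="' + quote(match.group(1), safe=SAFE) + '"'

    return re.sub(r'URI="([^"]*)"', fix, playlist)


sample = (
    '#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Français",'
    'URI="QualityLevels(9600)/Manifest(Français,format=m3u8-aapl)"'
)
fixed = percent_encode_uris(sample)
print(fixed)
```

Only the URI is rewritten; the human-readable NAME attribute keeps its ç.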

defagos commented 2 years ago

No need for a documentation update here; everything is in the RFC. I'll just close this issue.