flavioribeiro / nginx-audio-track-for-hls-module

:sound: Nginx module that generates audio track for HTTP Live Streaming (HLS) streams on the fly.
GNU General Public License v3.0

ID3 Tag on the packet #22

Open flavioribeiro opened 10 years ago

flavioribeiro commented 10 years ago

As per the HLS spec, each Elementary Audio Stream segment MUST signal the timestamp of its first sample with an ID3 PRIV tag at the beginning of the segment. The ID3 PRIV owner identifier MUST be "com.apple.streaming.transportStreamTimestamp".

https://ffmpeg.org/doxygen/trunk/structplaylist.html#ae271f3bc6020caa11e50ae29f9a10966
https://ffmpeg.org/doxygen/trunk/hls_8c_source.html#l01476

jbochi commented 10 years ago

I did some tests with my friend @bernardocamilo.

We used Apple's sample stream for our tests:

$ curl -s https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_4x3/gear0/prog_index.m3u8 | head
#EXTM3U
#EXT-X-TARGETDURATION:11
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:9.98458,
fileSequence0.aac
#EXTINF:9.98458,
fileSequence1.aac
#EXTINF:9.98459,

We can see the ID3 tag is present on the first audio segment:

$ curl -s https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_4x3/gear0/fileSequence0.aac | hexdump -C | head
00000000  49 44 33 04 00 00 00 00  00 3f 50 52 49 56 00 00  |ID3......?PRIV..|
00000010  00 35 00 00 63 6f 6d 2e  61 70 70 6c 65 2e 73 74  |.5..com.apple.st|
00000020  72 65 61 6d 69 6e 67 2e  74 72 61 6e 73 70 6f 72  |reaming.transpor|
00000030  74 53 74 72 65 61 6d 54  69 6d 65 73 74 61 6d 70  |tStreamTimestamp|
00000040  00 00 00 00 00 00 0d 99  f4 ff f1 5c 80 01 bf fc  |...........\....|
00000050  21 20 03 40 68 1c ff f1  5c 80 16 1f fc 21 4e e7  |! .@h...\....!N.|
00000060  3f 07 e2 ed 72 05 28 32  82 40 ff e2 3f 3c b5 fc  |?...r.(2.@..?<..|
00000070  fe 2e 69 fd f9 ef fa 7c  ff ff be 2f 44 71 2f 40  |..i....|.../Dq/@|
00000080  2e 24 08 05 c4 81 00 b8  90 20 17 12 04 02 6c 6d  |.$....... ....lm|
00000090  8b dd 5e bb f9 2f e2 c1  b3 ef e0 e9 25 a5 12 55  |..^../......%..U|

According to the ID3v2 specification, the tag header is the first 10 bytes: 49 44 33 04 00 00 00 00 00 3f

49 44 33        -> ID3v2/file identifier    "ID3"
04 00           -> ID3v2 version            $04 00
00              -> ID3v2 flags              %abcd0000
00 00 00 3f     -> ID3v2 size               int 63
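The header layout above can be checked with a few lines of Python. This is only a minimal sketch, not code from this module: `synchsafe_to_int` and `parse_id3v2_header` are hypothetical helper names, and the size field is decoded as a synchsafe integer (7 significant bits per byte), which is how ID3v2 stores it.

```python
import struct


def synchsafe_to_int(raw: bytes) -> int:
    """Decode a synchsafe integer: only the low 7 bits of each byte count."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def parse_id3v2_header(data: bytes) -> dict:
    """Parse the 10-byte ID3v2 tag header found at the start of `data`."""
    if len(data) < 10 or data[:3] != b"ID3":
        raise ValueError("no ID3v2 header found")
    major, revision, flags = struct.unpack("BBB", data[3:6])
    return {
        "version": (major, revision),
        "flags": flags,
        "size": synchsafe_to_int(data[6:10]),  # bytes that follow the header
    }


# The first 10 bytes of fileSequence0.aac, copied from the hexdump above.
header = bytes.fromhex("49 44 33 04 00 00 00 00 00 3f")
print(parse_id3v2_header(header))
# → {'version': (4, 0), 'flags': 0, 'size': 63}
```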

The next 63 bytes are the ID3v2 frame:

00000000                                 50 52 49 56 00 00  |          PRIV..|
00000010  00 35 00 00 63 6f 6d 2e  61 70 70 6c 65 2e 73 74  |.5..com.apple.st|
00000020  72 65 61 6d 69 6e 67 2e  74 72 61 6e 73 70 6f 72  |reaming.transpor|
00000030  74 53 74 72 65 61 6d 54  69 6d 65 73 74 61 6d 70  |tStreamTimestamp|
00000040  00 00 00 00 00 00 0d 99  f4                       |.........       |

The frame header is:

50 52 49 56     -> Frame ID      $xx xx xx xx  (four characters)    "PRIV"
00 00 00 35     -> Size      4 * %0xxxxxxx                          int 53
00 00           -> Flags         $xx xx

According to the frames specification, the PRIV frame format is:

     <Header for 'Private frame', ID: "PRIV">
     Owner identifier      <text string> $00
     The private data      <binary data>

So, the owner is com.apple.streaming.transportStreamTimestamp and the data is 00 00 00 00 00 0d 99 f4.
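The same split can be done programmatically. The sketch below is an illustration in Python, not the module's code; `parse_priv_frame` is a hypothetical helper, and the frame bytes are rebuilt by hand from the first segment's hexdump.

```python
def parse_priv_frame(frame: bytes):
    """Split an ID3v2.4 PRIV frame into (owner identifier, private data)."""
    if frame[:4] != b"PRIV":
        raise ValueError("not a PRIV frame")
    # In ID3v2.4 the frame size is a 4-byte synchsafe integer.
    size = 0
    for byte in frame[4:8]:
        size = (size << 7) | (byte & 0x7F)
    # Skip the 10-byte frame header: 4-byte ID, 4-byte size, 2-byte flags.
    body = frame[10:10 + size]
    # The owner identifier ends at the first NUL; the rest is the payload.
    owner, _, private_data = body.partition(b"\x00")
    return owner.decode("latin-1"), private_data


# The 63-byte frame from fileSequence0.aac.
frame = (
    b"PRIV"
    + bytes.fromhex("00 00 00 35 00 00")
    + b"com.apple.streaming.transportStreamTimestamp\x00"
    + bytes.fromhex("00 00 00 00 00 0d 99 f4")
)
owner, data = parse_priv_frame(frame)
print(owner)                        # com.apple.streaming.transportStreamTimestamp
print(int.from_bytes(data, "big"))  # 891380
```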

This is exactly what the Pantos HLS draft (draft-pantos-http-live-streaming) says:

Elementary Audio Stream segment MUST signal the timestamp of its
first sample with an ID3 PRIV tag [ID3] at the beginning of the
segment.  The ID3 PRIV owner identifier MUST be
"com.apple.streaming.transportStreamTimestamp".  The ID3 payload MUST
be a 33-bit MPEG-2 Program Elementary Stream timestamp expressed as a
big-endian eight-octet number, with the upper 31 bits set to zero.

The timestamps of the first four files are:

fileSequence0.aac: 00 0d 99 f4        -> 891380
fileSequence1.aac: 00 1b 50 28        -> 1789992
fileSequence2.aac: 00 29 06 5c        -> 2688604
fileSequence3.aac: 00 36 bc 91        -> 3587217

The MPEG-2 timestamp unit is 1/90000 of a second. The delta between files 1 and 0 is 898612 ticks, or 9.98458 s (898612/90000), which matches the EXTINF duration of the first segment in the playlist.
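The arithmetic can be repeated for every pair of consecutive segments. This is a small Python sketch using only the four timestamps listed above; the constant name `PTS_CLOCK_HZ` is just illustrative.

```python
PTS_CLOCK_HZ = 90_000  # MPEG-2 PES timestamps tick at 90 kHz

# Timestamps read from the PRIV payload of each segment.
timestamps = {
    "fileSequence0.aac": 0x0D99F4,
    "fileSequence1.aac": 0x1B5028,
    "fileSequence2.aac": 0x29065C,
    "fileSequence3.aac": 0x36BC91,
}

names = list(timestamps)
durations = []
for prev, cur in zip(names, names[1:]):
    delta = timestamps[cur] - timestamps[prev]
    durations.append(delta / PTS_CLOCK_HZ)
    print(f"{prev}: {delta} ticks = {delta / PTS_CLOCK_HZ:.5f} s")
# → fileSequence0.aac: 898612 ticks = 9.98458 s
# → fileSequence1.aac: 898612 ticks = 9.98458 s
# → fileSequence2.aac: 898613 ticks = 9.98459 s
```

These values line up with the first three EXTINF entries of the playlist (9.98458, 9.98458, 9.98459).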

The ID3 tag is not present in .ts files, so this module should generate it. That shouldn't be difficult, since only the timestamp varies, and it should be possible to get the timestamp from the video file.
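To give an idea of how little varies between segments, here is a rough Python sketch of building the 73-byte tag from a timestamp. It is not the module's implementation: `build_timestamp_tag` and `int_to_synchsafe` are hypothetical names, and where the `pts` value comes from (presumably the first audio sample of the segment) is an assumption left to the caller.

```python
OWNER = b"com.apple.streaming.transportStreamTimestamp"


def int_to_synchsafe(value: int) -> bytes:
    """Encode an integer as a 4-byte synchsafe integer (7 bits per byte)."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in a 28-bit synchsafe integer")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def build_timestamp_tag(pts: int) -> bytes:
    """Build the ID3v2.4 tag holding one PRIV frame with the given PTS."""
    # Keep the low 33 bits, stored as a big-endian eight-octet number
    # whose upper 31 bits are zero.
    payload = (pts & 0x1FFFFFFFF).to_bytes(8, "big")
    body = OWNER + b"\x00" + payload
    # Frame header: ID, synchsafe size, two flag bytes.
    frame = b"PRIV" + int_to_synchsafe(len(body)) + b"\x00\x00" + body
    # Tag header: "ID3", version 4.0, flags, synchsafe size of what follows.
    header = b"ID3" + b"\x04\x00" + b"\x00" + int_to_synchsafe(len(frame))
    return header + frame


tag = build_timestamp_tag(891380)
print(len(tag), tag[:10].hex())
# → 73 4944330400000000003f
```

With the timestamp of fileSequence0.aac, the result should be byte-for-byte identical to the first 73 bytes of Apple's segment shown in the hexdump.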

flavioribeiro commented 10 years ago

Hi @jbochi and @bernardocamilo, thanks for the effort in making this analysis. The packets being manipulated are AVPacket structures.

jbochi commented 9 years ago

Hi @flavioribeiro. I finally implemented it in Lua: https://github.com/jbochi/lua_jit_extract_audio/commit/565bd164c6123f3bd9384877a3af07e9d7ef245d

It should be easy to port it to C now :)

flavioribeiro commented 9 years ago

great! :+1: thank you @jbochi