abema / go-mp4

Go library for reading and writing MP4 file
https://dev.to/sunfishshogi/go-mp4-golang-library-and-cli-tool-for-mp4-52o1
MIT License

improvement: Add support for keys and numbered ilst items BoxTypes. #159

Closed dtrejod closed 10 months ago

dtrejod commented 11 months ago

@sunfish-shogi

Before this PR

There was no support for the keys box type (reference: https://github.com/abema/go-mp4/issues/13). Additionally there was no support for numbered items under the ilst box type.

Before PR changes

As a demonstration, I've updated the `testdata/sample_qt.mp4` file to include these new BoxTypes (atoms). The mp4tool dump of the updated `testdata/sample_qt.mp4` file before this change is shown below. In that output, lines `[54-56]` show the `keys` box and the numbered item under `ilst` reported as "unsupported".

```
$ mp4tool dump testdata/sample_qt.mp4 | cat -n
     1  [ftyp] Size=20 MajorBrand="qt " MinorVersion=512 CompatibleBrands=[{CompatibleBrand="qt "}]
     2  [free] Size=42 Data=[...] (use "-full free" to show all)
     3  [ftyp] Size=20 MajorBrand="qt " MinorVersion=512 CompatibleBrands=[{CompatibleBrand="qt "}]
     4  [free] Size=42 Data=[...] (use "-full free" to show all)
     5  [moov] Size=340357
     6    [mvhd] Size=108 ... (use "-full mvhd" to show all)
     7    [trak] Size=115889
     8      [tkhd] Size=92 ... (use "-full tkhd" to show all)
     9      [mdia] Size=115789
    10        [mdhd] Size=32 Version=0 Flags=0x000000 CreationTimeV0=2082844800 ModificationTimeV0=2082844800 Timescale=24 DurationV0=14315 Language="und" PreDefined=0
    11        [hdlr] Size=45 Version=0 Flags=0x000000 PreDefined=1835560050 HandlerType="vide" Name="VideoHandler"
    12        [minf] Size=115704
    13          [vmhd] Size=20 Version=0 Flags=0x000001 Graphicsmode=0 Opcolor=[0, 0, 0]
    14          [dinf] Size=36
    15            [dref] Size=28 Version=0 Flags=0x000000 EntryCount=1
    16              [url ] Size=12 Version=0 Flags=0x000001
    17          [stbl] Size=115596
    18            [stsd] Size=148 Version=0 Flags=0x000000 EntryCount=1
    19              [avc1] Size=132 DataReferenceIndex=1 PreDefined=0 PreDefined2=[1179012432, 512, 512] Width=424 Height=240 Horizresolution=4718592 Vertresolution=4718592 FrameCount=1 Compressorname="libx264" Depth=24 PreDefined3=-1
    20                [avcC] Size=46 ... (use "-full avcC" to show all)
    21            [stts] Size=24 Version=0 Flags=0x000000 EntryCount=1 Entries=[{SampleCount=14315 SampleDelta=1}]
    22            [stss] Size=832 ... (use "-full stss" to show all)
    23            [stsc] Size=28 Version=0 Flags=0x000000 EntryCount=1 Entries=[{FirstChunk=1 SamplesPerChunk=1 SampleDescriptionIndex=1}]
    24            [stsz] Size=57280 ... (use "-full stsz" to show all)
    25            [stco] Size=57276 ... (use "-full stco" to show all)
    26          [hdlr] Size=44 Version=0 Flags=0x000000 PreDefined=1684565106 HandlerType="url " Name="DataHandler"
    27    [trak] Size=224196
    28      [tkhd] Size=92 ... (use "-full tkhd" to show all)
    29      [mdia] Size=224096
    30        [mdhd] Size=32 Version=0 Flags=0x000000 CreationTimeV0=2082844800 ModificationTimeV0=2082844800 Timescale=48000 DurationV0=28628992 Language="und" PreDefined=0
    31        [hdlr] Size=45 Version=0 Flags=0x000000 PreDefined=1835560050 HandlerType="soun" Name="SoundHandler"
    32        [minf] Size=224011
    33          [smhd] Size=16 Version=0 Flags=0x000000 Balance=0
    34          [dinf] Size=36
    35            [dref] Size=28 Version=0 Flags=0x000000 EntryCount=1
    36              [url ] Size=12 Version=0 Flags=0x000001
    37          [stbl] Size=223907
    38            [stsd] Size=147 Version=0 Flags=0x000000 EntryCount=1
    39              [mp4a] Size=131 DataReferenceIndex=1 EntryVersion=1 ChannelCount=2 SampleSize=16 PreDefined=65534 SampleRate=48000 QuickTimeData=[0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2]
    40                [wave] Size=79
    41                  [frma] Size=12 DataFormat="mp4a"
    42                  [mp4a] Size=12 QuickTimeData=[0x0, 0x0, 0x0, 0x0]
    43                  [esds] Size=39 ... (use "-full esds" to show all)
    44                  [0x00000000] (unsupported box type) Size=8 Data=[...] (use "-full 0x00000000" to show all)
    45            [stts] Size=24 Version=0 Flags=0x000000 EntryCount=1 Entries=[{SampleCount=27958 SampleDelta=1024}]
    46            [stsc] Size=28 Version=0 Flags=0x000000 EntryCount=1 Entries=[{FirstChunk=1 SamplesPerChunk=1 SampleDescriptionIndex=1}]
    47            [stsz] Size=111852 ... (use "-full stsz" to show all)
    48            [stco] Size=111848 ... (use "-full stco" to show all)
    49          [hdlr] Size=44 Version=0 Flags=0x000000 PreDefined=1684565106 HandlerType="url " Name="DataHandler"
    50    [udta] Size=156
    51      [(c)enc] (unsupported box type) Size=23 Data=[...] (use "-full (c)enc" to show all)
    52      [meta] Size=125 Version=0 Flags=0x000000
    53        [hdlr] Size=33 Version=0 Flags=0x000000 PreDefined=0 HandlerType="mdta" Name=""
    54        [keys] (unsupported box type) Size=43 Data=[...] (use "-full keys" to show all)
    55        [ilst] Size=37
    56          [0x00000001] (unsupported box type) Size=29 Data=[...] (use "-full 0x00000001" to show all)
```

After this PR

Adds support for both the keys box type and numbered items under the ilst box type.

After PR changes

In the output below, lines `[54-56]` now show the `keys` box and the numbered item under `ilst` parsed properly.

```
$ ./mp4tool dump testdata/sample_qt.mp4 | cat -n
     1  [ftyp] Size=20 MajorBrand="qt " MinorVersion=512 CompatibleBrands=[{CompatibleBrand="qt "}]
     2  [free] Size=42 Data=[...] (use "-full free" to show all)
     3  [ftyp] Size=20 MajorBrand="qt " MinorVersion=512 CompatibleBrands=[{CompatibleBrand="qt "}]
     4  [free] Size=42 Data=[...] (use "-full free" to show all)
     5  [moov] Size=340357
     6    [mvhd] Size=108 ... (use "-full mvhd" to show all)
     7    [trak] Size=115889
     8      [tkhd] Size=92 ... (use "-full tkhd" to show all)
     9      [mdia] Size=115789
    10        [mdhd] Size=32 Version=0 Flags=0x000000 CreationTimeV0=2082844800 ModificationTimeV0=2082844800 Timescale=24 DurationV0=14315 Language="und" PreDefined=0
    11        [hdlr] Size=45 Version=0 Flags=0x000000 PreDefined=1835560050 HandlerType="vide" Name="VideoHandler"
    12        [minf] Size=115704
    13          [vmhd] Size=20 Version=0 Flags=0x000001 Graphicsmode=0 Opcolor=[0, 0, 0]
    14          [dinf] Size=36
    15            [dref] Size=28 Version=0 Flags=0x000000 EntryCount=1
    16              [url ] Size=12 Version=0 Flags=0x000001
    17          [stbl] Size=115596
    18            [stsd] Size=148 Version=0 Flags=0x000000 EntryCount=1
    19              [avc1] Size=132 DataReferenceIndex=1 PreDefined=0 PreDefined2=[1179012432, 512, 512] Width=424 Height=240 Horizresolution=4718592 Vertresolution=4718592 FrameCount=1 Compressorname="libx264" Depth=24 PreDefined3=-1
    20                [avcC] Size=46 ... (use "-full avcC" to show all)
    21            [stts] Size=24 Version=0 Flags=0x000000 EntryCount=1 Entries=[{SampleCount=14315 SampleDelta=1}]
    22            [stss] Size=832 ... (use "-full stss" to show all)
    23            [stsc] Size=28 Version=0 Flags=0x000000 EntryCount=1 Entries=[{FirstChunk=1 SamplesPerChunk=1 SampleDescriptionIndex=1}]
    24            [stsz] Size=57280 ... (use "-full stsz" to show all)
    25            [stco] Size=57276 ... (use "-full stco" to show all)
    26          [hdlr] Size=44 Version=0 Flags=0x000000 PreDefined=1684565106 HandlerType="url " Name="DataHandler"
    27    [trak] Size=224196
    28      [tkhd] Size=92 ... (use "-full tkhd" to show all)
    29      [mdia] Size=224096
    30        [mdhd] Size=32 Version=0 Flags=0x000000 CreationTimeV0=2082844800 ModificationTimeV0=2082844800 Timescale=48000 DurationV0=28628992 Language="und" PreDefined=0
    31        [hdlr] Size=45 Version=0 Flags=0x000000 PreDefined=1835560050 HandlerType="soun" Name="SoundHandler"
    32        [minf] Size=224011
    33          [smhd] Size=16 Version=0 Flags=0x000000 Balance=0
    34          [dinf] Size=36
    35            [dref] Size=28 Version=0 Flags=0x000000 EntryCount=1
    36              [url ] Size=12 Version=0 Flags=0x000001
    37          [stbl] Size=223907
    38            [stsd] Size=147 Version=0 Flags=0x000000 EntryCount=1
    39              [mp4a] Size=131 DataReferenceIndex=1 EntryVersion=1 ChannelCount=2 SampleSize=16 PreDefined=65534 SampleRate=48000 QuickTimeData=[0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2]
    40                [wave] Size=79
    41                  [frma] Size=12 DataFormat="mp4a"
    42                  [mp4a] Size=12 QuickTimeData=[0x0, 0x0, 0x0, 0x0]
    43                  [esds] Size=39 ... (use "-full esds" to show all)
    44                  [0x00000000] (unsupported box type) Size=8 Data=[...] (use "-full 0x00000000" to show all)
    45            [stts] Size=24 Version=0 Flags=0x000000 EntryCount=1 Entries=[{SampleCount=27958 SampleDelta=1024}]
    46            [stsc] Size=28 Version=0 Flags=0x000000 EntryCount=1 Entries=[{FirstChunk=1 SamplesPerChunk=1 SampleDescriptionIndex=1}]
    47            [stsz] Size=111852 ... (use "-full stsz" to show all)
    48            [stco] Size=111848 ... (use "-full stco" to show all)
    49          [hdlr] Size=44 Version=0 Flags=0x000000 PreDefined=1684565106 HandlerType="url " Name="DataHandler"
    50    [udta] Size=156
    51      [(c)enc] (unsupported box type) Size=23 Data=[...] (use "-full (c)enc" to show all)
    52      [meta] Size=125 Version=0 Flags=0x000000
    53        [hdlr] Size=33 Version=0 Flags=0x000000 PreDefined=0 HandlerType="mdta" Name=""
    54        [keys] Size=43 Version=0 Flags=0x000000 EntryCount=1 Entries=[{KeySize=27 KeyNamespace="mdta" KeyValue="com.android.version"}]
    55        [ilst] Size=37
    56          [0x00000001] Size=29 Version=0 Flags=0x000000 ItemName="data" Data={DataType=UTF8 DataLang=0 Data="1.0.0"}
```

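For readers unfamiliar with the `keys` layout, the field names on line 54 of the dump (EntryCount, KeySize, KeyNamespace, KeyValue) can be decoded by hand. The following is a minimal standalone sketch, not the code from this PR: `keyEntry` and `parseKeysPayload` are hypothetical names, and the layout is assumed from the dump, where Size=43 equals an 8-byte box header, 4 bytes of version and flags, a 4-byte entry count, and one 27-byte entry.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// keyEntry is a hypothetical type mirroring the fields printed in the dump.
type keyEntry struct {
	Size      uint32 // KeySize: covers the size and namespace fields plus the value
	Namespace string // KeyNamespace: four characters, e.g. "mdta"
	Value     string // KeyValue: the remaining Size-8 bytes
}

// parseKeysPayload decodes the bytes that follow the 8-byte box header:
// 1 byte version, 3 bytes flags, a big-endian uint32 entry count, then the entries.
func parseKeysPayload(p []byte) ([]keyEntry, error) {
	if len(p) < 8 {
		return nil, errors.New("keys payload too short")
	}
	count := binary.BigEndian.Uint32(p[4:8])
	p = p[8:]

	var entries []keyEntry
	for i := uint32(0); i < count; i++ {
		if len(p) < 8 {
			return nil, errors.New("truncated key entry")
		}
		size := binary.BigEndian.Uint32(p[0:4])
		if size < 8 || uint64(size) > uint64(len(p)) {
			return nil, errors.New("invalid key size")
		}
		entries = append(entries, keyEntry{
			Size:      size,
			Namespace: string(p[4:8]),
			Value:     string(p[8:size]),
		})
		p = p[size:]
	}
	return entries, nil
}

func main() {
	// Payload equivalent to the keys box in the dump: version/flags, EntryCount=1, one entry.
	payload := []byte{0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 27}
	payload = append(payload, "mdta"...)
	payload = append(payload, "com.android.version"...)

	entries, err := parseKeysPayload(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, e := range entries {
		// Items under ilst refer to keys by 1-based index, hence i+1.
		fmt.Printf("key %d: size=%d namespace=%q value=%q\n", i+1, e.Size, e.Namespace, e.Value)
	}
}
```

The numbered item `[0x00000001]` under `ilst` then refers to the first entry of this table, which is why its box type is a number rather than a four-character code.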
sunfish-shogi commented 10 months ago

@dtrejod Thank you for the pull request.

I have a suggestion.

The value of Item.Type will never be larger than Keys.EntryCount, so we can fully validate it with a small amount of additional logic instead of registering 1024 box type entries.

The AddBoxDef(Ex) and AddAnyTypeBoxDef(Ex) functions build boxMap, and the getBoxDef function in mp4.go looks up an entry in boxMap and returns it. However, this map-based resolver is not suitable for this use case.

Instead, we can resolve Apple metadata boxes with code like the following:

func (boxType BoxType) getBoxDef(ctx Context) *boxDef {
  // Existing behavior: look up the registered definitions in boxMap.
  boxDefs := boxMap[boxType]
  for i := len(boxDefs) - 1; i >= 0; i-- {
    boxDef := &boxDefs[i]
    if boxDef.isTarget == nil || boxDef.isTarget(ctx) {
      return boxDef
    }
  }

  // New fallback: numbered items under ilst are valid only within 1..EntryCount.
  if ctx.UnderIlst {
    typeID := /* TODO: convert boxType to uint32 */
    if typeID >= 1 && typeID <= ctx.QuickTimeKeysMetaEntryCount {
      return &boxDef{
        /* TODO */
      }
    }
  }

  return nil
}

For this approach, we need to add a QuickTimeKeysMetaEntryCount field to Context and pass the entry count through that field to the sibling atoms of the keys atom.
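The range check suggested above can be tried in isolation. This is a minimal sketch under stated assumptions, not the library's implementation: `Context` here is a reduced stand-in with only the two fields discussed in this thread, the `uint32` type for `QuickTimeKeysMetaEntryCount` is a guess, `isNumberedIlstItem` is a hypothetical helper, and `BoxType` is declared locally as a four-byte array. The four bytes are assumed to be a big-endian integer, which matches the `[0x00000001]` item in the dump.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// BoxType is declared locally as a four-byte box identifier for this sketch.
type BoxType [4]byte

// Context is a reduced, hypothetical stand-in for the library's Context.
type Context struct {
	UnderIlst                   bool
	QuickTimeKeysMetaEntryCount uint32 // assumed type; proposed field from this thread
}

// isNumberedIlstItem reports whether boxType, read as a big-endian integer,
// is a valid 1-based index into the keys table described by ctx.
func isNumberedIlstItem(boxType BoxType, ctx Context) bool {
	if !ctx.UnderIlst {
		return false
	}
	typeID := binary.BigEndian.Uint32(boxType[:])
	return typeID >= 1 && typeID <= ctx.QuickTimeKeysMetaEntryCount
}

func main() {
	ctx := Context{UnderIlst: true, QuickTimeKeysMetaEntryCount: 1}

	fmt.Println(isNumberedIlstItem(BoxType{0, 0, 0, 1}, ctx))         // true: index 1 of 1
	fmt.Println(isNumberedIlstItem(BoxType{0, 0, 0, 2}, ctx))         // false: beyond EntryCount
	fmt.Println(isNumberedIlstItem(BoxType{'d', 'a', 't', 'a'}, ctx)) // false: a four-character code is far out of range
}
```

Because the check depends only on the entry count carried in the context, no per-index entries need to be registered in the box map.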

reference:

Android libstagefright https://android.googlesource.com/platform/frameworks/av/+/e7142a0703bc93f75e213e96ebc19000022afed9/media/libstagefright/MPEG4Extractor.cpp#2329

status_t MPEG4Extractor::parseQTMetaVal(
  int32_t keyId, off64_t offset, size_t size) {
  ssize_t index = mMetaKeyMap.indexOfKey(keyId);
  if (index < 0) {
    // corresponding key is not present, ignore
    return ERROR_MALFORMED;
  }
dtrejod commented 10 months ago

@sunfish-shogi Thank you for the feedback. Agree that approach is much more sensible. I pushed a commit with your suggestion.

UPDATE: After some further testing, I found that the approach here does not work. The latest commit demonstrates that the numbered items under ilst are not properly handled because the keys box is not a parent of the ilst box. Since there isn't a nested relationship, the Context isn't preserved across box types.

I'll revisit this PR when I have time and a better understanding of this repository.

sunfish-shogi commented 10 months ago

@dtrejod

> The latest commit demonstrates that the numbered items under ilst are not properly handled because the keys box is not a parent of the ilst box. Since there isn't a nested relationship, the Context isn't preserved across box types.

For example, read.go detects the ftyp box and sets the IsQuickTimeCompatible flag.

https://github.com/abema/go-mp4/blob/v1.1.1/read.go#L53-L65

It then propagates the flag to the following boxes at the same level.

https://github.com/abema/go-mp4/blob/v1.1.1/read.go#L172-L174

It is important to implement this in read.go rather than in a box-specific handler (such as a (k *Keys) OnReadField handler), because users can skip reading the fields of the keys box by seeking past it, in which case the handlers are not guaranteed to be called.
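The sibling propagation described here can be illustrated with a small simulation. It is a sketch of the idea only and does not reproduce read.go: `box`, `Context`, and `readSiblings` are hypothetical, and boxes are modelled as in-memory values rather than read from a stream. The point it shows is that the loop over same-level boxes, not the per-box handler, records the `keys` entry count, so a later `ilst` sibling sees it even when the handler ignores the `keys` box.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// box is a hypothetical in-memory box: a type name and the bytes after the box header.
type box struct {
	Type    string
	Payload []byte
}

// Context is a reduced stand-in carrying only the value discussed in this thread.
type Context struct {
	QuickTimeKeysMetaEntryCount uint32 // assumed type
}

// readSiblings walks the boxes of one nesting level in order. Each box is handed
// to handler with the context accumulated so far. The entry count of a keys box
// is recorded by the loop itself, so it does not matter whether handler reads it.
func readSiblings(boxes []box, ctx Context, handler func(b box, ctx Context)) Context {
	for _, b := range boxes {
		handler(b, ctx)
		// Payload layout assumed: 4 bytes version/flags, then a big-endian entry count.
		if b.Type == "keys" && len(b.Payload) >= 8 {
			ctx.QuickTimeKeysMetaEntryCount = binary.BigEndian.Uint32(b.Payload[4:8])
		}
	}
	return ctx
}

func main() {
	siblings := []box{
		{Type: "hdlr"},
		{Type: "keys", Payload: []byte{0, 0, 0, 0, 0, 0, 0, 1}},
		{Type: "ilst"},
	}

	// This handler never inspects the keys payload, which models a user skipping the box.
	readSiblings(siblings, Context{}, func(b box, ctx Context) {
		fmt.Printf("%s sees entry count %d\n", b.Type, ctx.QuickTimeKeysMetaEntryCount)
	})
}
```

Passing `Context` by value to the handler keeps a child from mutating what its siblings see, while the loop's own copy carries state forward across the level.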

dtrejod commented 10 months ago

Thanks for pointing me in the right direction.