Boss, could you point me to the references you used for the AAC format?

JimmyVV commented 7 years ago

Your flv-demuxer.js parses the AAC configuration, but part of that logic depends on the browser. The code in question is:

 if (userAgent.indexOf('firefox') !== -1) {
            // firefox: use SBR (HE-AAC) if freq less than 24kHz
            if (samplingIndex >= 6) {
                audioObjectType = 5;
                config = new Array(4);
                extensionSamplingIndex = samplingIndex - 3;
            } else {  // use LC-AAC
                audioObjectType = 2;
                config = new Array(2);
                extensionSamplingIndex = samplingIndex;
            }
        } else if (userAgent.indexOf('android') !== -1) {
            // android: always use LC-AAC
            audioObjectType = 2;
            config = new Array(2);
            extensionSamplingIndex = samplingIndex;
        } else {
            // for other browsers, e.g. chrome...
            // Always use HE-AAC to make it easier to switch aac codec profile
            audioObjectType = 5;
            extensionSamplingIndex = samplingIndex;
            config = new Array(4);

            if (samplingIndex >= 6) {
                extensionSamplingIndex = samplingIndex - 3;
            } else if (channelConfig === 1) {  // Mono channel
                audioObjectType = 2;
                config = new Array(2);
                extensionSamplingIndex = samplingIndex;
            }
        }

Which document did you follow for this HE-AAC / LC-AAC handling? I went through ISO/IEC Part 3 and Part 4 and found nothing that describes how extensionSamplingIndex should be handled for LC-AAC versus HE-AAC.

xqq commented 7 years ago

It is a browser-specific workaround, borrowed from hls.js.
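
For readers following along, here is a minimal sketch of how the values picked in the snippet above (audioObjectType, samplingIndex, channelConfig, extensionSamplingIndex) could be packed into an MPEG-4 AudioSpecificConfig. It assumes the standard bit layout (5-bit object type, 4-bit sampling index, 4-bit channel configuration, and for object type 5 an extra 4-bit extension sampling index followed by the 5-bit underlying object type). The helper name buildAudioSpecificConfig is made up for illustration and is not a flv.js or hls.js API.

```javascript
// Hypothetical helper: packs the fields into a 2-byte (LC-AAC) or 4-byte (HE-AAC) config.
function buildAudioSpecificConfig(audioObjectType, samplingIndex, channelConfig, extensionSamplingIndex) {
    const isSbr = audioObjectType === 5;
    const config = new Uint8Array(isSbr ? 4 : 2);

    // 5 bits audioObjectType | upper 3 bits of the sampling frequency index
    config[0] = (audioObjectType << 3) | ((samplingIndex & 0x0F) >>> 1);
    // lowest bit of the sampling frequency index | 4 bits channel configuration
    config[1] = ((samplingIndex & 0x01) << 7) | ((channelConfig & 0x0F) << 3);

    if (isSbr) {
        // upper 3 bits of the extension sampling frequency index
        config[1] |= (extensionSamplingIndex & 0x0F) >>> 1;
        // lowest bit of the extension index | underlying object type 2 (AAC LC)
        config[2] = ((extensionSamplingIndex & 0x01) << 7) | (2 << 2);
        config[3] = 0;
    }
    return config;
}
```

With audioObjectType 2, samplingIndex 4 (44.1 kHz) and two channels, this yields the familiar bytes 0x12 0x10.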

JimmyVV commented 7 years ago

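xqq, I still have a few questions I'd like to ask you, especially about the MP4 remux part, where some of the concepts are unclear to me. Could we discuss them by email? I'm a front-end engineer on Tencent's Now live-streaming product and I'm very interested in this area. I'd be grateful for any guidance.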

xqq commented 7 years ago

welcome

JimmyVV commented 7 years ago

When you remux to MP4, how is dtsCorrection computed? I couldn't follow this part. The relevant code is:

if (dtsCorrection == undefined) {
                if (this._videoNextDts == undefined) {
                    if (this._videoSegmentInfoList.isEmpty()) {
                        dtsCorrection = 0;
                    } else {
                        let lastSample = this._videoSegmentInfoList.getLastSampleBefore(originalDts);
                        if (lastSample != null) {
                            let distance = (originalDts - (lastSample.originalDts + lastSample.duration));
                            if (distance <= 3) {
                                distance = 0;
                            }
                            let expectedDts = lastSample.dts + lastSample.duration + distance;
                            dtsCorrection = originalDts - expectedDts;
                        } else {  // lastSample == null
                            dtsCorrection = 0;
                        }
                    }
                } else {
                    dtsCorrection = originalDts - this._videoNextDts;
                }
            }

xqq commented 7 years ago

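In essence, the timestamp of the first packet in this batch is corrected so that it immediately follows the end of the previous batch. The difference between the original and the corrected value is dtsCorrection, and the timestamps of all following packets are shifted by the same amount.

If videoNextDts is undefined (because it was reset for some reason), the segment info list is searched for the preceding segment on the timeline, and the correction is derived from that segment's last packet.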
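
The shifting described above can be sketched roughly as follows. This is only an illustration of the main path (a known expected DTS from the previous batch); it leaves out the fallback through the segment info list and the small-gap tolerance shown in the quoted code. The helper name applyDtsCorrection and the field names dts and duration are assumptions, with times in milliseconds.

```javascript
// Hypothetical helper: align a batch of samples to the end of the previous batch.
function applyDtsCorrection(samples, nextExpectedDts) {
    if (samples.length === 0) {
        return { samples: [], dtsCorrection: 0, nextExpectedDts };
    }
    // With no previous batch there is nothing to align to, so nothing is shifted.
    const dtsCorrection = nextExpectedDts === undefined ? 0 : samples[0].dts - nextExpectedDts;

    // Every sample in the batch is shifted by the same amount.
    const corrected = samples.map((s) => ({ ...s, originalDts: s.dts, dts: s.dts - dtsCorrection }));
    const last = corrected[corrected.length - 1];
    return { samples: corrected, dtsCorrection, nextExpectedDts: last.dts + last.duration };
}
```

The returned nextExpectedDts is what the next batch would be aligned against, which is how a 2 ms drift at a batch boundary gets absorbed instead of turning into a gap.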

JimmyVV commented 7 years ago

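Thank you very much for the answer~

So if I'm only decoding a fixed, complete file, this isn't needed, right?

Is there a concrete use case for it? Is it meant for live streaming or for some other scenario?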

xqq commented 7 years ago

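Whenever you remux by hand and feed the result to MSE, the timestamps have to be corrected this way.

In the fMP4 that MSE consumes, the sampleDuration of the last packet in each segment is usually copied from the previous packet's sampleDuration, which introduces an error of about ±1~2 ms. MSE, however, strictly requires that consecutive segments have no timestamp gap between them.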

Shan-Ye commented 7 years ago

dtsCorrection is there to make the browser treat the data as continuous. DTS is a notion of ordering, not of time.

JimmyVV commented 7 years ago

Qianqian, one more question. The trun box has a field called data_offset. What does its computed value mean? I compared how you and hls.js build the relevant boxes:

let trun = MP4.trun(track, sdtp.byteLength + 16 + 16 + 8 + 16 + 8 + 8);

// in MP4.trun function
 offset += 8 + dataSize;
 data.set([
            0x00, 0x00, 0x0F, 0x01,      // version(0) & flags
            (sampleCount >>> 24) & 0xFF, // sample_count
            (sampleCount >>> 16) & 0xFF,
            (sampleCount >>>  8) & 0xFF,
            (sampleCount) & 0xFF,
            (offset >>> 24) & 0xFF,      // data_offset
            (offset >>> 16) & 0xFF,
            (offset >>>  8) & 0xFF,
            (offset) & 0xFF
        ], 0);

whereas hls.js does this:

MP4.trun(track,
                    sampleDependencyTable.length +
                    16 + // tfhd
                    20 + // tfdt
                    8 +  // traf header
                    16 + // mfhd
                    8 +  // moof header
                    8),  // mdat header

// trun function
offset += 8 + arraylen;
array.set([
      0x00, // version 0
      0x00, 0x0f, 0x01, // flags
      (len >>> 24) & 0xFF,
      (len >>> 16) & 0xFF,
      (len >>> 8) & 0xFF,
      len & 0xFF, // sample_count
      (offset >>> 24) & 0xFF,
      (offset >>> 16) & 0xFF,
      (offset >>> 8) & 0xFF,
      offset & 0xFF // data_offset
    ],0);

The official explanation in the MP4 spec is:

data_offset is added to the implicit or explicit data_offset established in the track fragment header.

I don't quite see how that sentence relates to the code above.

xqq commented 7 years ago

https://github.com/video-dev/hls.js/commit/a4d91f1563799787472f4edd1ffc8233a7deda4c#diff-5370391fc0f63f30e8498af38d9c443fR552

xqq commented 7 years ago

The offset that is finally written is the offset of the sample's raw data within the fragment [file], measured from the start of that file.

One fragment is moof + mdat.
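
As a rough worked example of that arithmetic: with a single traf and the box sizes used in the flv.js call quoted earlier (16-byte mfhd, tfhd and tfdt, and a trun with flags 0x000F01, i.e. 16 bytes per sample), data_offset comes out as the size of the whole moof plus the 8-byte mdat header. The helper name computeTrunDataOffset is hypothetical, and the constants apply to that layout only (hls.js, for instance, uses a 20-byte tfdt).

```javascript
// Hypothetical helper: byte offset of the first sample, counted from the start of moof.
function computeTrunDataOffset(sampleCount, sdtpSize) {
    const BOX_HEADER = 8;   // 32-bit size + 4-character type
    const MFHD = 16;
    const TFHD = 16;
    const TFDT = 16;        // version 0 tfdt, as in the flv.js call
    // trun: header + version/flags + sample_count + data_offset + 16 bytes per sample
    const trunSize = BOX_HEADER + 4 + 4 + 4 + 16 * sampleCount;
    const trafSize = BOX_HEADER + TFHD + TFDT + trunSize + sdtpSize;
    const moofSize = BOX_HEADER + MFHD + trafSize;
    // The sample data starts right after the mdat header that follows moof.
    return moofSize + BOX_HEADER;
}
```

For 60 samples and a 72-byte sdtp this gives 1124, i.e. a 1116-byte moof plus 8 bytes of mdat header.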

JimmyVV commented 7 years ago

Thanks, Qianqian~ Another question, this time about MP4.stsd. The ISO document defines stsd as:

aligned(8) class SampleDescriptionBox (unsigned int(32) handler_type) extends FullBox('stsd', 0, 0){
   int i ;
   unsigned int(32) entry_count;
   for (i = 1 ; i <= entry_count ; i++){
      switch (handler_type){
         case 'soun': // for audio tracks
            AudioSampleEntry();
            break;
         case 'vide': // for video tracks
            VisualSampleEntry();
            break;
         case 'hint': // Hint track
            HintSampleEntry();
            break;
      }
   }
}

Suppose the entry here is a VisualSampleEntry, i.e. video, and the codec is AVC. The MP4 spec then defines:

class VisualSampleEntry(codingname) extends SampleEntry (codingname){
   unsigned int(16) pre_defined = 0;
   const unsigned int(16) reserved = 0;
   unsigned int(32)[3] pre_defined = 0;
   unsigned int(16) width;
   unsigned int(16) height;
   template unsigned int(32) horizresolution = 0x00480000; // 72 dpi
   template unsigned int(32) vertresolution = 0x00480000; // 72 dpi
   const unsigned int(32) reserved = 0;
   template unsigned int(16) frame_count = 1;
   string[32] compressorname;
   template unsigned int(16) depth = 0x0018;
   int(16) pre_defined = -1;
}

But in your actual code, after providing the fields above you additionally append avcC data. Is there a reference for that trailing part?

MP4.box(MP4.types.avc1, data, MP4.box(MP4.types.avcC, avcc));

The mp4a audio entry carries similar extra data:

MP4.box(MP4.types.mp4a, data, MP4.esds(meta));

The explanation the MP4 spec gives is:

All such extensions shall be within boxes; these boxes occur after the required fields. Unrecognized boxes shall be ignored.

I searched but could not find a format description for these extension boxes.

xqq commented 7 years ago

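Probably in other ISO documents; several of the other parts contain the H264 and AAC specifications. They are paid documents on the official site, and I don't have them at hand right now.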
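
For orientation: the avcC payload is the AVCDecoderConfigurationRecord defined in ISO/IEC 14496-15, and esds is defined in ISO/IEC 14496-14. The sketch below shows the commonly used basic layout of that record, assuming exactly one SPS and one PPS (passed without start codes) and 4-byte NALU length prefixes. It omits the extra fields the record can carry for High-profile streams, and the helper name buildAvcDecoderConfigurationRecord is made up for illustration.

```javascript
// Hypothetical helper: builds the payload that goes inside the avcC box.
function buildAvcDecoderConfigurationRecord(sps, pps) {
    const record = new Uint8Array(11 + sps.length + pps.length);
    let i = 0;
    record[i++] = 1;         // configurationVersion
    record[i++] = sps[1];    // AVCProfileIndication (profile_idc from the SPS)
    record[i++] = sps[2];    // profile_compatibility (constraint flags)
    record[i++] = sps[3];    // AVCLevelIndication (level_idc)
    record[i++] = 0xFC | 3;  // reserved bits + lengthSizeMinusOne (4-byte lengths)
    record[i++] = 0xE0 | 1;  // reserved bits + number of SPS entries
    record[i++] = (sps.length >>> 8) & 0xFF;
    record[i++] = sps.length & 0xFF;
    record.set(sps, i);
    i += sps.length;
    record[i++] = 1;         // number of PPS entries
    record[i++] = (pps.length >>> 8) & 0xFF;
    record[i++] = pps.length & 0xFF;
    record.set(pps, i);
    return record;
}
```

When the input is FLV, the AVC sequence header tag already carries a record in this format, so a remuxer can usually pass it through rather than rebuild it.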

JimmyVV commented 7 years ago

While testing my own small library I hit an error, and found that chrome://media-internals/ shows the details:

00:00:00 258    debug   ISO BMFF boxes that run to EOS are not supported
00:00:00 258    error   Append: stream parsing failed. Data size=131072 append_window_start=0 append_window_end=inf

So ISO boxes can't be used with EOS? I'm testing the videoTrack only, without audio. What I end up feeding is a buffer that contains only video.

     let moof_mdat = this._remuxVideo();

    // first of all, only remux video to test it could be played normally
    this._initSegment = MP4.initBox(this._videoMeta);

    return this._playSeg(this._initSegment, moof_mdat);

In other words, I feed the moof, mdat, moov, ftyp boxes and so on to MSE in one go.

Could it be that a box is encoded incorrectly?

xqq commented 7 years ago

https://chromium.googlesource.com/chromium/src/media/+/master/formats/mp4/box_reader.cc#271

JimmyVV commented 7 years ago

I had found that code too, but some of the variables in it are unclear to me. Does Chromium read box_size straight from the header of the box buffer?

CHECK(Read4Into8(&box_size) && ReadFourCC(&type_));

Which would mean I very likely have an empty box somewhere:

if (box_size == 0) {
    if (is_EOS_) {
      // All the data bytes are expected to be provided.
      box_size = base::checked_cast<uint64_t>(buf_size_);
    } else {
      MEDIA_LOG(DEBUG, media_log_)
          << "ISO BMFF boxes that run to EOS are not supported";
      *err = true;
      return false;
    }
  } else if (box_size == 1) {
    if (!HasBytes(8)) {
      // If EOS is known, then this is an error. If not, it's a soft error.
      *err = is_EOS_;
      return false;
    }
    CHECK(Read8(&box_size));
  }
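
One way to hunt for the offending header is to walk the top-level boxes of the buffer before appending it. The sketch below follows the same header rules as the quoted Chromium code (a 32-bit size, where 1 means a 64-bit size follows the type and 0 means "runs to the end of the stream"). listTopLevelBoxes is a hypothetical debugging helper, not part of Chromium or flv.js.

```javascript
// Hypothetical helper: lists { type, offset, size } for each top-level box in a Uint8Array.
function listTopLevelBoxes(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const boxes = [];
    let pos = 0;
    while (pos + 8 <= bytes.byteLength) {
        let size = view.getUint32(pos); // big-endian by default
        const type = String.fromCharCode(bytes[pos + 4], bytes[pos + 5], bytes[pos + 6], bytes[pos + 7]);
        let headerSize = 8;
        if (size === 1) {
            // A 64-bit size follows the type field.
            if (pos + 16 > bytes.byteLength) break;
            size = Number(view.getBigUint64(pos + 8));
            headerSize = 16;
        }
        boxes.push({ type, offset: pos, size });
        // size 0 is the "runs to EOS" case; anything below the header size is malformed.
        if (size < headerSize) break;
        pos += size;
    }
    return boxes;
}
```

An entry with size 0, or a list that stops earlier than expected, points at the box whose size field was written incorrectly.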

JimmyVV commented 7 years ago

Qianqian, while working on the audio remux I ran into the concept of a silent frame. The way you generate the silent frame differs a bit from hls.js. Judging by its comments, do you only support the mp4a.40.2 silent frame?

case 'mp4a.40.2':
        if (channelCount === 1) {
          return new Uint8Array([0x00, 0xc8, 0x00, 0x80, 0x23, 0x80]);
        } else if (channelCount === 2) {
          return new Uint8Array([0x21, 0x00, 0x49, 0x90, 0x02, 0x19, 0x00, 0x23, 0x80]);
        } else if (channelCount === 3) {
          return new Uint8Array([0x00, 0xc8, 0x00, 0x80, 0x20, 0x84, 0x01, 0x26, 0x40, 0x08, 0x64, 0x00, 0x8e]);
        } else if (channelCount === 4) {
          return new Uint8Array([0x00, 0xc8, 0x00, 0x80, 0x20, 0x84, 0x01, 0x26, 0x40, 0x08, 0x64, 0x00, 0x80, 0x2c, 0x80, 0x08, 0x02, 0x38]);
        } else if (channelCount === 5) {
          return new Uint8Array([0x00, 0xc8, 0x00, 0x80, 0x20, 0x84, 0x01, 0x26, 0x40, 0x08, 0x64, 0x00, 0x82, 0x30, 0x04, 0x99, 0x00, 0x21, 0x90, 0x02, 0x38]);
        } else if (channelCount === 6) {
          return new Uint8Array([0x00, 0xc8, 0x00, 0x80, 0x20, 0x84, 0x01, 0x26, 0x40, 0x08, 0x64, 0x00, 0x82, 0x30, 0x04, 0x99, 0x00, 0x21, 0x90, 0x02, 0x00, 0xb2, 0x00, 0x20, 0x08, 0xe0]);
        }
        break;
    // handle HE-AAC below (mp4a.40.5 / mp4a.40.29)
      default:
        if (channelCount === 1) {
          return new Uint8Array([0x1..]);
        } else if (channelCount === 2) {
          return new Uint8Array([...]);
        } else if (channelCount === 3) {
          return new Uint8Array([0x1,0..]);
        }
        break;
    }

In what situations does a silent frame come into play?

xqq commented 7 years ago

LC-AAC silent frames decode fine when mixed into an HE-AAC track. I don't remember what hls.js uses silent frames for; here they work around a bug where Edge freezes after a seek:

// Workaround for IE11/Edge: Fill silent aac frame after keyframe-seeking
// Make audio beginDts equals with video beginDts, in order to fix seek freeze

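When a seek happens, processing restarts from some keyframe, and the first audio packet from that position usually has a timestamp later than that keyframe. Edge's buffered.start() reports the leftmost position on the timeline (the video one), and when video.currentTime is set to that position, playback stalls because no audio frame exists at that time.

So the gap between the IDR frame and the first audio packet is filled with silent audio frames.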
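
The timing side of that fill could look roughly like the sketch below. It is not the flv.js implementation, just one way to cover the gap: it assumes 1024 PCM samples per AAC frame, millisecond timestamps, and a last filler frame that is stretched or shortened so that it ends exactly where the first real audio packet begins. planSilentFrames is a hypothetical name, and the payload of each entry would be a silent-frame byte array such as the ones quoted above.

```javascript
// Hypothetical helper: returns { dts, duration } entries covering [videoBeginDts, firstAudioDts).
function planSilentFrames(videoBeginDts, firstAudioDts, sampleRate) {
    const gap = firstAudioDts - videoBeginDts;
    if (gap <= 0) {
        return []; // audio already starts at or before the keyframe
    }
    const frameDuration = 1024 * 1000 / sampleRate; // nominal AAC frame length in ms
    const count = Math.max(1, Math.round(gap / frameDuration));
    const frames = [];
    for (let i = 0; i < count; i++) {
        const dts = videoBeginDts + i * frameDuration;
        const isLast = i === count - 1;
        // The last frame absorbs the remainder so audio and video begin at the same DTS.
        frames.push({ dts, duration: isLast ? firstAudioDts - dts : frameDuration });
    }
    return frames;
}
```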

JimmyVV commented 7 years ago

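xqq, suppose I want to receive FLV audio tags directly over WebSocket and then remux and play them. Is there anything to watch out for in the final buffer handling? For example, when handling: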

 this._parseAVCData({
                    buffer,
                    dataOffset: dataOffset + 4,
                    dataSize: dataSize - 4,
                    timeStamp,
                    tagPosition,
                    frameType,
                    cts
                });

If I add the remuxed MP4 directly through appendBuffer, is the timeStamp field still needed?

xqq commented 7 years ago

I don't understand what you're asking. With FLV, of course you need the timestamp.

JimmyVV commented 7 years ago

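xqq, when you do live streaming over ws, how do you handle each subsequent buffer? For example, the first buffer contains the start of the MP4 file, such as ftyp. Does the media data that follows still need to be wrapped into a complete MP4 file before it is appended to the sourceBuffer? Thanks~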

JimmyVV commented 7 years ago

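A question: if you are downloading a very large FLV, how do you play it while it is still downloading? Here are two ideas. Which one did you choose, and why?

1. Generate a single MP4 file and append later segments directly to the end of that file.
2. Generate multiple MP4 files and play them by stitching the different files together.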

xqq commented 7 years ago

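Obviously 2. And flv.js has been stream-oriented with segmented MP4 from the very beginning. There is no such thing as stitching by hand: generate a fragment, append it to MSE, and you're done.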

JimmyVV commented 7 years ago

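A question: is the overall MSE workflow as follows?

1. Build the first MP4 buffer.
2. Append it to MSE and call video.play() just once (does it need to be called again later?).
3. Generate the subsequent MP4 buffers.

The key question here: do the subsequent MP4 buffers still need the timeStamp property? (If each MP4 file is independent, is timeStamp then unnecessary?)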

xqq commented 7 years ago

Initialization Segment, Media Segment. Go review the fMP4 spec first; this is a completely different thing from a traditional MP4 file...

JimmyVV commented 7 years ago

xqq, a question. If video and audio go into the same MP4 file, do the two buffers have to be interleaved when generating mdat? If so, must both sides have the same number of chunks, arranged as audio/video/audio...?

xqq commented 7 years ago

In principle they should be ordered by timestamp; see ffmpeg's av_interleaved_write_frame.
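
Ordering by timestamp means the two tracks do not need equal chunk counts or a strict alternation. A minimal sketch of such a merge is below; it is not a description of what ffmpeg does internally. It assumes each track's samples are already sorted by a dts field in a shared time base, and interleaveByDts is a hypothetical name.

```javascript
// Hypothetical helper: merges two DTS-sorted sample lists into one DTS-ordered list.
function interleaveByDts(videoSamples, audioSamples) {
    const merged = [];
    let v = 0;
    let a = 0;
    while (v < videoSamples.length || a < audioSamples.length) {
        // Take video when audio is exhausted, or when the next video sample is not later.
        const takeVideo = a >= audioSamples.length ||
            (v < videoSamples.length && videoSamples[v].dts <= audioSamples[a].dts);
        if (takeVideo) {
            merged.push({ track: 'video', ...videoSamples[v++] });
        } else {
            merged.push({ track: 'audio', ...audioSamples[a++] });
        }
    }
    return merged;
}
```

With video at 0/40/80 ms and audio at 0/23/46/69 ms, the result is video, audio, audio, video, audio, audio, video.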

JimmyVV commented 7 years ago

At the moment I'm not muxing the two tracks interleaved. The resulting MP4 file plays normally, but media-internals reports this error:

Failed to reconcile encoded audio times with decoded output.

I looked at the browser source and found this hint:

// Let developers know if their files timestamps are way off from
      if (num_unstable_audio_tries_ > limit_unstable_audio_tries_) {
        MEDIA_LOG(ERROR, media_log_)
            << "Failed to reconcile encoded audio times with decoded output.";
      }

Is that because I'm not interleaving? Or should the duration of the whole file be based on the longer of the video and audio durations?

xqq commented 7 years ago

As long as every sample in mdat matches the offset table in moof exactly, it's fine.

JimmyVV commented 7 years ago

Then the browser reported this error, which I don't understand. xqq, have you ever run into it?

audio_buffering_state	BUFFERING_HAVE_ENOUGH
audio_channels_count	2
audio_codec_name	aac
audio_dds	false
audio_decoder	FFmpegAudioDecoder
audio_sample_format	Float 32-bit planar
audio_samples_per_second	48000
bitrate	883501
debug	FFmpegDemuxer: av_read_frame(): End of file
duration	2.822
error	Failed to reconcile encoded audio times with decoded output.
event	PAUSE
found_audio_stream	true
found_video_stream	true

JimmyVV commented 7 years ago

xqq, one more question: do browsers only support fragmented MP4? Is ordinary unfragmented MP4 not supported?

xqq commented 7 years ago

MSE yes

JimmyVV commented 7 years ago

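xqq, have you ever seen the video fail to play to the end after loading the first video buffer? For example, the first segment's video.duration is 2 s, but playback stalls at 0.9 s.

At that point the video's waiting event fires. According to the docs, that means buffered data is missing.

I then checked the TimeRanges and the range really is 0~2 s. Yet media-internals reports no error.

Any help would be greatly appreciated.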

xqq commented 7 years ago

Which buffered did you check: the video element's, or the video SourceBuffer's?

JimmyVV commented 7 years ago

The video element's buffered. I have actually checked both.

xqq commented 7 years ago

Is there audio? Did you check the audio?

JimmyVV commented 7 years ago

The audio is fine, but the video stalls partway through.

JimmyVV commented 7 years ago

xqq, when you tested flv.js, did you never hit this in Chrome when playing the first video segment? It stalls at around the last 1 s, while the audio is fine.

The basic boxes are:

[ftyp] size=8+16
[moov] size=8+609
  [mvhd] size=12+96
  [trak] size=8+453
    [tkhd] size=12+80, flags=7
    [mdia] size=8+353
      [mdhd] size=12+20
      [hdlr] size=12+33
      [minf] size=8+268
        [vmhd] size=12+8, flags=1
        [dinf] size=8+28
          [dref] size=12+16
            [url ] size=12+0, flags=1
              location = [local to file]
        [stbl] size=8+204
          [stsd] size=12+124
            [avc1] size=8+112
              [avcC] size=8+26
          [stts] size=12+4
          [stsc] size=12+4
          [stsz] size=12+8
          [stco] size=12+4
  [mvex] size=8+32
    [trex] size=12+20
[moof] size=8+1108
  [mfhd] size=12+4
  [traf] size=8+1084
    [tfhd] size=12+4
    [tfdt] size=12+4
    [trun] size=12+968, flags=f01
    [sdtp] size=8+64
[mdat] size=8+540859

Does this match what a correct box layout looks like?

xqq commented 7 years ago

Never ran into it.

JimmyVV commented 7 years ago

Specifically, does a fragmented MP4 fragment (moof + mdat) impose any requirements on the frames inside it? For example, must the first frame of every fragmented video fragment (moof + mdat) be an I-frame?

xqq commented 7 years ago

There is no such requirement.

JimmyVV commented 7 years ago

Is there a limit on the number of frames in each mdat? If the stream arrives one frame at a time, can a media segment contain just a single frame?

xqq commented 7 years ago

Absolutely. Just make sure the frame's sample duration is filled in correctly.

JimmyVV commented 7 years ago

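But when I append the IS that way I still have a problem. This is a live stream, so the duration cannot be specified: I can get the duration of each moof + mdat, but I cannot write it into the ftyp + moov header. That produces a "duration unknown" error and playback fails. Does that mean I have to remux a complete fMP4 and append it every time?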

xqq commented 7 years ago

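The total duration of a live stream is unknown by nature, so filling in 0 is fine. Why would that prevent playback? flv.js has none of these problems. Why not just read its implementation, or copy it?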

JimmyVV commented 7 years ago

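But my stream is not a regular one, so I can't call flv.js directly; I have to take it apart.

What I'm unsure about now is whether the whole flow is:

First time: generate ftyp + moov and append it to MSE
After that: generate moof + mdat and append it to MSE

So in the end ftyp + moov is generated only once, and after that it is just moof + mdat in an endless loop?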

xqq commented 7 years ago

Yes.
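
A minimal sketch of that flow on the MSE side, assuming an already created SourceBuffer: the initialization segment (ftyp + moov) is pushed once, every later push is a media segment (moof + mdat), and a small queue keeps appendBuffer() from being called while a previous append is still in progress. SegmentAppender is a hypothetical wrapper; it relies only on the standard SourceBuffer members appendBuffer(), updating and the updateend event.

```javascript
// Hypothetical wrapper: serializes appends to a SourceBuffer-like object.
class SegmentAppender {
    constructor(sourceBuffer) {
        this.sourceBuffer = sourceBuffer;
        this.queue = [];
        // When one append finishes, try to send the next queued segment.
        sourceBuffer.addEventListener('updateend', () => this.flush());
    }

    // Call once with ftyp + moov, then repeatedly with moof + mdat.
    push(segment) {
        this.queue.push(segment);
        this.flush();
    }

    flush() {
        if (this.sourceBuffer.updating || this.queue.length === 0) {
            return;
        }
        this.sourceBuffer.appendBuffer(this.queue.shift());
    }
}
```

Usage would be along the lines of appender.push(initSegment) followed by appender.push(mediaSegment) for each fragment the remuxer produces.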

JimmyVV commented 7 years ago

xqq, have you run into this problem when playing audio?

When I play audio only, I get this error:

Failed to reconcile encoded audio times with decoded output.

with this message:

Audio buffer splice at PTS=5775000us. Trimmed tail of overlapped buffer (PTS=5774000us) by 19000us.

The Chromium source says:


// Reconciling encoded buffer timestamps with decoded output often requires
 // adjusting expectations by some offset.

Why is an offset necessarily required?

xqq commented 7 years ago

You need to make sure that each fMP4 segment's baseMediaDecodeTime strictly matches the end time of the previous segment.
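
A quick way to check that rule on your own output is to compare each segment's baseMediaDecodeTime with the previous segment's baseMediaDecodeTime plus the sum of its sample durations. The sketch below assumes segments are described as { baseMediaDecodeTime, sampleDurations }, all in the track's timescale units; findTimelineGaps is a hypothetical name.

```javascript
// Hypothetical checker: reports segments that do not start where the previous one ended.
function findTimelineGaps(segments) {
    const problems = [];
    let expected;
    segments.forEach((seg, index) => {
        if (expected !== undefined && seg.baseMediaDecodeTime !== expected) {
            // A larger actual value is a gap, a smaller one is an overlap.
            problems.push({ index, expected, actual: seg.baseMediaDecodeTime });
        }
        const total = seg.sampleDurations.reduce((sum, d) => sum + d, 0);
        expected = seg.baseMediaDecodeTime + total;
    });
    return problems;
}
```

An empty result means every segment begins exactly at the end of the one before it.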

mwb-27 commented 6 years ago

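@JimmyVV Hi, I saw a question you asked above: during playback, the waiting event fires while the playback position is still more than 1 s short of buffered.end(). I've run into the same problem. Did you ever solve it or find the cause, or is that simply how Chrome's video element behaves?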

bilibili / flv.js

Boss, could you point me to the references you used for the AAC format? #131