VoidXH / Cavern

Object-based audio engine and codec pack with Dolby Atmos rendering, room correction, HRTF, one-click Unity audio takeover, and much more.
http://cavern.sbence.hu
Other
323 stars 14 forks source link

Respect encoder delay in Matroska #93

Closed VoidXH closed 1 year ago

VoidXH commented 1 year ago

MKV audio tracks have a CodecDelay field, which might contain the optional E-AC-3 priming that's rarely present in content. This will fix audio sync in these cases.

VoidXH commented 1 year ago

@ValZapod, may I ask for your help here? For Edge of Tomorrow, FFmpeg actually has extra silence (about 4500 samples of it) prepending the audio. Where is this coming from? image

VoidXH commented 1 year ago

The entire track is delayed by about 3 AC-3 frames. This is an E-AC-3 + JOC track. Making a copy to an E-AC-3 track only removed the 1536 samples of priming, but 3000 samples of silence of unknown origin still remains.

VoidXH commented 1 year ago

Yes, EAE.EXE USES 768 samples as priming.

I was talking about 3 frames (4608 samples), not just 3 blocks (768 samples). That's not relevant here. When Matroska has no additional codec delay, the default priming of 1536 samples is used. That is also the case here, as the 4608 samples of delay was reduced to 3072 after a track copy, but this silence is still present.

There may be an editlist (or equivalent for mkv) for video too.

Why is it present in the audio after extracting it? This silence even had to be encoded into the track, does FFmpeg do this if you export the audio of a track with video delay, with -c copy?

Removed? You mean added back into decoded sound.

No, removed. There was 3 frames of delay, now it's only 2.

VoidXH commented 1 year ago

It really looks like a video delay, that track is way longer, thanks.

  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x1600 [SAR 1:1 DAR 12:5], 23.98 fps, 23.98 tbr, 1k tbn (default)
    Metadata:
      BPS             : 8018868
      DURATION        : 01:53:27.801000000
      NUMBER_OF_FRAMES: 163224
      NUMBER_OF_BYTES : 6823857865
      _STATISTICS_WRITING_APP: mkvmerge v70.0.0 ('Caught A Lite Sneeze') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-09-05 20:42:02
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
  Stream #0:1(eng): Audio: eac3, 48000 Hz, 5.1(side), fltp, 576 kb/s (default)
    Metadata:
      title           : English DDP Atmos 5.1
      BPS             : 576000
      DURATION        : 01:53:26.720000000
      NUMBER_OF_FRAMES: 212710
      NUMBER_OF_BYTES : 490083840
      _STATISTICS_WRITING_APP: mkvmerge v70.0.0 ('Caught A Lite Sneeze') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-09-05 20:42:02
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES

EAC3 frame is 256 samples.

That's a block. Dolby and FFmpeg both call the 6-block sized superblock a frame. This is the same in the specification.

VoidXH commented 1 year ago

@ValZapod For all content I could find, PTS = DTS = 0, timestamps started at 0. Checking the EBML, this is correct, the frames had no offset. What EBML entry should I check for time alignment, where else could this extra 3000 samples of silence come from?

VoidXH commented 1 year ago

This isn't just affecting that movie. While many are fine, Cocaine Bear for example has 512 samples of added delay instead of 3072. If there's no other way to offset a file, I might just have to close this issue and just work on the muxer.