Vanilagy / mp4-muxer

MP4 multiplexer in pure TypeScript with support for WebCodecs API, video & audio.
https://vanilagy.github.io/mp4-muxer/demo
MIT License

Read preskip and gain from codec private #62

Closed szymonrw closed 1 month ago

szymonrw commented 1 month ago

Hi! :wave:

The write-up ended up long-ish, so I just wanted to let you know that this isn't high priority, just a potential improvement. No rush necessary :+1:

I've been looking into why some MP4 + Opus files were being slightly truncated at the beginning. What made it especially interesting was that webm-muxer seems to produce different results after decoding.

Here's a small demo to demonstrate the difference: https://codepen.io/brainshave/pen/QWXxKmW. It encodes a sine wave as Opus and feeds it to both muxers, then plots the soundwaves for outputs of both muxers (red for MP4 and blue for WebM) + the original sound wave (black).

The main difference that can be observed is that the MP4 output (the red line) is out of sync with the rest and ends earlier.

[Screenshot: plot of the decoded waveforms, MP4 in red, WebM in blue, original in black]

Few notes about the demo:

  1. works only in Chrome (only browser to implement AudioEncoder for now IIRC)
  2. chose a very low frequency (5Hz) for the sound wave so that it shows nicely in the plot but it's not audible of course; it can be changed by modifying the HZ constant at the top
  3. the difference in sample count doesn't change even if HZ or SECONDS are modified
  4. the preskip value in the OpusHead seems to be 312 instead of 3840 in this particular case (but of course could be anything that the encoder decides)
  5. plot for WebM isn't perfect either but it could be just how the AudioEncoder encodes the data. This might be another interesting investigation but I think it'd be better to leave that for some other time. I feel like the goal for now would be to just make both muxers produce the same outputs when decoded (Opus encoder quirks notwithstanding).
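For a sense of scale on note 4, here is a quick back-of-the-envelope sketch. It assumes the decoder drops exactly the pre-skip that the container declares, and that pre-skip is counted in samples at 48 kHz (as RFC 7845 defines it for OpusHead). The helper name `extraTrimMs` is made up for this illustration.

```typescript
// Pre-skip is expressed in samples at 48 kHz, independent of the input sample rate.
const OPUS_PRESKIP_RATE_HZ = 48000;

// Extra audio (in milliseconds) dropped at the start when the container declares
// a larger pre-skip than the one the encoder actually reported.
function extraTrimMs(declaredPreSkip: number, encoderPreSkip: number): number {
  const extraSamples = declaredPreSkip - encoderPreSkip;
  return (extraSamples * 1000) / OPUS_PRESKIP_RATE_HZ;
}

// Hard-coded 3840 vs. the 312 seen in this demo: 3528 extra samples.
console.log(3840 - 312, extraTrimMs(3840, 312)); // 3528 73.5
```

Under those assumptions the mismatch would be a constant 3528 samples (about 73.5 ms), which would fit with the difference not changing when HZ or SECONDS are modified.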

My understanding of the problem:

IIUC, from the code it looks like MP4 (unlike WebM) doesn't use the OpusHead description from the encoder directly but has its own dOps box, which carries similar fields in a different byte layout. Does that sound correct?

So far it seems that preskip was hard-coded to 3840. If the value in the OpusHead description from the encoder differs significantly, this can lead to more data than necessary being cut off from the beginning and possibly a slight desync with video. (Although for now I'm focusing on the audio-only case.)

The fix seems to be reading the OpusHead description and translating it when creating the dOps box. Apart from preskip, I made it read the gain value too, but figured it's better to be conservative about the rest for now. I also made it fall back to the original hard-coded values if the description field is missing, to avoid regressions with existing code.
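As a rough illustration of that translation (a sketch, not the code from this PR), the snippet below parses the fixed part of an OpusHead packet and serializes a dOps payload from it. The field layouts follow my reading of RFC 7845 (OpusHead, little-endian) and the Opus-in-ISOBMFF encapsulation spec (dOps, big-endian). All names (`OpusHeadFields`, `parseOpusHead`, `buildDOpsPayload`, `dOpsFieldsFor`) are hypothetical, only channel mapping family 0 is handled, and the fallback gain of 0 is an assumption; only the 3840 preskip fallback comes from the description above.

```typescript
interface OpusHeadFields {
  channelCount: number;
  preSkip: number;              // samples at 48 kHz
  inputSampleRate: number;      // Hz
  outputGain: number;           // raw signed 16-bit value (Q7.8 dB)
  channelMappingFamily: number;
}

// Parse the fixed 19-byte part of an OpusHead packet; multi-byte fields are little-endian.
function parseOpusHead(description: Uint8Array): OpusHeadFields {
  const MAGIC = "OpusHead";
  if (description.byteLength < 19) throw new Error("OpusHead is too short");
  for (let i = 0; i < MAGIC.length; i++) {
    if (description[i] !== MAGIC.charCodeAt(i)) throw new Error("Missing OpusHead signature");
  }
  const view = new DataView(description.buffer, description.byteOffset, description.byteLength);
  return {
    channelCount: view.getUint8(9),
    preSkip: view.getUint16(10, true),
    inputSampleRate: view.getUint32(12, true),
    outputGain: view.getInt16(16, true),
    channelMappingFamily: view.getUint8(18),
  };
}

// Use the encoder's description when present, otherwise fall back to fixed values.
function dOpsFieldsFor(
  description: Uint8Array | undefined,
  channelCount: number,
  sampleRate: number,
): OpusHeadFields {
  if (description) return parseOpusHead(description);
  // Fallback: preskip 3840 as before; gain 0 is an assumption of this sketch.
  return { channelCount, preSkip: 3840, inputSampleRate: sampleRate, outputGain: 0, channelMappingFamily: 0 };
}

// Serialize the dOps payload (without the box header); same fields, but big-endian.
function buildDOpsPayload(fields: OpusHeadFields): Uint8Array {
  if (fields.channelMappingFamily !== 0) {
    throw new Error("This sketch only handles channel mapping family 0");
  }
  const bytes = new Uint8Array(11);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, 0);                           // Version (0 in dOps)
  view.setUint8(1, fields.channelCount);         // OutputChannelCount
  view.setUint16(2, fields.preSkip, false);      // PreSkip
  view.setUint32(4, fields.inputSampleRate, false); // InputSampleRate
  view.setInt16(8, fields.outputGain, false);    // OutputGain
  view.setUint8(10, fields.channelMappingFamily); // ChannelMappingFamily
  return bytes;
}
```

With WebCodecs, the description would typically come from `metadata.decoderConfig.description` in the `AudioEncoder` output callback, so a caller could pass that (or `undefined`) to `dOpsFieldsFor` and hand the result to `buildDOpsPayload`.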

Thanks for taking the time to read this. Please have a look at the code whenever you feel like it and LMK whether it's OK, fits the coding standards, etc. :bow:

Vanilagy commented 1 month ago

Would you mind fixing the merge conflicts? It's just the minified builds, so the fix is trivial. For the future, you can enable this!

szymonrw commented 1 month ago

> Would you mind fixing the merge conflicts? It's just the minified builds, so the fix is trivial.

Done and tested locally :+1:

> For the future, you can enable this!

It seems like this option is not available for me. Could be company policy but dunno for sure. Anyway, always happy to fix merge conflicts when needed :+1: