Vanilagy / mp4-muxer

MP4 multiplexer in pure TypeScript with support for WebCodecs API, video & audio.
https://vanilagy.github.io/mp4-muxer/demo

Delayed audio tracks #70

Closed · frishu closed this issue 3 weeks ago

frishu commented 3 weeks ago

Hi, hallo,

I'm currently experimenting with WebCodecs and came across this library through a blog post, so I decided to give it a try to simplify the whole learning process. Everything related to video works as I want it to, but I'm having difficulties when it comes to audio.

I'd like to have delayed audio tracks, but they're played right after each other. The AudioData timestamp looks like it's being ignored, and I'm unsure whether the issue is on my end or somewhere else, so I'm hoping for some hints or solutions.

In short: the audio tracks aren't delayed, despite a large difference in their timestamps.

Reproduction:


<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>AudioData & AudioEncoder Example</title>
    <script src="build/mp4-muxer.js"></script>
  </head>
  <body>
    <button id="start">Start Encoding</button>

    <script>
      const { Muxer, ArrayBufferTarget } = window.Mp4Muxer;

      const muxer = new Muxer({
        target: new ArrayBufferTarget(),
        fastStart: 'in-memory',
        audio: {
          codec: 'opus',
          numberOfChannels: 2,
          sampleRate: 44100,
        },
      });

      // Initialize AudioContext
      const audioContext = new (window.AudioContext || window.webkitAudioContext)();

      // Load and decode audio data into an AudioBuffer
      async function loadAudioBuffer(url) {
        const response = await fetch(url);
        const arrayBuffer = await response.arrayBuffer();
        return await audioContext.decodeAudioData(arrayBuffer);
      }

      // Create an AudioData object from the AudioBuffer
      function createAudioData(audioBuffer, startTime) {
        const numberOfChannels = audioBuffer.numberOfChannels;
        const sampleRate = audioBuffer.sampleRate;
        const channelData = [];

        // Collect channel data
        for (let i = 0; i < numberOfChannels; i++) {
          channelData.push(audioBuffer.getChannelData(i));
        }

        // Flatten the multi-channel data into a single Float32Array
        const data = new Float32Array(channelData[0].length * numberOfChannels);
        for (let channel = 0; channel < numberOfChannels; channel++) {
          data.set(channelData[channel], channel * channelData[0].length);
        }

        // Create AudioData
        return new AudioData({
          format: 'f32-planar', // PCM format with 32-bit floating-point data
          sampleRate: sampleRate,
          numberOfFrames: audioBuffer.length,
          numberOfChannels: numberOfChannels,
          timestamp: startTime * 1e6, // Convert seconds to microseconds
          data: data,
        });
      }

      // Encode audio using AudioEncoder
      async function encodeAudio(audioData1, audioData2) {
        // Output callback for when encoded data is ready
        const outputCallback = (chunk) => {
          console.log('Encoded audio chunk:', chunk);
        };

        // Error callback
        const errorCallback = (error) => {
          console.error('AudioEncoder error:', error);
        };

        // Create and configure AudioEncoder
        const audioEncoder = new AudioEncoder({
          output: (chunk, meta) => muxer.addAudioChunk(chunk, meta),
          error: errorCallback,
        });

        // Configure the encoder (e.g., using the Opus codec)
        audioEncoder.configure({
          codec: 'opus', // You can also use 'aac' or others
          sampleRate: audioData1.sampleRate,
          numberOfChannels: audioData1.numberOfChannels,
        });

        // Encode the first audio data
        audioEncoder.encode(audioData1);

        // Encode the second audio data, starting at a different timestamp (e.g., 10 seconds later)
        audioEncoder.encode(audioData2);

        // Flush the encoder when done
        await audioEncoder.flush();

        console.log('Encoding completed.');

        muxer.finalize();

        const blob = new Blob([muxer.target.buffer], { type: 'video/mp4' });
        exportVideo(blob);
      }

      // Start the process
      async function startEncoding() {
        try {
          // Load the same example file twice (decodeAudioData extracts its audio track)
          const audioBuffer1 = await loadAudioBuffer('https://upload.wikimedia.org/wikipedia/commons/transcoded/8/87/Schlossbergbahn.webm/Schlossbergbahn.webm.1080p.vp9.webm');
          const audioBuffer2 = await loadAudioBuffer('https://upload.wikimedia.org/wikipedia/commons/transcoded/8/87/Schlossbergbahn.webm/Schlossbergbahn.webm.1080p.vp9.webm');

          // Create AudioData objects for each audio buffer with appropriate start times
          const audioData1 = createAudioData(audioBuffer1, 0); // Start at 0 seconds
          const audioData2 = createAudioData(audioBuffer2, 10); // Start at 10 seconds

          // Encode the audio data
          await encodeAudio(audioData1, audioData2);
        } catch (error) {
          console.error('Error during encoding process:', error);
        }
      }

      // Add event listener to start the encoding when button is clicked
      document.getElementById('start').addEventListener('click', () => {
        startEncoding();
      });

      function exportVideo(blob) {
        const vid = document.createElement('video');
        vid.controls = true;
        vid.src = URL.createObjectURL(blob);

        let extension = blob.type.split(';')[0].split('/')[1];

        const a = document.createElement('a');
        a.id = 'video-download';
        a.download = new Date().getTime() + '.' + extension;
        a.textContent = 'download';
        a.href = vid.src;
        a.click();
      }
    </script>
  </body>
</html>

Thanks for the support and the great library!

Vanilagy commented 3 weeks ago

Hallo!

No idea what you're talking about; your code works fine for me! The source video of the Schlossbergbahn is 12 seconds long, and you're encoding its audio twice, so the resulting file should be 24 seconds long, and it is: the same audio playing twice in a row.

The only mismatch is that the second audio starts playing after 12 seconds and not after 10 (which you specified in your code), but that's because you had already encoded 12 seconds of audio by that point. There's no way for the encoder to "go back in time", so the best it can do is start the next audio at 12 seconds.

You said your intention was to have "delayed audio tracks". By that, do you mean that you want your two audio tracks to overlap, i.e. play at the same time? If that's the case, you'll have to get more advanced than just working with AudioData. Playing multiple things at the same time means you'll need to get into audio mixing. The Web Audio API is perfect for that: create an OfflineAudioContext, create one buffer source node for each input buffer, and then schedule the audio to play. So, the first node would be scheduled with .start(0), the second with .start(10). Then render the audio out into a final AudioBuffer, turn that AudioBuffer back into AudioData, and pipe it into the encoder like you're already doing.

This may help:

// Mix two audio buffers into one using OfflineAudioContext
async function mixAudioBuffers(audioBuffer1, audioBuffer2, delayInSeconds) {
    const sampleRate = audioBuffer1.sampleRate;
    const numberOfChannels = audioBuffer1.numberOfChannels;

    // Determine the total length (in samples) needed to fit both buffers
    const totalLength = Math.ceil(Math.max(audioBuffer1.length, audioBuffer2.length + delayInSeconds * sampleRate));

    // Create an OfflineAudioContext to mix the buffers
    const offlineContext = new OfflineAudioContext(numberOfChannels, totalLength, sampleRate);

    // Create buffer source nodes for both audio buffers
    const source1 = offlineContext.createBufferSource();
    source1.buffer = audioBuffer1;
    source1.start(0); // Start immediately

    const source2 = offlineContext.createBufferSource();
    source2.buffer = audioBuffer2;
    source2.start(delayInSeconds); // Start after the delay

    // Connect the sources to the context's destination
    source1.connect(offlineContext.destination);
    source2.connect(offlineContext.destination);

    // Render the mixed output
    return await offlineContext.startRendering();
}
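
To close the loop: the glue between the rendered buffer and the encoder could look roughly like the sketch below. It just reuses the loadAudioBuffer and createAudioData helpers and the muxer from your reproduction, and assumes the encoder's (and the muxer's) sampleRate and numberOfChannels match the rendered buffer.

// Sketch: mix, wrap the rendered AudioBuffer in AudioData, then encode.
// Assumes `muxer`, `loadAudioBuffer`, `createAudioData` and `mixAudioBuffers`
// from above are in scope.
async function encodeMixed(url1, url2, delayInSeconds) {
    const buffer1 = await loadAudioBuffer(url1);
    const buffer2 = await loadAudioBuffer(url2);

    // Render both inputs into a single AudioBuffer, with the second one delayed
    const mixedBuffer = await mixAudioBuffers(buffer1, buffer2, delayInSeconds);

    const audioEncoder = new AudioEncoder({
        output: (chunk, meta) => muxer.addAudioChunk(chunk, meta),
        error: (e) => console.error('AudioEncoder error:', e),
    });
    audioEncoder.configure({
        codec: 'opus',
        sampleRate: mixedBuffer.sampleRate,
        numberOfChannels: mixedBuffer.numberOfChannels,
    });

    // The whole mix starts at timestamp 0; the delay is already baked into the buffer
    audioEncoder.encode(createAudioData(mixedBuffer, 0));
    await audioEncoder.flush();

    muxer.finalize();
}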
frishu commented 3 weeks ago

Ah, damn!

My bad, I didn't make sure that the example video I picked was longer than 10 seconds. And I actually didn't mean mixing, but that's the next step I wanted to figure out, so thank you already!

What I meant by the delay is this:

After the first audio has finished (at 12s), there's a pause of a few seconds (let's say 2) where no audio plays, and then another track (or the same one, it doesn't really matter) starts. This would result in 26 seconds of audio with a silent gap in between.

Adjusting the code above, it would look like this:


          const audioData1 = createAudioData(audioBuffer1, 0); // 0s-12s
          const audioData2 = createAudioData(audioBuffer2, 14); // 14s-26s
Vanilagy commented 3 weeks ago

Any conceivable scheduling of audio can be implemented using the OfflineAudioContext method I showed you above, including the one with the two seconds of silence. The silence needs to come from somewhere, and if you do this using the audio context then you'll have 2 seconds of silence after the first audio. This is better than having a "gap" in the audio chunks in the final encoded media, since that's just kinda... weird. And I wouldn't count on all media players dealing with that in the same way; perhaps some simply contract the silence. So, better to encode silence explicitly!
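
Concretely, for your 0s-12s / 14s-26s example, that could look something like the following sketch (using the mixAudioBuffers helper from my previous comment, inside an async function):

// audioBuffer1 plays from 0 s; audioBuffer2 starts after it plus a 2 s pause.
// The OfflineAudioContext renders the gap in between as actual silence.
const pauseInSeconds = 2;
const delayInSeconds = audioBuffer1.duration + pauseInSeconds; // e.g. 12 + 2 = 14

const combinedBuffer = await mixAudioBuffers(audioBuffer1, audioBuffer2, delayInSeconds);
// combinedBuffer is now ~26 s long, with explicit silence from 12 s to 14 s,
// and can be wrapped in AudioData and encoded as usual.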

I'll close this issue as I think you have enough info to solve your problem. If there's anything more, feel free to ask!