Vanilagy / mp4-muxer

MP4 multiplexer in pure TypeScript with support for WebCodecs API, video & audio.
https://vanilagy.github.io/mp4-muxer/demo
MIT License
419 stars · 32 forks

Possibility to mux aac audio without encoding it #9

Closed kareljuricka closed 1 year ago

kareljuricka commented 1 year ago

Hello,

first I want to say thanks for your webm and mp4 muxer libraries. They are awesome!

I have only one question. If I have audio in the AAC codec, without any container like mp4, and I want to mux this audio into an mp4 container using the AAC codec, do I really need an AudioEncoder? Is there a way to mux audio into an mp4 container without AudioEncoder?

If I try it this simple way, the audio is playable in Chrome, but QuickTime, VLC, etc. can't play it; it immediately skips to the end when playback starts:

const muxer = new Muxer({
    target: new ArrayBufferTarget(),
    audio: {
        codec: 'aac',
        numberOfChannels: 2,
        sampleRate: 44100
    },
    firstTimestampBehavior: 'offset'
});

const arrayBuffer = await fetch('https://:DOMAIN:/assets/audio_testing/background_short.aac')
    .then((response) => response.arrayBuffer());

const audioContext = new AudioContext({sampleRate: 44100});

const audioBuffer = await audioContext.decodeAudioData(arrayBuffer.slice(0));

muxer.addAudioChunkRaw(new Uint8Array(arrayBuffer), 'key', 0, audioBuffer.duration * 1000000, {
    decoderConfig: {
        codec: 'aac',
        numberOfChannels: 2,
        sampleRate: 44100,
    }
});

Is there some proper way to do it without an encoder, or is encoding necessary for the muxer even when the input is the same AAC source as the mp4 container's AAC audio output?

Thank you for your answer

0-mandrixx-0 commented 1 year ago

I think your AAC file is a concatenation of chunks in ADTS format, which carries minimal data like the frame length so the file can be played. I was encoding audio chunks with ADTS before, and the muxer can't handle this.

Decoding/encoding seems like the only way to get the individual chunks out of it.

Vanilagy commented 1 year ago

All the data is already in the AAC file, so there's definitely no need to decode and reencode it. All you need to do is separate the single large buffer into its individual frames. You can find a definition of the header format here.

You'll need to repeatedly read out this header, then jump ahead by frame size to get to the next header, and so on. The bytes from the start of a header to the start of the next one make up one frame. You can then feed these into the muxer one by one using addAudioChunkRaw, and you should be good! The ADTS headers also contain some information about the sample rate and so on, which you can use as metadata input to the muxer.
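For reference, the sample rate and channel count can be read straight from those same header bytes. A small untested sketch (field offsets per the ADTS header layout; the function name and the return shape are mine):

```typescript
// Sample rates indexed by the 4-bit sampling_frequency_index in the ADTS header
const ADTS_SAMPLE_RATES = [
    96000, 88200, 64000, 48000, 44100, 32000,
    24000, 22050, 16000, 12000, 11025, 8000, 7350
];

function readAdtsMetadata(buffer: ArrayBuffer): { sampleRate: number, numberOfChannels: number } {
    const view = new DataView(buffer);
    // sampling_frequency_index sits in bits 2..5 of byte 2
    const sampleRateIndex = (view.getUint8(2) & 0x3C) >> 2;
    // channel_configuration is 3 bits: the lowest bit of byte 2
    // followed by the top 2 bits of byte 3
    const numberOfChannels = ((view.getUint8(2) & 0x01) << 2)
        | ((view.getUint8(3) & 0xC0) >> 6);
    return { sampleRate: ADTS_SAMPLE_RATES[sampleRateIndex], numberOfChannels };
}
```

That result could then be passed as the `audio` options and `decoderConfig` of the muxer instead of hardcoding 44100 / stereo.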

Here's some GPT-4-written code for the frame extraction, maybe this will be handy. (Haven't tested it)

function splitAACFrames(buffer: ArrayBuffer): ArrayBuffer[] {
    const frames: ArrayBuffer[] = [];
    const view = new DataView(buffer);

    let i = 0;
    while (i < view.byteLength) {
        // check for the 12-bit syncword 0xFFF
        if ((view.getUint16(i) & 0xFFF0) !== 0xFFF0) {
            throw new Error(`Expected syncword at byte ${i}`);
        }

        // the 13-bit frame length spans bits 30..43 of the header
        const frameLength = ((view.getUint8(i + 3) & 0x03) << 11)
            | (view.getUint8(i + 4) << 3)
            | ((view.getUint8(i + 5) & 0xE0) >> 5);

        // create a new ArrayBuffer for the frame
        const frameBuffer: ArrayBuffer = buffer.slice(i, i + frameLength);
        frames.push(frameBuffer);

        // move to the next frame
        i += frameLength;
    }

    return frames;
}
kareljuricka commented 1 year ago

@Vanilagy thank you for confirming that decoding/encoding is not necessary. I tried the algorithm from GPT-4, did some more investigation and work, and I almost got it done. The loop runs fine, the header bits are detected, and addAudioChunkRaw is being called. But after muxer.finalize() the mp4 is playable only in Chrome, not in QuickTime etc.

Interestingly, the timings seem to be broken too. I calculate the duration from the frame size, and it is different for each frame, so it's DYNAMIC. When I use a FIXED duration calculated as 1024 / 44100 (the sample rate), the total duration is correctly 3 s.

For example, here is link for input aac: https://www.juricka.com/data/input.aac

And here is 'corrupted' output mp4: https://www.juricka.com/data/mp4-muxer.mp4

Also, I want to say that when I use ffmpeg.wasm to mux (without decoding/encoding, with the -c copy option), the resulting mp4 is playable; here is its link: https://www.juricka.com/data/ffmpeg-wasm.mp4 (but it seems like overkill to me to download ffmpeg.wasm only for muxing).

The size is almost the same, but there is something wrong that I'm missing. Maybe it's related to the duration problem?

const muxer = new Muxer({
    target: new ArrayBufferTarget(),
    audio: {
        codec: 'aac',
        numberOfChannels: 2,
        sampleRate: 44100
    },
    firstTimestampBehavior: 'offset'
});

const arrayBuffer = await fetch('https://:DOMAIN:/assets/audio_testing/input.aac')
    .then((response) => response.arrayBuffer());

let baseTime = 0;
// const duration = 1024 / 44100; // also tested with a fixed duration
splitAACFrames(arrayBuffer).forEach((item, index) => {
    const duration = item.duration;
    muxer.addAudioChunkRaw(new Uint8Array(item.buffer), 'key', baseTime * 1000000, duration * 1000000);
    baseTime += duration;
});

function splitAACFrames(buffer: ArrayBuffer): { buffer: ArrayBuffer, duration: number }[] {
    const frames: { buffer: ArrayBuffer, duration: number }[] = [];
    const view = new DataView(buffer);
    let i = 0;
    while (i < view.byteLength) {
        // check for the syncword (plus MPEG-4 ID, layer 0, no CRC)
        if (view.getUint16(i) != 0xFFF1) {
            throw new Error(`Expected syncword at byte ${i}`);
        }
        // the 13-bit frame length spans bits 30..43 of the header
        const frameLength = ((view.getUint8(i + 3) & 0x03) << 11)
            | (view.getUint8(i + 4) << 3)
            | ((view.getUint8(i + 5) & 0xE0) >> 5);
        const sampleRate = getAACSampleRate(buffer);
        const duration = calculateFrameDuration(frameLength, sampleRate);
        // create a new ArrayBuffer for the frame
        const frameBuffer: ArrayBuffer = buffer.slice(i, i + frameLength);
        frames.push({
            buffer: frameBuffer,
            duration: duration,
        });
        // move to the next frame
        i += frameLength;
    }
    return frames;
}

function getAACSampleRate(arrayBuffer: ArrayBuffer) {
    // Extract the sample rate index from the ADTS header
    const dataView = new DataView(arrayBuffer);
    const sampleRateIndex = (dataView.getUint8(2) & 0x3C) >> 2;
    // Calculate the sample rate based on the index
    const sampleRateTable = [
        96000, 88200, 64000, 48000, 44100, 32000,
        24000, 22050, 16000, 12000, 11025, 8000, 7350
    ];
    const sampleRate = sampleRateTable[sampleRateIndex];
    return sampleRate;
}

function calculateFrameDuration(frameSize: number, sampleRate: number) {
    const bitsPerSample = 16; // Assuming 16-bit samples
    const numChannels = 2; // Assuming stereo audio
    const numSamples = (frameSize * 8) / (bitsPerSample * numChannels);
    const duration = numSamples / sampleRate;
    return duration;
}

EDIT: I found out that the duration is 10 times smaller when calculated dynamically. But even after fixing it by changing 1 000 000 to 10 000 000, the audio still doesn't work.

muxer.addAudioChunkRaw(new Uint8Array(item.buffer), 'key', baseTime * 10000000, duration * 10000000);
kareljuricka commented 1 year ago

I found a solution: I have to feed addAudioChunkRaw only the data, without the header (the first 7 / 9 bytes). Then it seems to work!
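In other words, the ADTS header must be dropped from each frame before muxing. It is 7 bytes long, or 9 bytes when the optional CRC is present (the protection_absent bit in byte 1 is cleared). A sketch of that stripping step (untested; the function name is mine):

```typescript
// Strip the ADTS header from one frame so only the raw AAC payload
// is passed to addAudioChunkRaw. The header is 7 bytes, or 9 bytes
// when a CRC follows it (protection_absent bit of byte 1 is 0).
function stripAdtsHeader(buffer: ArrayBuffer, frameStart: number, frameLength: number): ArrayBuffer {
    const view = new DataView(buffer);
    const protectionAbsent = view.getUint8(frameStart + 1) & 0x01;
    const headerLength = protectionAbsent ? 7 : 9;
    return buffer.slice(frameStart + headerLength, frameStart + frameLength);
}
```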

Vanilagy commented 1 year ago

Awesome! So, does it fully work now, i.e. play in all players? And did you need to keep this duration change?

kareljuricka commented 1 year ago

Yes, now it fully works. I had two problems, but I solved them:

  1. Wrong calculation of the indexes for buffer.slice when selecting only the data
  2. I found out that using the fixed duration calculation is the correct way, so for me it's 1024 / 44100

Thank you so much for your help! Really appreciate it!

avhasib commented 1 year ago

@kareljuricka Can you please share the working code or fix the code in the above thread.

weepy commented 5 months ago

yes please @kareljuricka !

kareljuricka commented 5 months ago

@weepy here is the fixed version of splitAACFrames:

  1. Wrong calculation of the indexes for buffer.slice when selecting only the data:

     function splitAACFrames(buffer: ArrayBuffer): ArrayBuffer[] {
         const frames: ArrayBuffer[] = [];
         const view = new DataView(buffer);
         let i = 0;
         while (i < view.byteLength) {
             // check for the syncword (plus MPEG-4 ID, layer 0, no CRC)
             if (view.getUint16(i) != 0xFFF1) {
                 throw new Error(`Expected syncword at byte ${i}`);
             }
             // the 13-bit frame length spans bits 30..43 of the header
             const frameLength = ((view.getUint8(i + 3) & 0x03) << 11)
                 | (view.getUint8(i + 4) << 3)
                 | ((view.getUint8(i + 5) & 0xE0) >> 5);
             // create a new ArrayBuffer for the frame, skipping the 7-byte header
             const frameBuffer: ArrayBuffer = buffer.slice(i + 7, i + frameLength); // FIX
             frames.push(frameBuffer);
             // move to the next frame
             i += frameLength;
         }
         return frames;
     }
  2. Fixed duration calculation, so for me it's 1024 / 44100:

     let baseTime = 0;
     const duration = 1024 / 44100; // fixed duration per AAC frame
     splitAACFrames(arrayBuffer).forEach((buffer, index) => { // arrayBuffer contains the audio
         muxer.addAudioChunkRaw(new Uint8Array(buffer), index % 10 === 0 ? 'key' : 'delta', baseTime * 1000000, duration * 1000000);
         baseTime += duration;
     });
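As a side note on why the fixed duration works: each AAC-LC frame decodes to exactly 1024 PCM samples, so every frame lasts 1024 / sampleRate seconds regardless of its compressed size, and mp4-muxer takes timestamps and durations in microseconds. A sketch of that timestamp math (the helper name is mine):

```typescript
// Frame i starts at i * 1024 / sampleRate seconds; convert to the
// integer microseconds that mp4-muxer expects.
function frameTimestampMicros(frameIndex: number, sampleRate: number): number {
    return Math.round(frameIndex * 1024 / sampleRate * 1_000_000);
}
```

Computing each timestamp from the frame index like this, instead of accumulating floating-point durations, also avoids drift over long files.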
weepy commented 5 months ago

amazing, thank you!