Decoding an aac file results in frames where timestamps are unset

kartikdutt18 commented 6 months ago

Calling encodeFrames in the code below throws an error: Runtime division by 0 in av_interleaved_frames. While debugging, I found that the pts values for both the packets and the frames are 0. File used: https://filesamples.com/samples/audio/aac/sample3.aac (mp4 files work, but aac files do not).

The pts values are already 0 when the frames come back from ff_decode_multi, and setting the pts manually, as done in flattenFrames, doesn't help either.

Hi @Yahweasel, could you please help me understand the issue here? Thank you!

Sharing code below for reference:

import { IAudioEncoderData } from './interfaces';

function makeCodecOpts(_frame: Frame | undefined, _libav: LibAV): { ctx: AVCodecContextProps; time_base: number[] } {
  return {
    ctx: {
      bit_rate: 66000,
      sample_fmt: 8,
      sample_rate: 48000,
      channel_layout: 4,
      channels: 1,
    },
    time_base: [1, 48000],
  };
}

export class AudioWriter {
  private static fileNum = 0;
  private libav: LibAV = {} as LibAV;
  private audioEncoderData: IAudioEncoderData[] = [];
  public destinationContext = -1;
  private encoderAVCodecContext = -1;
  private encoderFramePointer = -1;
  private encoderPacketHandler = -1;
  private streamPacketHandler = -1;
  private streamIdx = 0;

  constructor(libAV: LibAV) {
    this.libav = libAV;
  }

  async open(audioData: {audioFile: string, startTime: number}[], oc: number) {
    this.destinationContext = oc;
    if (audioData.length === 0) {
      return;
    }

    const readAudioFilesPromise: Promise<IAudioEncoderData>[] = audioData.map((audio) => {
      return this.openAudioFile(audio, ++AudioWriter.fileNum);
    });

    this.audioEncoderData = await Promise.all(readAudioFilesPromise);
    if (this.audioEncoderData.length === 0) {
      // It should not be empty since audioData.length !== 0.
      throw new Error('AudioEncoderData is empty.');
    }

    const encoder = await this.libav.ff_init_encoder(
      'aac',
      makeCodecOpts(
        this.audioEncoderData[0].frames.length > 0 ? this.audioEncoderData[0].frames[0] : undefined,
        this.libav,
      ),
    );
    this.encoderAVCodecContext = encoder[1];
    this.encoderFramePointer = encoder[2];
    this.encoderPacketHandler = encoder[3];

    const retL = await (this.libav as any).ff_add_stream(this.destinationContext, this.encoderAVCodecContext, 1, 48000);
    this.streamPacketHandler = retL[1];
    this.streamIdx = retL[0];
  }

  async encodeFrames() {
    if (!this.audioEncoderData || this.audioEncoderData.length === 0 || !this.destinationContext) {
      return;
    }

    const frames = this.flattenFrames(this.audioEncoderData);
    const packets = await this.encodeAudioFrames(frames, true);

    await this.libav.ff_write_multi(this.destinationContext, this.streamPacketHandler, packets, true);
    await this.libav.ff_write_multi(this.destinationContext, this.streamPacketHandler, [], true);
    await this.libav.ff_free_encoder(this.encoderAVCodecContext, this.encoderFramePointer, this.encoderPacketHandler);
  }

  async close() {
    if (!this.audioEncoderData) {
      return;
    }

    const cleanupPromises = this.audioEncoderData.map(async (audioData) => {
      await this.libav.ff_free_decoder(audioData.c, audioData.pkt, audioData.frame);
      await this.libav.avformat_close_input_js(audioData.formatCtx);
    });

    await Promise.all(cleanupPromises);
  }

  // Open one audio file and get frames.
  private async openAudioFile(audioData: IExportAudioData, idx: number): Promise<IAudioEncoderData> {
    const response = await fetch(audioData.audioFile);
    const data = new Uint8Array(await response.arrayBuffer());
    const filename = `AudioPlayer${idx}.aac`;
    await this.libav.writeFile(filename, new Uint8Array(data));

    const readDemuxer = await this.libav.ff_init_demuxer_file(filename);
    const formatCtx = readDemuxer[0];
    let audioStream: Stream | undefined;
    for (const s of readDemuxer[1] /* readDemuxer = [cntx, streams] */) {
      if (s.codec_type === this.libav.AVMEDIA_TYPE_AUDIO) {
        audioStream = s;
        break;
      }
    }

    if (audioStream === undefined) {
      // No audio to add.
      throw new Error('No audio stream');
    }

    const audio_stream_idx = audioStream.index;

    const rDecoder = await this.libav.ff_init_decoder(audioStream.codec_id, audioStream.codecpar);
    const c = rDecoder[1];
    const pkt = rDecoder[2];
    const frame = rDecoder[3];
    const packets = await this.readFile(audio_stream_idx, formatCtx, pkt);
    const frames = await (this.libav as any).ff_decode_multi(c, pkt, frame, packets, true);
    return {
      frames,
      c,
      pkt,
      frame,
      formatCtx,
      startTime: audioData.startTime,
    };
  }

  private async readFile(audio_stream_idx: number, formatCtx: number, pkt: number) {
    const packets = [];
    // return (await this.libav.ff_read_multi(formatCtx, pkt))[1][audio_stream_idx];
    for (;;) {
      const ret: any = await this.libav.ff_read_multi(formatCtx, pkt, undefined, { limit: 100 });
      if (ret[1][audio_stream_idx] !== undefined) {
        for (const p of ret[1][audio_stream_idx]) {
          packets.push(p);
        }
      }

      if (ret[0] === this.libav.AVERROR_EOF) {
        break;
      }
    }
    return packets;
  }

  private async encodeAudioFrames(frames: Frame[], isLastFrame: boolean) {
    const packets = await this.libav.ff_encode_multi(
      this.encoderAVCodecContext,
      this.encoderFramePointer,
      this.encoderPacketHandler,
      frames,
      isLastFrame,
    );

    for (const p of packets) {
      p.stream_index = this.streamIdx;
    }
    return packets;
  }

  private flattenFrames(frameData: { startTime: number; frames: Frame[] }[]) {
    frameData = frameData.sort((a, b) => a.startTime - b.startTime);
    const frames = [...frameData[0].frames];
    const addNum = (a: number | undefined, b: number | undefined) => {
      return (a ?? 0) + (b ?? 512);
    };
    for (let i = 1; i < frameData.length; i++) {
      const prevFrameData = frameData[i - 1];
      const prevFrame = prevFrameData.frames[prevFrameData.frames.length - 1];
      for (const f of frameData[i].frames) {
        f.pts = addNum(f.pts, prevFrame?.pts);
        f.ptshi = addNum(f.ptshi, prevFrame?.ptshi);
        frames.push(f);
      }
    }

    return frames;
  }
}
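
A note on the timestamp arithmetic in flattenFrames: libav.js exposes a frame's 64-bit timestamp as two 32-bit halves, pts (low bits) and ptshi (high bits), and the code above adds the two halves independently, so a carry out of the low half is never propagated. The following is a minimal sketch of one way to offset timestamps with the halves joined first. The names TimedFrame, joinPts, splitPts and shiftFrames are hypothetical and not part of libav.js, and the sketch assumes timestamps stay within the range a JavaScript number represents exactly (below 2^53).

```ts
// Hypothetical minimal frame shape: only the two timestamp halves matter here.
interface TimedFrame {
  pts?: number;   // low 32 bits of the timestamp
  ptshi?: number; // high 32 bits of the timestamp
}

const TWO_32 = 4294967296; // 2^32

// Join the low/high halves into one number (exact below 2^53).
function joinPts(lo = 0, hi = 0): number {
  // `lo >>> 0` reinterprets a signed 32-bit low half as unsigned.
  return (hi | 0) * TWO_32 + (lo >>> 0);
}

// Split a number back into signed 32-bit low and high halves.
function splitPts(value: number): { pts: number; ptshi: number } {
  const hi = Math.floor(value / TWO_32);
  const lo = value - hi * TWO_32; // in the range 0 .. 2^32 - 1
  return { pts: lo | 0, ptshi: hi };
}

// Return copies of the frames with every timestamp shifted by `offset` ticks.
function shiftFrames<T extends TimedFrame>(frames: T[], offset: number): T[] {
  return frames.map((f) => ({
    ...f,
    ...splitPts(joinPts(f.pts, f.ptshi) + offset),
  }));
}
```

With this approach, the offset for each subsequent file would be the last timestamp of the previous file plus that frame's duration in ticks, rather than the last timestamp alone.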

Yahweasel commented 6 months ago

It's generally harmless for pts to be 0. Division by zero in handling frames or packets usually has to do with the timebase, not the timestamps, but the way you're handling the timebase appears correct... my suspicion would be that the time_base isn't set properly by your mysterious ff_add_stream function, or that it's not set properly by the input decoder, and somehow that lack of timebase is flowing through. Basically: take a look at some of the timebases and see if those are suspicious first.
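
As a rough illustration of that kind of check, here is a minimal sketch that flags timebases which are missing or have a non-positive numerator or denominator. It is plain TypeScript and does not call libav.js; the Rational type mirrors the `time_base: [1, 48000]` shape used in makeCodecOpts, and the helper names isValidTimeBase and findSuspiciousTimeBases are hypothetical.

```ts
// A rational time base as [numerator, denominator].
type Rational = [number, number];

// A time base is usable only if both parts are positive integers.
function isValidTimeBase(tb: Rational | undefined): boolean {
  if (!tb) return false;
  const [num, den] = tb;
  return Number.isInteger(num) && Number.isInteger(den) && num > 0 && den > 0;
}

// Given named time bases (encoder, output stream, input stream, ...),
// return the names of those that look unset or invalid.
function findSuspiciousTimeBases(
  named: Record<string, Rational | undefined>,
): string[] {
  return Object.entries(named)
    .filter(([, tb]) => !isValidTimeBase(tb))
    .map(([name]) => name);
}
```

Feeding it the encoder's time_base, the values passed to ff_add_stream, and whatever the demuxer reports for the input stream would show which of them, if any, is zero or unset.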

kartikdutt18 commented 6 months ago

Ah, I'll take a look, thank you so much! For reference, this is my ff_add_stream method:


```ts
libav.ff_add_stream = function (
  oc: number,
  // eslint-disable-next-line @typescript-eslint/ban-types
  codecparms: number | object,
  time_base_num: number,
  time_base_den: number,
) {
  // Create a new stream in the output format context.
  const st = libav.avformat_new_stream(oc, 0);
  if (st === 0) throw new Error('Could not allocate stream');
  const codecpar = libav.AVStream_codecpar(st);
  let ret = 0;
  // Copy codec parameters from a codec context pointer or from a plain object.
  if (typeof codecparms === 'number') {
    ret = libav.avcodec_parameters_from_context(codecpar, codecparms);
  } else {
    ret = libav.ff_set_codecpar(codecpar, codecparms);
  }
  if (ret < 0) throw new Error('Could not copy the stream parameters: ' + libav.ff_error(ret));
  // Set the stream time base to time_base_num / time_base_den.
  libav.AVStream_time_base_s(st, time_base_num, time_base_den);
  const pkt = libav.av_packet_alloc();
  if (pkt === 0) throw new Error('Could not allocate packet');
  // The new stream is the last one in the context.
  const sti = libav.AVFormatContext_nb_streams(oc) - 1;
  return [sti, pkt];
};
```

I'm also using this to add another stream, created with the h264 codec, that contains video from another source, so the resulting output is an mp4 with audio and video from different sources.

kartikdutt18 commented 6 months ago

Hi @Yahweasel, I tried logging the time base of the stream st created in my ff_add_stream method, and time_base_num and time_base_den are correct. Also, after demuxing, the input stream had a defined timebase of 1/28224000. Could you please take another look? I've hit a roadblock and can't figure out a way forward.

Yahweasel commented 6 months ago

That timebase is clearly madness. You cannot have a timebase of 1/28224000. An audio file would typically have a timebase of 1/samplerate.
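
When an input stream's timebase does differ from the encoder's 1/samplerate timebase, timestamps have to be rescaled between the two. The following is a minimal sketch of that conversion in plain TypeScript, independent of libav.js. The name rescaleTs is hypothetical, and the sketch assumes the intermediate product stays below 2^53 so that the arithmetic is exact.

```ts
// A rational time base as [numerator, denominator].
type Rational = [number, number];

// Convert a timestamp counted in `from` ticks into `to` ticks,
// rounding to the nearest integer tick.
function rescaleTs(ts: number, from: Rational, to: Rational): number {
  const [fromNum, fromDen] = from;
  const [toNum, toDen] = to;
  // Reject unset or invalid time bases up front instead of dividing by zero.
  if (fromDen <= 0 || toDen <= 0 || toNum <= 0) {
    throw new RangeError('Invalid time base');
  }
  // ts * (fromNum / fromDen) seconds, expressed in units of (toNum / toDen).
  return Math.round((ts * fromNum * toDen) / (fromDen * toNum));
}
```

For example, 28224000 is an exact multiple of 48000 (588 times), so `rescaleTs(602112, [1, 28224000], [1, 48000])` evaluates to 1024.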

I cannot reproduce any issue. This file transcodes fine with both tools/ffmpeg.js and an adapted version of 611-transcode-video.js. I'm going to assume that you're missing some error, no aac demuxer configured or similar.

kartikdutt18 commented 6 months ago

Hi @Yahweasel, these are the configure flags I'm using. I have tried multiple audio files, and the timebase is usually that high. Is something missing from these flags? Could you please help me or point me in a direction I should look into:

--enable-protocol=data --enable-protocol=file --enable-filter=aresample --enable-decoder=aac --enable-encoder=aac --enable-libopenh264 --enable-muxer=mp4 --enable-parser=aac --enable-demuxer=mp4 --enable-demuxer=mov --enable-muxer=adts --enable-demuxer=aac --enable-decoder=libopenh264 --enable-encoder=libopenh264 --enable-filter=acompressor --enable-filter=adeclick --enable-filter=adeclip --enable-filter=aecho --enable-filter=afade --enable-filter=aformat --enable-filter=agate --enable-filter=alimiter --enable-filter=amix --enable-filter=apad --enable-filter=atempo --enable-filter=atrim --enable-filter=bandpass --enable-filter=bandreject --enable-filter=dynaudnorm --enable-filter=equalizer --enable-filter=loudnorm --enable-filter=pan --enable-filter=amix --enable-filter=volume

Yahweasel / libav.js

Decoding an aac file results in frames where timestamps are unset #45